Using Buildkite for scaling out and running parallel CI steps

Learn how Redpanda engineers use Buildkite and GitHub to automatically trigger multiple instances of CI steps running in parallel.

June 23, 2022

At Redpanda, we always want to provide an experience that is fast, simple, and productive for developers. That applies to our own team of engineers, too. When considering how to achieve a more stable continuous integration (CI) pipeline, we wanted that same experience: fast, simple, productive. By running multiple instances of our pipeline steps in parallel on our CI platform, Buildkite, we can now run many repetitions of the same Buildkite step in roughly the time a single run takes.

Today, our devs can kick off any number of builds in parallel simply by attaching a label such as “ci-repeat-X” to their PR. In the rest of this post, I’ll discuss how we made this easy dev experience possible: we achieve repeatable builds by combining Buildkite’s parallelism attribute and pre-command hook with GitHub labels on pull requests, which trigger the parallel builds.

Buildkite parallel programming

When Buildkite introduced a new feature to run multiple repetitions of a build step in parallel, we took advantage of this by adding an attribute in our CI pipeline configuration called parallelism. We use this attribute to define the desired level of parallelism. We started off using a constant value of 1.
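As a rough illustration of what each parallel job sees at runtime: Buildkite exposes the job's zero-based index and the total job count through the BUILDKITE_PARALLEL_JOB and BUILDKITE_PARALLEL_JOB_COUNT environment variables. The sketch below only reads them. The function name and the fallback defaults are hypothetical and are not taken from our pipeline.

```python
import os


def describe_repetition(env=None):
    """Return a label such as 'repetition 2 of 5' for the current job."""
    env = os.environ if env is None else env
    # Buildkite sets these variables for steps that declare `parallelism`.
    # The defaults are an assumption covering steps that run without it.
    index = int(env.get("BUILDKITE_PARALLEL_JOB", "0"))
    count = int(env.get("BUILDKITE_PARALLEL_JOB_COUNT", "1"))
    return f"repetition {index + 1} of {count}"
```

With a constant parallelism of 1, every job would report itself as repetition 1 of 1.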

However, the challenge was to make the parallelism value configurable so that users could enable or disable it whenever they wanted, providing a value of their choice for the number of parallel instances per step. Ideally, we wanted developers to be able to configure this number outside Buildkite’s context. A good candidate for that was GitHub, but we needed a “bridge” between it and Buildkite. The bridge could not live in a step’s command that queries the GitHub pull request, because by the time a step’s command runs it is too late to set that step’s parallelism attribute. In seeking a way to do this, we discovered Buildkite’s pre-command hook.

Buildkite pre-command

Buildkite includes hooks that run automatically before a step’s command starts (pre-command) or after it finishes (post-command). We took advantage of the pre-command hook to discover the value that the user wants to use as the parallelism value. This gave us a way to run any bash script we want before a pipeline step gets executed, which means we can set a variable in the pre-command hook in order to update the parallelism attribute of a Buildkite step.

Having done this, we addressed the next natural question: what is the most productive process for users to follow in order to update this variable when opening a pull request? Our options were:

  • Comment on the pull request (e.g. /hey-buildkite repeat 5)
  • Edit a file to update the value and push the code
  • Add a GitHub label (e.g. ci-repeat-5)

Our selection criteria were:

  • Productive
  • Easy-to-use
  • Clean

If we went with option one, we would end up with a long pull request thread full of scattered comments, cluttering what should be a conversation between developers about the change. We rejected this option because it fails the second and third criteria (easy to use and clean).

For option two, we would have to answer the questions:

  • What happens when we want to merge the PR?
  • Do we want our default branch to be based on this file and run in parallel? (If so, what’s the impact on our cost?)

The questions raised by option two suggested it wasn’t the best course to take either. It also fails the second criterion (easy to use), because the user has to push code every time they want to change the requested level of parallelism.

So, we decided to go with the third and best option: add a GitHub label. With this process, users who want to run their PR tests in parallel need only add a label to their PR and rebuild the pipeline.

The workflow

The parallelism attribute is set in each Buildkite step of the pipeline.yml configuration. Its value is dynamically provided via an environment variable called PARALLEL_STEPS. We just have to modify this environment variable using the pre-command hook.
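To make the relationship between PARALLEL_STEPS and the parallelism attribute concrete, here is a minimal Python sketch that builds a pipeline definition from that variable. It is not our actual pipeline.yml: the step label, the command ./run-tests.sh, and the fallback to 1 for a missing or invalid value are all assumptions made for illustration.

```python
import json
import os


def parallel_steps(env=None, default=1):
    """Read PARALLEL_STEPS, falling back to `default` if unset or invalid."""
    env = os.environ if env is None else env
    try:
        value = int(env.get("PARALLEL_STEPS", ""))
    except ValueError:
        return default
    return value if value >= 1 else default


def build_pipeline(env=None):
    """Build a one-step pipeline whose parallelism comes from the environment."""
    return {
        "steps": [
            {
                "label": "tests",               # hypothetical step label
                "command": "./run-tests.sh",    # hypothetical command
                "parallelism": parallel_steps(env),
            }
        ]
    }


def render(env=None):
    """Serialize the pipeline as JSON text."""
    return json.dumps(build_pipeline(env), indent=2)
```

Buildkite accepts pipeline definitions in JSON as well as YAML, so text like the output of render() could be passed to buildkite-agent pipeline upload. Defaulting to 1 keeps builds without a label at a single instance per step.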

We wrote a script that runs before the steps are loaded into Buildkite and queries the GitHub API. It fetches the labels of the PR (Buildkite provides the PR number in the environment variable BUILDKITE_PULL_REQUEST) and matches them against the pattern ci-repeat-NN. That completes the workflow: the hook finds the matching label, extracts the number, and exports it as the environment variable PARALLEL_STEPS.
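The label lookup can be sketched in Python as follows. This is an illustration under stated assumptions, not our actual hook, which is a bash script. It uses GitHub's REST endpoint for listing the labels on an issue (pull requests are issues for this purpose). The function names, reading NN as one or two digits, and the default of 1 are hypothetical choices.

```python
import json
import re
import urllib.request

# Assumption: "NN" means one or two digits.
LABEL_PATTERN = re.compile(r"^ci-repeat-(\d{1,2})$")


def repeat_count(label_names, default=1):
    """Return the number from the first ci-repeat-NN label, else `default`."""
    for name in label_names:
        match = LABEL_PATTERN.match(name)
        if match and int(match.group(1)) >= 1:
            return int(match.group(1))
    return default


def fetch_label_names(repo, pr_number, token):
    """List label names on a PR; `repo` is 'owner/name'. Needs network access."""
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/labels"
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return [label["name"] for label in json.load(response)]


def export_line(label_names):
    """Build the shell line a hook could evaluate to set PARALLEL_STEPS."""
    return f"export PARALLEL_STEPS={repeat_count(label_names)}"
```

A hook could evaluate the text returned by export_line() to set the variable. Note that BUILDKITE_PULL_REQUEST holds the string false on builds that are not pull requests, so a real hook would skip the lookup in that case.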

What about the cost? Shouldn’t we require users to delete this label after their job is done? Otherwise, won’t every commit have Buildkite run multiple steps in parallel? As mentioned, we aim to increase developer productivity. Requiring users to delete the label after the job is a manual step, and we avoid these as much as we can. Once the pre-command hook has discovered the label, there is no reason to keep it on the PR, so the bot we’re using deletes it. Thus, we remove a manual step for the developer and reduce the cost, just by deleting a label.
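The cleanup step could look roughly like the sketch below, which prepares a call to GitHub's REST endpoint for removing a label from an issue. How our bot actually performs the deletion is not shown here; the function name and its parameters are hypothetical.

```python
import urllib.parse
import urllib.request


def delete_label_request(repo, pr_number, label, token):
    """Build (but do not send) the DELETE request that removes `label`."""
    # Percent-encode the label so spaces or slashes cannot break the URL path.
    quoted = urllib.parse.quote(label, safe="")
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/labels/{quoted}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would send the request. Building it separately keeps the sketch testable without network access.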

Building with DevProd in mind

In summary, our process for running multiple instances of CI steps in parallel was created with developer productivity in mind. By parallelizing and running multiple instances of CI steps on Buildkite, we decreased our build’s total running time and improved the stability of CI testing in Redpanda.

Learn more about Redpanda and download our binary on GitHub. Interact with our developers directly by joining our Slack Community to ask questions about our CI steps or anything else. For more information about Redpanda and its features, browse our documentation.
