Pipeline Tools

08-07-2023

Koopa in a Pipe

Background

Early in my career I read The DevOps Handbook. The main point I took away was that DevOps is specifically when "dev" and "ops" are the same people. Organizations often claim to do DevOps, but really they've just created "automated ops", where the ops team is composed of engineers who focus on repeatable infrastructure. This misses the main thrust of what I'll call "real" DevOps.

DevOps works when the same people writing the app are writing the infra that the app lives on. To do otherwise is akin to having one team write a PR and then an unrelated team unfamiliar with the codebase approve it. From my understanding, a large body of evidence shows that this definition of DevOps succeeds, while merely rebranding your ops team and adding automation does not solve the root problem. (A centralized ops team is often seen as a cost saving, but it becomes a bottleneck and then bleeds far more money than the consolidation ever saved.)

Later in my career, I read a "sequel" of sorts to The DevOps Handbook: Accelerate. This book posits, backed by a large body of research, that the best leading indicator of team maturity and strength is how long it takes a piece of code to go from being committed to the master branch to running in production. Collectively these books have impressed upon me the importance of developers owning both their infrastructure and their CI/CD (Continuous Integration / Continuous Deployment).

My first experience with CI/CD was with Jenkins, or "Jank-ums" as we called it at the second company I worked at that used it. Using Travis on some personal projects felt like an incredible upgrade: far easier to use, and no constant oversight needed to keep the pipelines running. When I moved off of Travis and onto GitHub Actions for all my personal projects, as well as occasional work projects, it again felt like a large upgrade. The next company I worked at introduced me to Concourse.

Philosophically, I should love Concourse, and I do prefer it to Jenkins (though not GitHub Actions). It attempts a very Unix-y design and tries to do just one thing at a time: every step gets its own immutable container, and passing data between containers is done through a strict interface. In practice, though, it's unintuitive, resource intensive, and somewhat flaky. I still enjoy it, but new devs almost always need to go through a tutorial to understand it, and working with GitHub Actions always feels refreshingly simple in comparison.

When I was introduced to Concourse by my then-architect, he also showed me his "Concourse templating tool". This Node tool took a JSON config and a bunch of templated-out YAML files, then interpolated the variables in. This basic setup gave us unified, consistent, and dynamic pipelines. I quickly adopted the tool and worked to un-hardcode a number of variables as well as add new features. This work let me use the tool on my team as well as on his old team.

Much later, after the architect had left and I had slowly evolved the tool, I completely rewrote it and rebranded it as Pipeline Tools (PT). The rewritten tool generated the same YAML (at first) but did so in a completely different way. Instead of a Node tool that did string interpolation, I used Kotlin to create a Domain Specific Language (DSL) that generated Concourse-specific YAML. This DSL was then leveraged by a "Pipeline Tool" to build a pipeline from a given config. Thus, PT was born.
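
To give a feel for the shape of this, here's a minimal sketch of a Kotlin DSL that renders Concourse-style YAML. Every name in it is invented for illustration; PT's actual API differs.

```kotlin
// A sketch of a Kotlin DSL that renders Concourse-style YAML.
// All names here are invented for illustration; PT's actual API differs.
class JobBuilder(private val name: String) {
    private val steps = mutableListOf<String>()

    fun task(taskName: String, image: String) {
        steps += """
            |  - task: $taskName
            |    config:
            |      image_resource: { type: registry-image, source: { repository: $image } }
        """.trimMargin()
    }

    fun toYaml(): String =
        "- name: $name\n  plan:\n" + steps.joinToString("\n", postfix = "\n")
}

class PipelineBuilder {
    private val jobs = mutableListOf<JobBuilder>()

    // Lambdas-with-receivers are what give the call site its DSL feel.
    fun job(name: String, init: JobBuilder.() -> Unit) {
        jobs += JobBuilder(name).apply(init)
    }

    fun toYaml(): String = "jobs:\n" + jobs.joinToString("") { it.toYaml() }
}

fun pipeline(init: PipelineBuilder.() -> Unit): String =
    PipelineBuilder().apply(init).toYaml()

fun main() {
    println(pipeline {
        job("build") {
            task("unit-tests", image = "gradle")
        }
    })
}
```

The win over string templating is that the pipeline's structure lives in typed Kotlin, so the compiler and IDE catch mistakes that interpolation would happily pass through.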

Anatomy of a Tool

Pipeline Tools has been adopted by a number of teams in my area. The generator creates standardized pipelines that meet our various norms and requirements. It allows devs to quickly spin up CI/CD for a project without needing to intricately understand Concourse. Its shared nature means any update becomes a potential update for everyone. However, unlike many pipeline tools, PT is not an enforcement mechanism. Any team can submit and review a PR. This means the council we've built up can guide PT, but not control it. Teams can modify PT to fit their needs and share those changes with others. This prevents bottlenecks, but more importantly it allows teams to still operate with autonomy and flexibility.

PT is effectively broken into four layers: the DSL, the "parts" created by the DSL, the Pipeline Generators that use the DSL, and the artifacts that run in Concourse (like shell scripts, jars, and docker containers). The DSL knows nothing but how to build Concourse YAML "parts". This specificity allows it to be reused in different tools, and means we can build in helper methods for common uses of the DSL. The parts are data classes that store data and know how to turn that data into YAML. (Often a part is composed of other parts and delegates that portion of YAML generation to its components.) Generators can then leverage the DSL in different ways to create drastically different pipelines.
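
A minimal sketch of that parts layer, again with invented names: each part holds its data and renders its own slice of YAML, delegating nested sections to the parts it contains.

```kotlin
// A sketch of the "parts" layer, with invented names: a part is a data class
// that stores its data and knows how to render it as YAML. Composite parts
// delegate nested sections to the parts they contain.
data class TaskStep(val taskName: String, val image: String) {
    fun toYaml(indent: String): String =
        "$indent- task: $taskName\n" +
        "$indent  config:\n" +
        "$indent    image_resource: { type: registry-image, source: { repository: $image } }\n"
}

data class Job(val name: String, val plan: List<TaskStep>) {
    fun toYaml(): String = buildString {
        appendLine("- name: $name")
        appendLine("  plan:")
        plan.forEach { append(it.toYaml(indent = "  ")) } // delegate to child parts
    }
}

fun main() {
    print(Job("build", plan = listOf(TaskStep("unit-tests", image = "gradle"))).toYaml())
}
```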

Because the DSL can be easily TDDed, generators can focus on pipeline generation and ignore what the YAML will eventually look like. This means devs can focus on implementing a new area of the DSL, on creating a new pipeline, or on simply configuring and calling a generator, each as a clean abstraction. This pattern has proven really effective at enabling rapid production of new features.
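
For example, continuing with the hypothetical TaskStep part from the sketch above, a test can pin down the part's YAML output once, and every generator built on top gets to trust it:

```kotlin
import kotlin.test.Test
import kotlin.test.assertEquals

// Tests the hypothetical TaskStep part sketched above: the exact YAML output
// is verified once here, so generators never have to re-check it.
class TaskStepTest {
    @Test
    fun `renders a task step with its image`() {
        val yaml = TaskStep("unit-tests", image = "gradle").toYaml(indent = "")
        assertEquals(
            "- task: unit-tests\n" +
            "  config:\n" +
            "    image_resource: { type: registry-image, source: { repository: gradle } }\n",
            yaml
        )
    }
}
```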

As more teams adopted the tool, I moved from a single generator to a generator per "kind" of pipeline. This allowed us to support radically different styles of pipelines without a sea of config variables and if statements, while still keeping each generator fairly dynamic within its realm. It's been really fun to grow the tool and watch the contributor base expand as others add features and fix bugs.
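
In sketch form (names invented for illustration), that amounts to a small generator interface with one implementation per kind, selected from the config:

```kotlin
// A sketch of generator-per-kind dispatch, with invented names: each kind of
// pipeline gets its own generator, so no single generator drowns in flags.
data class PipelineConfig(val kind: String, val appName: String)

interface PipelineGenerator {
    fun generate(config: PipelineConfig): String // returns Concourse YAML
}

class AppPipelineGenerator : PipelineGenerator {
    override fun generate(config: PipelineConfig) =
        "jobs:\n- name: build-and-deploy-${config.appName}\n"
}

class LibraryPipelineGenerator : PipelineGenerator {
    override fun generate(config: PipelineConfig) =
        "jobs:\n- name: build-and-publish-${config.appName}\n"
}

fun generatorFor(config: PipelineConfig): PipelineGenerator = when (config.kind) {
    "app" -> AppPipelineGenerator()
    "library" -> LibraryPipelineGenerator()
    else -> error("Unknown pipeline kind: ${config.kind}")
}

fun main() {
    val config = PipelineConfig(kind = "app", appName = "checkout")
    print(generatorFor(config).generate(config))
}
```

Adding a new kind means adding a new generator, not threading yet another flag through an existing one.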

PT and the Future

As the department looks to move our apps to K8s from an older hosting platform, we've decided to leverage PT as our primary migration tool. Deployments to the previous platform were handled by PT generating an app manifest from its config file and then pushing that manifest. That generation was simple string interpolation, but still dynamic enough for our purposes. Having built the Concourse YAML DSL, I realized it'd be basically the same thing to build a DSL for K8s YAML. We had already decided on a "GitOps" approach, where another tool would apply committed K8s YAML files; now we could automate generating those files. Because we had a core of standard use cases, I could model out the DSL for the main path and then improve it over time as more "snowflake" apps came on board. Teams could then target the new platform just by adding a few settings to their PT config.
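
As a rough sketch of the K8s side: the Deployment field names below follow the real apps/v1 schema, but the Kotlin names are invented. A part that renders a minimal Deployment looks nearly identical to the Concourse parts.

```kotlin
// A sketch of the K8s side, with invented Kotlin names: the same "part"
// pattern, now rendering a minimal apps/v1 Deployment manifest. In a GitOps
// flow this output would be committed to git and applied by another tool.
data class Deployment(val name: String, val image: String, val replicas: Int = 1) {
    fun toYaml(): String = """
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: $name
        spec:
          replicas: $replicas
          selector:
            matchLabels: { app: $name }
          template:
            metadata:
              labels: { app: $name }
            spec:
              containers:
                - name: $name
                  image: $image
    """.trimIndent()
}

fun main() {
    print(Deployment(name = "checkout", image = "registry.example/checkout:1.2.3", replicas = 2).toYaml())
}
```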

I've had a blast leaning into Kotlin DSLs, and it's been really rewarding to watch a tool I (re)designed gain so much traction and be so quick to iterate on. I love that things I dig into in my free time (I played a lot with DSLs in Quest Command before using them in PT) can influence my work, and I love having the freedom to grow an idea and see it improve tooling across teams.