FivexL’s Current Best Thinking on Branching and CI/CD Pipelines


While FivexL’s specialization is building AWS infrastructure as code for startups, we are frequently asked about best practices for CI/CD, branching strategies, and development workflow organization. This document gathers our collective experience and answers the questions we hear most often from customers, so our consultants can share it and have a conversation based on it instead of explaining the same ideas from scratch every time. It is also easy to forget to mention something in conversation; this document helps us maintain a consistent baseline of ideas.

Strong Opinions, Weakly Held

You will likely find the statements below strongly worded and highly opinionated. They are. These opinions are based on years of experience, but at the same time, we do not treat them as dogma. We see them as a starting point: this is what we know. But we do not know everything, and we keep our eyes open to nuances and new ways of thinking. We might be wrong. There could be circumstances where the thinking outlined below does not apply. This document is a way to share our background and what our ideas are based on. If you have a different perspective and background, reading through this document will help highlight where you see things differently, and we would love to learn why and evolve our thinking. With that out of the way, let’s get started.

Principles for building CI/CD pipelines

  • Optimise for developer feedback. Speed is king. That means keeping builds and tests fast, shifting as much as possible to the left, and running things in parallel. The quicker we can give developers feedback on their change, the less context switching they will have. Also, the faster the pipeline, the more throughput it has; you will see why throughput is vital in the following principles.
  • One change at a time. Every commit must get an entire pipeline run; do not accumulate commits on any branch. If you deliver to prod one change at a time, then if something is not working, it is clear where the problem is: you either revert the change or fix it with the next commit. As soon as you start to batch changes together, it becomes increasingly more work to figure out the root cause of an issue and fix the problem fast. That is where the test and dev teams start to create issues/tickets and send them back and forth, wasting a lot of time. You want to avoid that. Remember: one change at a time.
  • Do not clog the pipe. You have to nurture and develop a culture of getting changes to prod as they merge. Changes left on a staging branch will slow down the person who wants to deliver after you and potentially prevent them from delivering, aggravating the problem. Implement technical measures to discover clogged pipes, for example, a daily job that compares the main and release branches and notifies about any stuck changes (see the sketch after this list). The longer changes stay undelivered to prod, the more urgent it is to resolve. Tell developers they are done once their changes are in production and working. It is not enough to merge changes to the main branch; they have to get them all the way through.
  • Split systems into smaller pieces to maintain the throughput of the pipelines. The number of developers committing to the same repository determines how much throughput the pipeline needs. You want to get one change all the way to production before the next one arrives; otherwise, reverting becomes tricky, and you will start to batch changes together and potentially clog the pipeline. Do not clog the pipe. If you have one developer committing to one repo, you have all the time in the world, though you should still prioritize speed to give fast feedback. If there are ten developers, you are much more constrained. What to do? Split your system into smaller pieces and have a separate pipeline for each repository. That spreads commits across many repositories, lowering the throughput each pipeline needs. It also allows you to deliver pieces of the system separately, reducing the risk of breaking production. Refer to Little’s law for more information (see the worked example after this list).
  • If something breaks, fix it quickly. If something is flaky, deal with it. Need an explanation?
  • What gets measured gets improved. Measure pipeline end-to-end execution time and keep track of it. Another metric is red builds not caused by developer changes, i.e., failures caused by the automation itself: monitor the number of red builds, sort them by failure cause, and address the top three; do this continuously (see the sketch after this list). Finally, measure the commit rate against pipeline throughput per repo. That will highlight issues in the system to work on.
  • Some tests might be long and expensive to run. Leave those for pre-production environments, while keeping pipeline throughput in mind.
  • Make security assessment part of the process. Developers have to focus on implementing business requirements, and it is hard to keep every security requirement in mind, so it helps if the pipeline provides additional security feedback to reduce cognitive load.
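To illustrate the "do not clog the pipe" check mentioned above, here is a minimal sketch of a daily job that compares the main and release branches and reports stuck changes. The branch names (origin/main, origin/release), the use of plain git via subprocess, and the plain-print "notification" are assumptions; adapt them to your repository layout and chat tooling.

```python
#!/usr/bin/env python3
"""Minimal sketch of a daily "clogged pipe" check.

Assumes a local clone where origin/main is where developers merge
and origin/release reflects what is deployed to production.
Branch names and the notification step are placeholders.
"""
import subprocess


def git(*args: str) -> str:
    """Run a git command and return its stdout as text."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def stuck_commits(main: str = "origin/main", release: str = "origin/release") -> list[str]:
    """Return commits that are on main but have not reached release yet."""
    git("fetch", "--quiet", "origin")
    out = git("log", "--oneline", f"{release}..{main}")
    return out.splitlines() if out else []


if __name__ == "__main__":
    stuck = stuck_commits()
    if stuck:
        # Replace this print with a Slack/Teams/email notification.
        print(f"{len(stuck)} change(s) merged but not delivered to prod:")
        for line in stuck:
            print(f"  {line}")
    else:
        print("Pipe is clear: everything on main has reached prod.")
```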
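To make the Little’s law reference above concrete: L = λ × W, so the average number of changes in flight equals the commit rate multiplied by the average pipeline duration. The sketch below is a back-of-the-envelope calculation with made-up numbers, purely for illustration.

```python
def changes_in_flight(commits_per_hour: float, pipeline_minutes: float) -> float:
    """Little's law: L = lambda * W.

    commits_per_hour  -- arrival rate of changes into the pipeline (lambda)
    pipeline_minutes  -- average end-to-end pipeline time (W)
    Returns the average number of changes in the pipeline at once (L).
    """
    return commits_per_hour * (pipeline_minutes / 60.0)


# Made-up example: ten developers merging roughly one change per hour each,
# with a 30-minute pipeline, keep ~5 changes in flight - batching is inevitable.
print(changes_in_flight(commits_per_hour=10, pipeline_minutes=30))  # -> 5.0

# To keep "one change at a time" (L <= 1), either speed the pipeline up
# or split the repo so each pipeline sees a lower commit rate:
print(changes_in_flight(commits_per_hour=2, pipeline_minutes=30))   # -> 1.0
```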
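Along the same lines as the measurement principle above, here is a small sketch of tallying red builds by failure cause to find the top three to address. The CSV file name, its columns, and the cause labels are assumptions; in practice you would pull this data from your CI system’s API.

```python
import csv
from collections import Counter


def top_failure_causes(path: str, n: int = 3) -> list[tuple[str, int]]:
    """Count red builds per cause and return the n most frequent.

    Expects a CSV with at least `status` and `cause` columns, e.g.
    status=red, cause=flaky-test | infra | developer-change | ...
    (this file layout is a made-up example).
    """
    with open(path, newline="") as fh:
        causes = Counter(
            row["cause"]
            for row in csv.DictReader(fh)
            if row["status"] == "red" and row["cause"] != "developer-change"
        )
    return causes.most_common(n)


if __name__ == "__main__":
    for cause, count in top_failure_causes("build_history.csv"):
        print(f"{cause}: {count} red builds")
```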

Questions and answers

What about GitOps?

As you can see, FivexL maintains a relatively conservative position, sticking with classic pipelines driven by centrally managed workflow orchestrators and version control events, while keeping an eye on emerging ways of doing CI/CD. Our primary concern with GitOps is that it breaks the delivery flow into several pieces, making it hard for developers to understand where their change is in the process. The build/test stage happens in one place, and then a GitOps tool takes over for deployment. Another concern is that GitOps-based tools are eventually consistent, i.e., tools like ArgoCD will keep attempting to deploy even if something fails. We prefer a deterministic approach: the pipeline either passes or it does not; there is no middle state. How would developers know if their change reached production or got stuck somewhere? When is it done? How do you measure pipeline end-to-end throughput and optimize it? Was it my change that broke prod, or something else?
Another concern is that GitOps tools frequently poll APIs, creating unnecessary load and potentially causing additional expenses. For example, if a GitOps-like tool such as Crossplane is used to manage cloud resources, AWS charges for the get/list operations, and the additional requests also produce more records in the CloudTrail log, resulting in higher storage costs and more complex analysis.

What if we have too many developers committing to the same repo? Have you heard about Merge Queue? Is that a good idea?

If one piece of software is being changed that frequently, it is a sign of an underlying architectural problem. There is some smelly, highly coupled piece of code hidden in there. It would benefit everyone to look closer and understand the root cause of the high rate of change. As written above, it is often possible to split the repo into pieces so you can spread the commit rate across repositories. If that is not possible, consider looking into merge queues, but understand that a merge queue is not a solution, just a bandage, and the problem will become more complex as time progresses.

What is your opinion on monorepos?

Everything said above should make it clear that the idea of a monorepo runs contrary to the principles outlined earlier. Why do people want one in the first place? The most frequent argument is that you can keep a version of the whole system tied to one commit. That sounds like a good thing, doesn’t it? If your system consists of many microservices, it shouldn’t matter, since each service is independent of the others, and you do not need to maintain a coupled baseline of them. However, suppose you have a distributed monolith (many services, but you can’t deploy them separately, and deployments must be coordinated). In that case, having a version of the whole system might be handy, but we urge you to avoid a distributed monolith in the first place.
Imagine that you have a piece of software that is highly confidential, and you want to restrict access to it. How would you do that with a monorepo? You would need specialized tooling that allows cloning only part of the repository, but then what is the point of having one repository for everything?
Imagine you changed something in the monorepo. Now it is time to deliver. Will you rebuild everything and re-run every test on every change? How expensive and slow would that be? To avoid that, you need a sophisticated build and CI/CD system able to determine what has changed and what needs to be rebuilt, which adds a lot of unnecessary complexity to maintain. Say you get through that; you still have only one pipeline to production, so everyone has to wait for the current changes to clear the pipeline before the next change can be attempted. Alternatively, you could have many pipelines for the same repository, but then you are still bound by the commit rate… So why not just split it into multiple repositories and make life easier for yourself?
Another argument people use is that code sharing becomes easier since all the code is in the same repository. That is true; otherwise, you must maintain a private registry and publish reusable parts as libraries. At the same time, consider that having all the code in one place encourages tight coupling between parts of the code. It becomes too easy to pick something from one directory and something from another, and the next thing you know, you have a distributed monolith. Also, imagine you want to retire one of the services. With a separate repository per service, you just archive the repo and forget about it. In the case of a monorepo, you need to track down and untangle all the dependencies accumulated over the years. Who has time to do that? As a result, as time goes on, you will carry more and more dead/legacy code that continuously slows you down, eventually bringing you to a standstill.

Additional reading
https://www.infoworld.com/article/3585176/no-you-dont-have-to-run-like-google.html

How about GitFlow?

It is a bad idea; when will people stop beating this dead horse? Even the author of the original article published an update saying you should think twice before adopting it. What to use instead? Consider trunk-based development as a starting point and modify it based on your needs. Note that the CI/CD principles above pretty much assume you do trunk-based development and are highly aligned with its principles.

Additional reading
https://nvie.com/posts/a-successful-git-branching-model/
https://georgestocker.com/2020/03/04/please-stop-recommending-git-flow/


Andrey Devyatkin

Principal Cloud Engineering Consultant, Co-Founder of FivexL
