
Monorepo architecture, CI/CD and Build pipeline

Monorepo architecture can have advantages over polyrepo (or multi-repo) in certain cases. However, implementing a successful monorepo is not hassle-free, especially when it comes to automation, CI/CD and the build pipeline. Examples of such problems are long-running tests and unnecessary releases of unchanged packages.

I am going to share some solutions, based on my experience, to make this automation work better.

Monorepo for Microservices

Obviously, when we are talking about monorepo architecture we are NOT talking about monolithic architecture.

Monorepo architecture vs Monolithic architecture

Monolithic architecture (or in short, a monolith) is when you have a single solution that may contain one or several projects. These projects are build dependencies of a single main project: when you build your solution, you build those dependencies first and use them to build the main project. This can be any kind of solution, even a full-stack project (ex. an MVC application). In certain cases a monolithic architecture becomes too large to handle, from several perspectives.

Monolithic architecture

Monorepo architecture is when multiple solutions sit in the same repository. Think of them as small monoliths that together serve some purpose. These can be services of a microservice architecture, or they can have another relation to each other, ex. infrastructure as code and the application it deploys.

Microservices architecture on a Monorepo

Although some people are religious about separating repositories, there are cases where putting all (or some) of the services and code in the same repo can be useful.

Simplified microservice

The original, by-the-book assumption of microservices architecture was that multiple teams, with different domain knowledge and technology expertise, in a huge organization are working on different microservices. These people don’t necessarily care about each other’s service, as long as it respects the agreed-upon contract (ex. an API schema). Well, this assumption is not always true in the real world. Teams utilize microservices for a variety of reasons, like scaling, using serverless for some services, or simply because a monolith sounds so yesterday!

When to use a monorepo

A monorepo is really useful when there is a team that works with multiple services, especially when the services have very similar code bases. Sometimes there are shared in-house libraries (like the communication layer to your message queue) and you don’t want to go the extra step of creating a private package (like a NuGet package).

Moreover, a monorepo helps when you want to run processes in a more structured and unified way: the CI/CD process, the process of creating packages, testing and releasing to the production server, and unified rules, for example for formatting and code coverage.

Another advantage of a monorepo over a polyrepo is that it encourages teams to cooperate and engage in each other’s source code. This promotes innersourcing, from engaging in each other’s pull request reviews and feature requests to contributing code to each other’s code base.

There is a case study at Google where you can read more about the Advantages and Disadvantages of a Monolithic Repository.

What to be careful about in a monorepo?

As I mentioned, the first thing is dependency. Dependency is not a bad thing; tight coupling, though, is a wicked thing! Think of it like this: if you change something outside the common area, it should not affect any other project. In other words, make sure services do not depend on each other.

Also, you should decide what happens when common libraries are updated: who is supposed to update the dependent projects, and what is the process? Do all dependent projects get updated automatically? How do you handle it if some of the services want to stay on older versions of your library?

The next thing to consider is security! More people accessing everything means more opportunities for an attacker to change sensitive things and deploy them! These days, with everything automated, we should be even more careful.

While talking about security, it is worth mentioning that we should see to it that no secret is ever committed to the repository (things like connection strings and API keys). Even if you delete them, attackers can scan the git history and extract them from previous commits.
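
To illustrate the risk, a single command like the one below is enough to search every commit that ever existed in the repository; the pattern “ConnectionString” is just a placeholder for whatever naming your secrets use.

# search the whole commit history for a suspicious string (the pattern is only an example)
git grep -I "ConnectionString" $(git rev-list --all)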

Long-running tests are another problem. If every time you run a test you need to test all the other solutions/services, as well as run all the integration tests, you will probably be waiting longer and longer as the repo gets bigger. This should also be thought of when you set up your monorepo.

Another thing that can go wrong is when the repository itself gets a problem, like when git status shows files as changed by mistake, or some workflows don’t do their job well. This interrupts all teams working on all projects and can be really critical. You need to have a team that is responsible for fixing repo problems with very high priority.

CI/CD process in Monorepo architecture

If you decide to go for a monorepo, you need to optimize your CI/CD process. Basically, we want to run tests and builds only for the projects that have been changed, and not for everything that is in the repo.

Decide about structure

The first step is to have a nice folder structure. You need to decide up front about things like where IaC and documentation are going to end up (near their projects, or totally separate in root folders) and come up with a pre-agreed structure. Here is an example:

-- Root
|--------- .github
|--------- tools
|--------- common
|--------- starter-templates
|--------- back-end
||------------------ dot-net
|||------------------------ project-abc
|||------------------------ project-xyz
||------------------ node-js
|||------------------------ ...
|--------- front-end
||------------------ node-js
|||------------------------ project-abc
|||------------------------ project-xyz
|--------- iac
|--------- docs

Use Starter Templates

If you choose a monorepo, it makes very much sense that all projects of the same type have the same structure. For example, make a template for all DDD projects in dotnet, make another one for CRUD projects in dotnet, and so on.

Make sure all the boilerplate building blocks (like logging, diagnostics, message queue, dependency injection and so on) are similar (as much as it makes sense). This way you get several benefits:

  • when starting a new project, you skip putting many hours into boilerplate
  • programmers can jump to other projects and start contributing, without needing to learn another architecture and environment
  • it is much easier and safer to share CI/CD processes.
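
As a rough sketch of how a starter template can be consumed, assuming a template project lives in the starter-templates folder (the ddd-service short name and the paths below are hypothetical), the dotnet CLI supports installing and using custom templates:

# install a custom template from a folder inside the monorepo (hypothetical path and short name)
dotnet new install ./starter-templates/ddd-service
# scaffold a new service from it into the agreed folder structure
dotnet new ddd-service -n ProjectAbc -o back-end/dot-net/project-abc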

Trigger pipelines for what has changed

Although everything is in the same repo, most of the time you should consider each solution as a single entity. This means only running testing, building, versioning, and releasing for the solutions that have changed.

To achieve this you have two alternatives: the first is to add triggers manually to the pipelines, the second is to build some kind of automatic process that runs pipelines based on the changed folders.

Separate workflow for each path

In GitHub workflows, for example, there is a paths option. If you use both the branches filter and the paths filter, the workflow will only run when both filters are satisfied. So if your folder is changed, the workflow will start.

on:
  pull_request:
    branches:
      - main
    paths:
      - 'dot-net/project-abc/**'

Then you can make workflows specific to each project folder. Azure DevOps Pipelines offers similar path filters on its triggers, and GitLab has rules:changes.
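
As a hedged sketch of what the GitLab equivalent could look like (the job name, script and path are only placeholders, and the runner is assumed to have the .NET SDK installed):

build-project-abc:
  script:
    - dotnet build back-end/dot-net/project-abc
  rules:
    - changes:
        - back-end/dot-net/project-abc/**/*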

For the path you can also use wildcards to share the workflow between many projects. This is easier when you have a good predefined structure. For example, with the folder structure from a few lines above, a path like '**/dot-net/**' triggers the workflow when any dotnet project changes.
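
Put into the same workflow trigger as before, that wildcard would look roughly like this:

on:
  pull_request:
    branches:
      - main
    paths:
      - '**/dot-net/**'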

Automatically detect changes in the repo and run CI/CD pipelines specific to what has changed

The other option is to detect the changes and automatically run pipelines based on what has changed. To do that you can use git functionality. For shared CI/CD to work correctly, though, you need to:

  • have a strict folder structure (as mentioned before)
  • start projects from templates, so they are to some degree similar to each other.

Test if the contents of a directory have changed

The main thing you need to play with is:

git diff --quiet HEAD^ HEAD -- "./a-folder-directory" || echo true

HEAD^ refers to the first parent, which is always the left-hand side of the merge. The -- is just there so that a path can follow it. Thus, the line above compares the current HEAD and its first parent for the given directory. git diff --quiet exits with a non-zero status when there are differences, so if the folder has changed we echo the word “true”.
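
As a usage example against the folder structure from earlier (the path is only an illustration), you can capture the result in a variable. Keep in mind that the parent commit has to be present locally: a shallow clone of depth 1 on a CI runner will not have HEAD^, so on GitHub Actions, for instance, a deeper fetch-depth on checkout is needed.

# assumes the checkout fetched at least the parent commit
CHANGED=$(git diff --quiet HEAD^ HEAD -- "./back-end/dot-net/project-abc" || echo true)
if [ "$CHANGED" = true ]; then echo "project-abc changed"; fi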

Loop through the whole monorepo

When you do this type of automation, you need to get the folder paths, and you have to get a little bit creative; one way is to scan for files like *.sln, *.csproj, pom.xml, build.gradle, package.json, dockerfile, and so on. Here is a loop looking for dotnet solution files:

find . -name "*.sln" -print0 | while IFS= read -r -d '' f; do \ DIR=$(dirname "${f}"); DIR_CHANGED=$(git diff --quiet HEAD^ HEAD -- "./$DIR" || echo true); if [ "$DIR_CHANGED" != true ] ; then continue; fi # Do your stuff here done;

Let us dissect the code above, as bash script can be relatively hard to read.

find . -name "*.sln"

This part is kind of straightforward: it gets a list of all files with the sln extension, in the current directory (.) and all subdirectories.

The -print0 separates the file names with a null character, which makes the list ready for the next step, the loop. The rest of line 1 creates a loop over the file names and puts each file name in the variable f,

ex: ./dot-net/project-abc/my-project.sln

Line 2: now we need the path to the folder that contains the solution file. dirname extracts that from the file name, ex: ./dot-net/project-abc

Line 3: as described, this tests whether things in this folder have been changed. Note that || echo true is how the word “true” ends up in DIR_CHANGED; you cannot skip the echo.

Lines 4, 5, 6: if the folder has not changed, the loop goes to the next iteration (note that, if you want, you can also do more between these lines).

Line 7: now that you know things inside the current path have changed ($DIR points to it), you can do all your CI/CD process here (for example test, build, publish, create a docker image and so on). Please feel free to use a shell function to keep your code clean and readable.
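
A minimal sketch of such a shell function, assuming the dotnet CLI is available on the agent (the function name and the steps are only an example), could look like this; inside the loop you would call process_solution "$DIR" where the # Do your stuff here comment sits.

# hypothetical helper: restore, build and test one changed solution directory
process_solution() {
  local dir="$1"
  dotnet restore "$dir"
  dotnet build "$dir" --configuration Release --no-restore
  dotnet test "$dir" --configuration Release --no-build
  # publishing, docker images, versioning, etc. would follow here
}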
