In this article, we’ll take you on our journey of implementing the GitOps methodology.
We chose Kubernetes to host our main application, running on Google Kubernetes Engine from Google Cloud Platform. Our entire infrastructure was built “as code” using Terraform from HashiCorp. Our Continuous Integration tool is the SaaS version of CircleCI. To properly restrict access to our Kubernetes clusters, we decided not to apply descriptors directly from our CI.
As we worked on this, an emerging methodology came to our attention: GitOps.
What is GitOps?
GitOps is the practice of using Git to store a declaratively defined desired state, and Continuous Delivery agents to automate the reconciliation of the current state with that desired state, effectively decoupling CI and CD.
The GitOps working group defines GitOps methodology in five principles:
- Declarative Configuration: All resources managed through the GitOps workflow must be expressed declaratively.
- Versioned and Immutable: Desired state is stored in a repository, such as Git, that supports immutability, versioning and version history.
- Deployed automatically: Delivery of the declarative descriptors from the repository to the runtime environment is fully automated.
- Continuously reconciled: Software agents maintain system state by continuously applying the resources described in the declarative configuration, which is the single source of truth.
- Operated in a closed loop: Actions are performed on divergence between the version controlled declarative configuration and the actual state of the target system, creating a closed loop.
GitOps, how do we do it?
Basically, all you need is two Git repositories: one to store your application code, from which the CI produces immutable artefacts, and another to store the Kubernetes descriptors of your application components. Docker image versions can be updated automatically by your CI, or through a Pull Request so that you have a validation gate before updating a critical environment. The GitOps controller then monitors this second repository and automatically applies each change to keep the current state identical to the desired state stored in Git.
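As a minimal sketch of what "updating an image version" can look like in the descriptors repository, here is a Kustomize file whose `images` entry is the only thing the CI (or a Pull Request) needs to change. The file name, image name and tag are hypothetical and are not taken from our actual repository.

```yaml
# kustomization.yaml in the descriptors repository (hypothetical content)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml                     # assumed to reference the image below

images:
  - name: my-organisation/public-api    # image name as written in deployment.yaml (assumption)
    newTag: "1.4.2"                     # the value bumped by the CI or by a Pull Request
```

Because the change is a one-line diff, it is easy to review in a Pull Request and easy to revert.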
This methodology allows you to benefit from all the advantages of Git:
- With a simple Git revert, you can roll back your changes
- Git is well understood by development teams, so it becomes an enabler for DevOps contributions
- Pull Requests allow different teams to validate changes (each change is approved, authorised and auditable)
- With a simple commit, you can roll out an update to one or multiple environments
- You don’t need elevated rights on Kubernetes to make changes in your cluster
At this point, you should start to see the benefits of implementing the GitOps methodology. However, it’s fairly new and there are different tools and ways to implement it.
Which Git repository structure and branching strategy to use?
We use GitHub to host our Git repositories, and we’ve created two of them. The first is dedicated to the source code of the application. It also contains the Dockerfile and the CI workflow definitions, whose main goals are to test the application and build its Docker images. Let’s focus here on the second repository, the one monitored by a GitOps software agent.
To implement the GitOps methodology, two branching strategies emerge:
- A unique branch with a repository structure per environment: the ideal solution if you are really into continuous deployment. New changes will result in an update to multiple environments. Move forward each time (for new features and fixes) and don’t look back. If you have to maintain multiple versions, it can be tricky because you’ll have to maintain the versioning of the descriptors (e.g., Helm chart version and/or Kustomize component version).
- One branch per environment: allows you to properly segment the different environments. New features can be tested easily. You can have and maintain several versions of your infrastructure in parallel. Code promotion is handled by merge commits or by merging pull requests into the environment branches. As with a Gitflow workflow, the biggest drawback is the management of merges and rebases across the different branches. As a result, configuration drift can occur silently on an environment branch.
We chose the branch-per-environment strategy, but with a repository structure designed to avoid code duplication (Don't Repeat Yourself for the win). Note that we use Kustomize to customise our Kubernetes descriptors. Below you’ll see the actual repository structure.
application-deployment GitHub repository:
    application-deployment: /
    |-- kustomize/
    |   |-- base-template/: Allows us to define a basic set of components. For example, a developer environment called sandbox will include a set of components identical for each sandbox environment.
    |   |-- common-patches/: Kustomize JSON patches used in several environments, such as resource limits (less memory and CPU on development environments), reduced number of replicas on the Horizontal Pod Autoscaler, IP whitelisting, etc.
    |   |-- component-sets/: List of components per environment (one directory per environment)
    |   |-- components/: List of services (e.g., the Public API component, which includes Kubernetes resources such as deployment, horizontal pod autoscaler, service, ingress, service account)
    |   |-- environments/
    |   |-- kustomization-templates/: Image versions and component templates used by the CI to generate a kustomization.yml file for each environment
    |-- terraform/
    |   |-- environments/: A set of Terraform variables per environment
    |   |-- modules/
    |   |   |-- app-database/: Manages the database structure, such as schemas, users and extensions.
    |   |   |-- app-infrastructure/: Provisions the infrastructure required for each environment (Buckets, Pub/Sub, Cloud Functions, IAM, etc.)
Once the repository structure and the Git branches are in place, one central piece is still missing: the GitOps controller.
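To make the layout more concrete, here is a rough sketch of what a generated per-environment Kustomize file could look like. It is an illustration only: the environment name `sandbox-1`, the relative path to the component, and the patch content are assumptions, and the patch is inlined to keep the example self-contained rather than loaded from `common-patches/`.

```yaml
# kustomize/environments/sandbox-1/kustomization.yml (hypothetical path and names)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: sandbox-1

# Shared service definition, reused by every environment (assumed location).
resources:
  - ../../components/public-api

# A JSON patch of the kind kept in common-patches/: fewer replicas on
# development environments. Inlined here for readability.
patches:
  - target:
      kind: HorizontalPodAutoscaler
      name: public-api
    patch: |-
      - op: replace
        path: /spec/maxReplicas
        value: 2
```

Running `kustomize build` on that directory would render the component with the patch applied, so each environment only describes how it differs from the shared definitions.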
GitOps software agents
When we started almost three years ago, GitOps was in its infancy. Two GitOps controllers caught our attention: Faros from Pusher and Flux from Weaveworks.
We started with Faros for a few months, but the lack of updates led us to migrate to Flux v1. This controller was great to start with, especially if you have a simple application ready for continuous deployment. With the growth of our application (introduction of new microservices) and some development shortcuts taken (breaking changes to the database structure between versions), Flux became a limitation. Managing the execution order of component rollouts became really tedious and gave us a lot of headaches. Flux v2 was still in its early stages, so we finally decided to use ArgoCD.
ArgoCD’s feature list is very similar to that of its competitors. In addition, ArgoCD offers a very nice Web UI for visualising Kubernetes resources and deployment rollouts. This has been a game-changer for onboarding new developers into the Kubernetes world. Moreover, the Argo project is not limited to ArgoCD. It offers other useful and well-integrated tools such as:
- Argo Workflows, to build and orchestrate parallel jobs and pipelines on Kubernetes.
- Argo Events, an event-driven workflow automation framework for Kubernetes.
- Argo Rollouts, which provides advanced deployment capabilities such as blue-green and canary.
How to start?
First, in ArgoCD, you need to create an Argo Application. This application is in charge of monitoring a GitHub repository in order to apply and reconcile your Kubernetes state. In our case, we need one application per environment.
To do that, we leverage the App of Apps Pattern (ArgoCD doc) to manage our different environments. One Argo Application is in charge of creating and managing the per-environment applications. Each final application is a simple YAML file in which you define the Git settings (repository URL, revision and path of your kustomization file) and the targeted environment. Note that a single ArgoCD instance can manage multiple applications on multiple clusters.
    ---
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: myEnvironment
      namespace: argocd
    spec:
      destination:
        server: https://18.104.22.168:443
        namespace: myEnvironment
      source:
        path: kustomize/environments/myEnvironment
        repoURL: email@example.com:myOrganisation/application-deployment.git
        targetRevision: myEnvironment
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
From this point, ArgoCD is able to maintain the deployment state of your application. When an update is committed to Git, ArgoCD automatically applies the changes to your clusters, as if you had applied them manually with the command below.
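For the "parent" side of the App of Apps pattern, a minimal sketch could look like the following. This is not our actual manifest: the application name, the `argocd/applications` directory, the `main` revision and the HTTPS repository URL are all assumptions chosen for illustration. The idea is simply that the parent Application points at a directory containing one Application manifest per environment.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: environments                 # hypothetical name for the parent application
  namespace: argocd
spec:
  project: default
  destination:
    # The child Application objects live in the cluster where ArgoCD runs.
    server: https://kubernetes.default.svc
    namespace: argocd
  source:
    repoURL: https://github.com/myOrganisation/application-deployment.git  # assumed URL
    path: argocd/applications        # assumed directory holding one Application file per environment
    targetRevision: main             # assumed branch
  syncPolicy:
    automated:
      prune: true                    # removing a file removes the matching child application
      selfHeal: true
```

With this in place, adding an environment amounts to committing one more Application file to that directory.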
kustomize build kustomize/environments/myEnvironment | kubectl apply -f -
Ordering the synchronization
Sometimes you have dependencies between services or components.
For this, ArgoCD offers two features: synchronization phases and waves.
There are three phases: pre-sync, sync and post-sync. You can add an annotation directly in your resource descriptors to select the desired phase hook.
    metadata:
      annotations:
        argocd.argoproj.io/hook: PreSync
Note that if you choose to synchronize some objects through a pre-sync hook, ArgoCD will delete and recreate those objects on every sync, even if they have not been modified.
In each phase, you can provide an optional annotation to specify a wave order (lower value first).
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "3"
When a new sync is triggered, ArgoCD will apply the resources in the following order of precedence:
- The phase
- The wave they are in (lower values first)
- The kind of resource (e.g., namespaces first)
- The name of the resource (alphabetically)
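As an illustration of waves within the sync phase, the sketch below places a ConfigMap in wave 0 and the Deployment that consumes it in wave 1, so the configuration is applied before the workload. The resource names, the `LOG_LEVEL` key and the `nginx` image are placeholders, not part of our application.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: public-api-config            # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # applied first
data:
  LOG_LEVEL: info
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: public-api                   # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # applied once wave 0 is synced and healthy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: public-api
  template:
    metadata:
      labels:
        app: public-api
    spec:
      containers:
        - name: public-api
          image: nginx:1.25          # placeholder image
          envFrom:
            - configMapRef:
                name: public-api-config
```

Resources without the annotation default to wave 0, so only the resources that must come earlier or later need to be annotated.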
Automate all the steps behind a production deployment
A GitOps workflow is perfect for day-to-day operations, including managing all the lower environments. In production and other critical environments, you may have a few additional steps to consider.
A small service disruption on a development environment during the deployment of a new release may be acceptable, but not in production. If a new version of a component depends on a database schema update, this can be tricky to implement, even with waves and phases. In a critical environment, you may also want to define a maintenance period in your monitoring tool, post a message on a status page, etc.
Even with ArgoCD and its ordered synchronization steps, we had to perform some manual actions to achieve our weekly release on critical and production environments. In addition, we didn’t have a good overview of the synchronization progress, and the time needed to sync an environment kept increasing.
As we mentioned earlier, Argo comes with many additional projects.
We selected Argo Workflows to automate the extra steps of a deployment.
Note that Argo provides good documentation for each of its tools, with many examples that make it easy to get started and then to master them.
Argo Workflows is implemented as a custom resource definition in Kubernetes, so a workflow is described in a YAML file. Every step is a pod containing one or multiple containers. This way, you can do whatever you have in mind. As with the containers of a pod, you can share a persistent volume and variables between your steps. Furthermore, a workflow provides built-in features such as retries, artefacts (to easily retrieve data from a bucket or a Git repository), conditional steps, etc.
A workflow can be launched manually, scheduled like a cron job, or triggered during an ArgoCD synchronization by using the waves and hooks mentioned previously.
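To give an idea of the shape of such a file, here is a minimal, generic workflow with two sequential steps and a retry policy. It is a sketch, not one of our workflows: the step names, the `argo` namespace, the `busybox` image and the echoed messages are placeholders standing in for real actions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pre-deploy-          # hypothetical name prefix
  namespace: argo                    # assumed namespace
spec:
  entrypoint: main
  templates:
    # Two steps executed one after the other.
    - name: main
      steps:
        - - name: backup-database
            template: run
            arguments:
              parameters:
                - name: message
                  value: "database backup would run here"
        - - name: scale-down
            template: run
            arguments:
              parameters:
                - name: message
                  value: "scale down would run here"

    # Reusable step: each invocation runs as its own pod.
    - name: run
      inputs:
        parameters:
          - name: message
      retryStrategy:
        limit: "2"                   # retry a failed step up to twice
      container:
        image: busybox:1.36          # placeholder image
        command: [sh, -c]
        args: ["echo {{inputs.parameters.message}}"]
```

Such a file can be submitted with `argo submit` or created with `kubectl create -f`, since `generateName` requires a create rather than an apply.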
How to write efficient workflows?
To stay in line with our “Don’t Repeat Yourself” philosophy, we decided to use Cluster Workflow Templates as libraries (e.g., our database library provides various steps to run a SQL script on a PostgreSQL database, clone a Cloud SQL instance, perform a database dump, etc.). So, when you need to create a new workflow, you can pick from the existing steps or enrich the libraries. This has been really helpful, especially for new contributors starting to work on workflows.
We started with two simple workflows. The first is a pre-sync workflow that automates the pre-deployment steps, such as performing a database backup, scaling down the applications and applying the infrastructure changes (Terraform apply of the infrastructure part). The second is a post-sync workflow used to apply database changes, update Elasticsearch data and restart the application.
Both workflows are automatically triggered by ArgoCD because they’re declared as pre- and post-synchronization hooks of our ArgoCD applications.
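The library idea can be sketched as follows: a Cluster Workflow Template exposes a reusable step, and any workflow references it with `templateRef`. Everything specific here is an assumption made for illustration: the `database-library` and `run-sql` names, the `app-database` secret holding a connection URL, and the `postgres:15` image.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ClusterWorkflowTemplate
metadata:
  name: database-library             # hypothetical library name
spec:
  templates:
    - name: run-sql
      inputs:
        parameters:
          - name: query
      container:
        image: postgres:15           # assumed image providing the psql client
        command: [sh, -c]
        args: ['psql "$DATABASE_URL" -c "{{inputs.parameters.query}}"']
        env:
          - name: DATABASE_URL
            valueFrom:
              secretKeyRef:
                name: app-database   # hypothetical secret
                key: url
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: apply-db-changes-    # hypothetical name prefix
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: check-connection
            templateRef:
              name: database-library
              template: run-sql
              clusterScope: true     # required to reference a ClusterWorkflowTemplate
            arguments:
              parameters:
                - name: query
                  value: "SELECT 1;"
```

The workflow itself stays short because the container details live in the library, which is what makes it approachable for new contributors.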
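Declaring a workflow as a hook comes down to adding the ArgoCD annotations to the Workflow resource stored alongside the other descriptors. The sketch below assumes a trivial single-step workflow; the name prefix, the `busybox` image, the echoed message and the chosen delete policy are illustrative assumptions rather than our production settings.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pre-sync-            # hypothetical name prefix
  annotations:
    argocd.argoproj.io/hook: PreSync                     # run before the sync phase
    argocd.argoproj.io/hook-delete-policy: HookSucceeded # clean up once it has succeeded
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: busybox:1.36          # placeholder image
        command: [sh, -c]
        args: ["echo 'pre-deployment steps would run here'"]
```

A post-sync workflow would use the same structure with `argocd.argoproj.io/hook: PostSync`.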
We transitioned from 15 manual tasks (mostly copy/paste commands) during a production deployment to only one: a merge of a GitHub Pull Request!
GitOps is still an emerging methodology, and most of the examples proposed by the community are better suited to simple and/or continuously deployed applications. However, even for more complex applications, it will bring you a lot of value quite quickly. Start small, gain experience and move forward. We started by applying the GitOps methodology to our infrastructure components (monitoring stack, Prometheus, Alertmanager, etc.). Then, we continued by using it to manage our SaaS application.
We are continuously evolving our implementation of the GitOps methodology. The “one branch per environment” branching strategy has begun to show its limitations. We are now facing the same issues as a Gitflow user. It’s fine with short-lived environments, but it can lead to big issues due to hidden configuration drift resulting from merges and reverts on the long-lived branches of critical environments. We are trying to take advantage of Git tags to version our infrastructure in order to reduce the number of branches needed.
Argo CD and the other Argo projects prove to us, day after day, that we made the right choice in adopting them. As our SaaS application continues to grow, Argo offers multiple tools that already make our lives easier (automated deployment on every environment, production release duration reduced by automating all the steps, migration and operation workflows, etc.). The web interfaces of Argo CD and Argo Workflows help us onboard and empower other teams by offering them more autonomy through an accessible interface.
Note that Flux v2 is also a very good alternative. We still use Flux v1 to manage our infrastructure components and we intend to keep using Flux for this purpose, but we would like to migrate to Flux v2, which has evolved well. It is lighter and offers an interesting cluster bootstrapping solution through a dedicated Terraform provider.
Our next Argo project will be Argo Events. This event-driven workflow automation will be used to link a GitHub event (creation of a Pull Request) to an Argo Workflow that creates an on-demand environment from scratch. There is still work to be done on that project, and it will be the topic of another article.
If you want to be part of this adventure, we are currently hiring more TechOps Engineers and we also have several other positions open! ✨
At QIMA, we relentlessly help our clients succeed. If you have any suggestions or remarks, please share them with us. We are always keen to discuss and learn!
Written by Guillaume Camus, TechOps Engineer at QIMA. ✍️