Kubernetes Database Migration with Helm and CI/CD Pipelines

Managing database migrations in secure and isolated environments presents a remarkable challenge. This blog post delves into the intricacies of setting up or migrating databases where there is a Kubernetes environments in use and direct access to database is restricted. We’ll explore strategies for handling database migrations when the database is not directly accessible by CI/CD pipelines but is available to your application within a Kubernetes cluster. This scenario is increasingly common in cloud-based architectures, where security measures like firewalls limit access. Our focus will be on practical solutions for these constrained environments, leveraging Kubernetes and Helm to ensure smooth, secure database migrations as part of your CI/CD workflow.

Note that this is apart from scenarios where app itself handles database migrations (like entity framework migrations).

On this article you read:

Types of Database Migration
- Database Initialization (or Database Setup)
- Database Schema Migration (or Database Schema Update)
Securing and Managing Credentials and Scripts for Database Migrations
CI/CD Pipelines with Database Migrations

Types of Database Migration

In the context of Kubernetes and CI/CD pipelines, it’s crucial to distinguish between two primary types of database migrations. This differentiation not only aids in understanding but also in implementing the right strategies for each scenario.

Database Initialization (or Database Setup): This type involves setting up a new database in your environment. It’s the first step in deploying your application’s data storage, where the database is created with the necessary initial structure and data. In Kubernetes environments, this setup is crucial to ensure that the application has the necessary data structure from the get-go, especially when dealing with stateful applications.
Database Schema Migration (or Database Schema Update): This type is about evolving an existing database. It includes changes like adding new tables, modifying existing ones, or updating data structures to accommodate the evolving needs of your application. This type of migration is continuous and often automated as part of the CI/CD process, reflecting the iterative nature of modern application development.

By understanding and categorizing migrations into these two types, teams can develop more effective strategies and utilize tools like Helm to manage these processes efficiently within Kubernetes clusters.

Database Initialization (or Database Setup)

Database initialization, a one-time process typically executed during the deployment phase (referred to as helm install in Helm terminology), is crucial for setting up the initial state of your database in a Kubernetes environment. An effective tool for handling this process is the use of an “init container.”

Init containers are specialized containers that run before your application’s main containers and are perfect for tasks like database setup. They ensure that your application has the necessary database structure and data before it starts running. This approach is advantageous because it allows for the execution of initialization scripts, which can be updated alongside your application as it evolves and the database requirements change. However, unlike database schema migrations, these initialization scripts do not necessarily need to be idempotent, as they are intended to run only once during the initial setup.

Lets explore how to effectively integrate init containers into your Helm charts for reliable and efficient database initialization in a Kubernetes setup.

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  initContainers:
  - name: init-myservice
    image: mysql:5.6 # exampe if you have mysql
    #add env (continue reading)
    #add volumeMounts (continue reading)
    command: ["/bin/sh", "-c"]
    args:
      - |
         #Here you can write muitple lines of commands 
         mysql -h $(MYSQL_HOST) -u $(MYSQL_USER) -p$(MYSQL_PASSWORD) < /scripts/script.sql
       
  containers:
  - name: myapp-container
    image: myapp:1.0
    ...

YAML

Not that we can add initContainers before containers in the same deployment file (helm template as your main app)

Database Schema Migration (or Database Schema Update)

For handling database schema migrations in Kubernetes, one effective approach is to utilize Kubernetes Jobs. This method ensures that your schema updates are performed in a controlled and orderly manner.

A Kubernetes Job is a resource that runs a pod to completion, as opposed to regular pods that might run continuously. The key difference between a Kubernetes Job and an init container is their purpose and lifecycle. An init container is used to perform setup tasks before the main container starts and is part of the pod’s lifecycle. On the other hand, a Kubernetes Job is a separate entity designed to execute a batch job or a task that runs to completion. This makes Jobs ideal for tasks like database migrations, where the task needs to run independently and possibly at different times, not necessarily linked with the pod’s startup lifecycle.

Before we jump into code sample please keep in mind:

Idempotency is Key: It’s crucial that your migration scripts are idempotent. This means they can be run multiple times without causing issues or unintended side effects. This property is particularly important for ensuring consistency and reliability, especially in scenarios where migrations might be re-attempted due to failures or rollbacks.

Managing Changes Thoughtfully: When updating your database schema, consider the impact on your current application. Make changes incrementally and avoid abrupt alterations like deleting columns. For instance, if you need to remove a column, first create a new column in one release, transfer the data, and then remove the old column in a subsequent release. This approach minimizes disruption and allows for smoother rollbacks if needed.

Here’s a sample code snippet for a Kubernetes Job, designed for database schema migration:

apiVersion: batch/v1
kind: Job 
metadata:
  name: schema-migration-job
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
    "helm.sh/hook-weight": "10"
spec:
  template:
    spec:
      containers:
      - name: schema-update
        image: mysql:5.6 # exampe if you have mysql
        #add env (continue reading)
        #add volumeMounts (continue reading)
        command: ["/bin/sh", "-c"]
        args:
          - |
             #Here you can write muitple lines of commands 
             mysql -h $(MYSQL_HOST) -u $(MYSQL_USER) -p$(MYSQL_PASSWORD) < /scripts/script.sql
            
  
      restartPolicy: Never
  backoffLimit: 4

YAML

Helm Hooks Annotations:

lines 6 and 7: When deploying Kubernetes Jobs using Helm for database schema migrations, a common challenge is that Jobs, once completed, cannot be simply rerun with an updated Helm chart. This is due to their immutable nature. To address this, you can use Helm hooks and hook delete policies.

The "helm.sh/hook": pre-upgrade,pre-install annotation ensures that the Job runs before the main application (both on install and upgrade).
The "helm.sh/hook-delete-policy": hook-succeeded,hook-failed annotation instructs Helm to delete the Job after it has either succeeded or failed, allowing you to rerun the migration Job on subsequent upgrades.
The “helm.sh/hook-weight”: “10” : this is the order of execution of hooks. note that value should be string not int (i.e. inside “”). You want it have higher value than things this depends on, in this example it depends on the config map that associated with the script (keep reading)

Without above hem hooks you are going to face error that looks like below on helm upgrade:
“Helm: UPGRADE FAILED: cannot patch … with kind Job, by update field image”

Securing and Managing Credentials and Scripts for Database Migrations

Ensuring secure and effective management of database credentials and migration scripts is crucial. Let’s explore how to pass these values to the command line of a running container:

Handling Credentials

In both init containers and Kubernetes Jobs, environment variables are used to pass credentials. For example, values like $(MYSQL_HOST), $(MYSQL_USER), and $(MYSQL_PASSWORD) are set through environment variables. These can typically be sourced from Helm chart values. However, passwords require more care. It’s recommended to store them in Kubernetes Secrets and reference them securely in your container configuration. Here’s an example you can add this both in code examples above (I left you placeholders):

...
      - name: schema-update
        image: mysql:5.6 # exampe if you have mysql
        env:
        - name: MYSQL_HOST
          value: {{ .Values.db.host }}
        - name: MYSQL_USER
          value: {{ .Values.db.user }}
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: secret-mysql-db
              key: password
        command: ["/bin/sh", "-c"]

YAML

Example for password secret
secret-mysql-db.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: secret-mysql-db
type: Opaque
data:
  password: {{ .Values.db.password | b64enc | quote }}

YAML

Managing Script Files

Including your script files in your Helm chart is a practical option for the init container scenario. However, for better accessibility and maintainability, it’s often advantageous to store migration scripts in a separate directory at the same level as your code in your repository. This setup enhances visibility for developers and reminds them to update the scripts as necessary.

To implement this, create a ConfigMap that contains your script files, and then mount this ConfigMap to your migration container as a volume. This approach enables you to read scripts from external files outside the Helm chart.

Here’s how you can set up a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: {{ include "chart.namespace" . }}
  name: configmap-migration-script
data:
  script.sql: |-
    {{ .Values.db.migrationScript | nindent 4 }}

YAML

And then, integrate it into your container setup with volume mounts:

        ...
        image: mysql:5.6 # exampe if you have mysql
        env:
         ...
        volumeMounts:
        - name: migration-script
          mountPath: /scripts
        command: ["/bin/sh", "-c"]
    ...    
        
    ...
    #also add     
    volumes:
      - name: migration-script
        configMap:
          name: configmap-migration-script

YAML

Example

Below is a comprehensive example of a Kubernetes Job tailored for database migration. The principles demonstrated here can be similarly applied to init containers, ensuring consistency and efficiency in both setups.
db-migration-job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration-job
  namespace: {{ include "chart.namespace" . }}
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
    "helm.sh/hook-weight": "10"
spec:
  template:
    spec:
      containers:
      - name: migration
        image:  {{ .Values.db.image }}
        volumeMounts:
        - name: migration-script
          mountPath: /scripts
        env:
        - name: MYSQL_HOST
          value: {{ .Values.db.host }}
        - name: MYSQL_USER
          value: {{ .Values.db.user }}
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: secret-mysql-db
              key: password
        command: ["/bin/sh", "-c"]
        args:
          - |
            mysql -h $(MYSQL_HOST) -u $(MYSQL_USER) -p$(MYSQL_PASSWORD) < /scripts/script.sql
      restartPolicy: Never
      volumes:
      - name: migration-script
        configMap:
          name: configmap-migration-script

YAML

CI/CD Pipelines with Database Migrations

In the realm of CI/CD Pipelines and Database Migrations, particularly within Kubernetes environments, various scenarios are optimal for different situations. In our case, we focus on a setup where your application is deployed using a Helm chart (stored in the same repository), and the database is inaccessible to pipeline runners due to firewall restrictions or because it resides within the same Kubernetes cluster.

For deployment, you’ll typically use helm upgrade. The process is standard, you just need to give access to your Kubernetes from your ci/cd agent (runner). For github and Azure Kubernetes (AKS) GitHub Federated integration with Azure is recommended.

Note that a special attention is needed for the values, particularly for sensitive information like passwords. For non-sensitive values like host and username, you can either use --set command or define them in a values file. However, for passwords, it’s safer to use --set in combination with a secret manager like Azure Key Vault or GitHub secrets.

For script files, you can pass their content using --set-file. This approach ensures secure and efficient handling of both configuration values and migration scripts in your CI/CD pipeline.

Here’s an example of how you might use these commands:

helm upgrade myapp ./mychart \
  --set db.host=mydbhost \
  --set db.user=mydbuser \
  --set db.password="${{ secrets.DB_PASSWORD }}" \  #GitHub Secret for example 
  --set-file db.migrationScript=path/to/your/script.sql

Bash

Posted

January 21, 2024

DevOps

Daniel Abrahamberg

Tags:

CI/CD, DevOps, Kubernetes