Managing database migrations in secure and isolated environments presents a remarkable challenge. This blog post delves into the intricacies of setting up or migrating databases where there is a Kubernetes environments in use and direct access to database is restricted. We’ll explore strategies for handling database migrations when the database is not directly accessible by CI/CD pipelines but is available to your application within a Kubernetes cluster. This scenario is increasingly common in cloud-based architectures, where security measures like firewalls limit access. Our focus will be on practical solutions for these constrained environments, leveraging Kubernetes and Helm to ensure smooth, secure database migrations as part of your CI/CD workflow.
Note that this is apart from scenarios where app itself handles database migrations (like entity framework migrations).
On this article you read:
- Types of Database Migration
- Database Initialization (or Database Setup)
- Database Schema Migration (or Database Schema Update)
- Securing and Managing Credentials and Scripts for Database Migrations
- CI/CD Pipelines with Database Migrations
Types of Database Migration
In the context of Kubernetes and CI/CD pipelines, it’s crucial to distinguish between two primary types of database migrations. This differentiation not only aids in understanding but also in implementing the right strategies for each scenario.
- Database Initialization (or Database Setup): This type involves setting up a new database in your environment. It’s the first step in deploying your application’s data storage, where the database is created with the necessary initial structure and data. In Kubernetes environments, this setup is crucial to ensure that the application has the necessary data structure from the get-go, especially when dealing with stateful applications.
- Database Schema Migration (or Database Schema Update): This type is about evolving an existing database. It includes changes like adding new tables, modifying existing ones, or updating data structures to accommodate the evolving needs of your application. This type of migration is continuous and often automated as part of the CI/CD process, reflecting the iterative nature of modern application development.
By understanding and categorizing migrations into these two types, teams can develop more effective strategies and utilize tools like Helm to manage these processes efficiently within Kubernetes clusters.
Database Initialization (or Database Setup)
Database initialization, a one-time process typically executed during the deployment phase (referred to as
helm install in Helm terminology), is crucial for setting up the initial state of your database in a Kubernetes environment. An effective tool for handling this process is the use of an “init container.”
Init containers are specialized containers that run before your application’s main containers and are perfect for tasks like database setup. They ensure that your application has the necessary database structure and data before it starts running. This approach is advantageous because it allows for the execution of initialization scripts, which can be updated alongside your application as it evolves and the database requirements change. However, unlike database schema migrations, these initialization scripts do not necessarily need to be idempotent, as they are intended to run only once during the initial setup.
Lets explore how to effectively integrate init containers into your Helm charts for reliable and efficient database initialization in a Kubernetes setup.
Not that we can add initContainers before containers in the same deployment file (helm template as your main app)
Database Schema Migration (or Database Schema Update)
For handling database schema migrations in Kubernetes, one effective approach is to utilize Kubernetes Jobs. This method ensures that your schema updates are performed in a controlled and orderly manner.
A Kubernetes Job is a resource that runs a pod to completion, as opposed to regular pods that might run continuously. The key difference between a Kubernetes Job and an init container is their purpose and lifecycle. An init container is used to perform setup tasks before the main container starts and is part of the pod’s lifecycle. On the other hand, a Kubernetes Job is a separate entity designed to execute a batch job or a task that runs to completion. This makes Jobs ideal for tasks like database migrations, where the task needs to run independently and possibly at different times, not necessarily linked with the pod’s startup lifecycle.
Before we jump into code sample please keep in mind:
Idempotency is Key: It’s crucial that your migration scripts are idempotent. This means they can be run multiple times without causing issues or unintended side effects. This property is particularly important for ensuring consistency and reliability, especially in scenarios where migrations might be re-attempted due to failures or rollbacks.
Managing Changes Thoughtfully: When updating your database schema, consider the impact on your current application. Make changes incrementally and avoid abrupt alterations like deleting columns. For instance, if you need to remove a column, first create a new column in one release, transfer the data, and then remove the old column in a subsequent release. This approach minimizes disruption and allows for smoother rollbacks if needed.
Here’s a sample code snippet for a Kubernetes Job, designed for database schema migration:
Helm Hooks Annotations:
lines 6 and 7: When deploying Kubernetes Jobs using Helm for database schema migrations, a common challenge is that Jobs, once completed, cannot be simply rerun with an updated Helm chart. This is due to their immutable nature. To address this, you can use Helm hooks and hook delete policies.
"helm.sh/hook": pre-upgrade,pre-installannotation ensures that the Job runs before the main application (both on install and upgrade).
"helm.sh/hook-delete-policy": hook-succeeded,hook-failedannotation instructs Helm to delete the Job after it has either succeeded or failed, allowing you to rerun the migration Job on subsequent upgrades.
Without above hem hooks you are going to face error that looks like below on
“Helm: UPGRADE FAILED: cannot patch … with kind Job, by update field image”
Securing and Managing Credentials and Scripts for Database Migrations
Ensuring secure and effective management of database credentials and migration scripts is crucial. Let’s explore how to pass these values to the command line of a running container:
In both init containers and Kubernetes Jobs, environment variables are used to pass credentials. For example, values like
$(MYSQL_PASSWORD) are set through environment variables. These can typically be sourced from Helm chart values. However, passwords require more care. It’s recommended to store them in Kubernetes Secrets and reference them securely in your container configuration. Here’s an example you can add this both in code examples above (I left you placeholders):
Example for password secret
Managing Script Files
Including your script files in your Helm chart is a practical option for the init container scenario. However, for better accessibility and maintainability, it’s often advantageous to store migration scripts in a separate directory at the same level as your code in your repository. This setup enhances visibility for developers and reminds them to update the scripts as necessary.
To implement this, create a ConfigMap that contains your script files, and then mount this ConfigMap to your migration container as a volume. This approach enables you to read scripts from external files outside the Helm chart.
Here’s how you can set up a ConfigMap:
And then, integrate it into your container setup with volume mounts:
Below is a comprehensive example of a Kubernetes Job tailored for database migration. The principles demonstrated here can be similarly applied to init containers, ensuring consistency and efficiency in both setups.
CI/CD Pipelines with Database Migrations
In the realm of CI/CD Pipelines and Database Migrations, particularly within Kubernetes environments, various scenarios are optimal for different situations. In our case, we focus on a setup where your application is deployed using a Helm chart (stored in the same repository), and the database is inaccessible to pipeline runners due to firewall restrictions or because it resides within the same Kubernetes cluster.
For deployment, you’ll typically use
helm upgrade. The process is standard, you just need to give access to your Kubernetes from your ci/cd agent (runner). For github and Azure Kubernetes (AKS) GitHub Federated integration with Azure is recommended.
Note that a special attention is needed for the values, particularly for sensitive information like passwords. For non-sensitive values like host and username, you can either use
--set command or define them in a values file. However, for passwords, it’s safer to use
--set in combination with a secret manager like Azure Key Vault or GitHub secrets.
For script files, you can pass their content using
--set-file. This approach ensures secure and efficient handling of both configuration values and migration scripts in your CI/CD pipeline.
Here’s an example of how you might use these commands: