Published: Nov. 23, 2022
Tags: git deployment terraform
Git-Namespaced Deployment¶
If you do any work in software testing you’re going to hear a lot about the three environments in which code is run: development, staging, and production.
Separating environments is necessary, but there are some pitfalls in the way that many organizations set up persistent staging infrastructure, and these manifest as teams grow and many developers need concurrent access to staging or user-acceptance-testing (UAT) environments.
As part of our series on our site stack, this post presents some simple techniques we use on this site for managing and deploying cloud environments.
We’ll cover the following:
How to tightly couple a branch name in a git-based Version Control System (VCS) to the deployment namespace used by Infrastructure-as-Code (IaC) tooling, with a specific example for Terraform
How this method allows many developers to deploy and stage their work in parallel without collisions, with either persistent or ephemeral environments
Example code snippets and a working example from our site (available on github)
Background & Motivation¶
Application deployment is often more computer art than computer science. Even if the application software looks good on localhost, testing and shipping it can be a daily slog for developers without some consideration for their workflows.
When assessing a deployment mechanism, it’s helpful to pose questions like the following:
Can we automatically (or at least easily) create a fully-featured staging environment that replicates production?
Can our developers rapidly prototype changes to an application that require changes to the stack, such as adding a new datastore and network configuration?
Have we eliminated or minimized per-environment branching logic in our application code?
Far from being rhetorical, the answer to at least one of these questions is often no. Poorly devised deployment mechanisms manifest in several common pain-points for developers:
Multiple developers are unable to concurrently deploy their changes to a shared persistent staging environment, since they are blocked by other revisions in the queue
The codebase has tripartite (prod/staging/dev) logical branches or configurations that developers need to edit during their development but later revert before merging to production [1]
To experiment or prototype features, developers are constantly dependent on other infrastructure or security teams to make changes outside their control [2]
An elegant deployment mechanism will ideally avoid all of these problems and systematize processes for developers of different stripes in an organization of any size.
Method Overview¶
Branch-namespaced deployment is done by using VCS branch names as a fundamental input parameter to determine both the deployment context and the resultant resource names created in the build environment.
The process is illustrated in the diagram below, which shows how the branches of a source code repository (left) are manifested into namespaced cloud projects during the build process, coupling the branch names to the built resources:
With respect to the above diagram, we may note that:
Any branch in the VCS is used to derive the namespace for a buildable environment using a single Terraform module codebase [3] in the repository,
We consider our production branch as simply the main (formerly master) branch
All other feature branches are generally considered non-production or development branches
The build agent (i.e. an IAM service agent or privileged admin user) that runs the Terraform CLI inherits the branch name $BRANCH in the build process context, and performs the following (a condensed sketch follows below):
Create a cloud project project-prefix-$BRANCH for the given branch, if it does not exist
Create a storage bucket gs://tfstate-$BRANCH/ within the project for use as the Terraform storage backend, if it does not exist
Initialize Terraform using the state from gs://tfstate-$BRANCH/, and build the module-defined cloud resources within project-prefix-$BRANCH
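As a condensed sketch of these three steps (with hypothetical prefixes, no error handling, and $BRANCH assumed to be provided by the build context; the full build script we use appears later in this post):

# Derive the namespaced identifiers from the branch name provided by the build context
PROJECT="project-prefix-${BRANCH}"
STATE_BUCKET="tfstate-${BRANCH}"

# 1. Create the cloud project for this branch
gcloud projects create "$PROJECT"

# 2. Create the branch's Terraform state bucket within the project
gcloud storage buckets create "gs://${STATE_BUCKET}" --project "$PROJECT"

# 3. Initialize Terraform against the branch's state bucket and build the resources
terraform init -backend-config="bucket=${STATE_BUCKET}"
terraform apply -var "project_id=${PROJECT}" -var "branch_name=${BRANCH}"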
The projects created for most development branches are ephemeral by design, and can be discarded after use (i.e. staging and integration tests) by the branch author. However the namespacing method is equally applicable for managing persistent resources, if desired.
Suppose your organization or team still wants to maintain at least one always-up UAT environment into which revisions are required to be merged for a user acceptance review before shipping them into production.
For this use case, it's straightforward to maintain a persistent uat branch that is typically several commits ahead of the repository's main branch, and into which any feature branch may be merged.
When user acceptance testing is completed successfully, the commits from feature can then be merged into main, after which feature may be discarded.
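In terms of plain git commands, the persistent-UAT flow might look roughly like the following (branch names purely illustrative):

# Merge the feature branch into the persistent UAT branch for review
git checkout uat
git merge feature
# ...deploy the uat branch and perform user acceptance testing...

# Once UAT passes, promote the same commits to production
git checkout main
git merge feature

# The feature branch (and its namespaced project) can now be discarded
git branch -d feature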
Terraform Implementation¶
This section outlines an implementation of branch namespacing with Terraform, and covers:
Structuring the Terraform module and configuring the backend and inputs
Managing the shell environment to set up the branch namespace and other variables
Configuring the terraform state backend
Module Configuration¶
The setup required in the Terraform module is minimal, with only a few considerations for the backend configuration and required input variables:
Backend Configuration
Terraform should be configured with a suitable non-local backend, such as gcs, that is only partially configured so as to require the bucket name at runtime
For resource creation, it will also be necessary to create input variables without default values for the namespaced project name
The Terraform state file for each given branch will also be stored in its own (secure) namespaced remote storage bucket, separate from every other branch. Using distinct buckets rather than a single centralized bucket is a practical necessity since the tfstate for the main branch is highly sensitive, but it's reasonable to set up access rules for developer groups to access the tfstate files for their own branches.
As an example, the main module and variables file should contain blocks like the following:
# ---- main.tf ----
terraform {
  # Partially configured backend; requires "bucket"
  backend "gcs" {
    prefix = "terraform/tfstate"
  }
}
...
# ---- variables.tf ----
variable "project_id" {
  type = string
}
variable "branch_name" {
  type = string
}
...
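At runtime, the partial backend and the default-less input variables are then completed on the command line. For a hypothetical branch msmith-101, and anticipating the bucket and project naming used in the build script below, the invocation would look roughly like:

# Complete the partially configured gcs backend with the branch's state bucket
terraform init -backend-config="bucket=tfstate-msmith-101.example.com"

# Supply the required input variables for the namespaced project
terraform plan \
    -var "project_id=appname-msmith-101" \
    -var "branch_name=msmith-101"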
Shell Context & Namespace Setup¶
No matter which system you deploy Terraform with, you're likely running the terraform CLI in a shell context with modifiable environment variables.
We recommend wrapping all terraform calls in a dedicated shell script responsible for setting up the environment and passing structural arguments like the branch name using the -var input variable switch.
By structural we specifically mean inputs to terraform that dictate how an environment and resources should be set up, but not what resources are to be created. The structural parameters (and indeed the entirety of the shell script) should be essentially unchanging, such that no resource-related edits are made in the build script; such edits are better made elsewhere, using .tfvars files. Examples of parameters that meet these criteria are the branch namespace string, billing accounts, organization identifiers, etc.
A build script resembles the following stripped-down snippet, adapted from our own with error checking removed for brevity.
The salient function to note is export_env(), which exports the shell environment for all subcommands invoked in the script, and is responsible for retrieving the branch name here with git branch --show [4].
#!/usr/bin/env bash

SCRIPT_NAME=$(basename $0)

export_env() {
    # Export all required environment variables

    ROOT_DIR=$(dirname $SCRIPT_NAME)

    # Define the variables for the app & organization
    export SITE_DOMAIN='example.com'
    export GOOGLE_CLOUD_PROJECT_PREFIX='appname'
    export GCS_TFSTATE_BUCKET_PREFIX='tfstate'

    # Set the namespaced variables to be used by terraform
    export BRANCH_NAME="$(git branch --show)"
    export GOOGLE_CLOUD_PROJECT="${GOOGLE_CLOUD_PROJECT_PREFIX}-${BRANCH_NAME}"
    export GCS_TFSTATE_BUCKET="${GCS_TFSTATE_BUCKET_PREFIX}-${BRANCH_NAME}.${SITE_DOMAIN}"
}

create_project() {
    ...
}

create_terraform_tfstate_bucket() {
    ...
}

run_terraform_init() {
    ...
}

# Only run & exit if this script is called directly; run & exit will
# not be called if the script is being sourced with '.' or 'source'
if [ "${BASH_SOURCE[0]}" -ef "$0" ]; then
    export_env
    create_project
    create_terraform_tfstate_bucket
    run_terraform_init

    if (($# == 0)); then
        exit 0
    fi

    terraform_vars=$(cat <<eof
        --var project_id=$GOOGLE_CLOUD_PROJECT \
        --var branch_name=$BRANCH_NAME
eof
    )
    _cmd="terraform $*"
    if [[ "$1" == "refresh" ]] \
        || [[ "$1" == "apply" ]] \
        || [[ "$1" == "destroy" ]] \
        || [[ "$1" == "plan" ]]; then
        _cmd="$_cmd ${terraform_vars}"
    fi
    $_cmd
fi
We've excluded the contents of create_project() and create_terraform_tfstate_bucket() above, as these will be specific to each organization.
For us, it suffices to run several gcloud commands to create the project and bucket resources, e.g. with:
create_project() {
    gcloud projects create "$GOOGLE_CLOUD_PROJECT"
}

create_terraform_tfstate_bucket() {
    gcloud storage buckets create \
        "gs://$GCS_TFSTATE_BUCKET" \
        --project "$GOOGLE_CLOUD_PROJECT"
}
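Since a branch's environment may be rebuilt many times, you may also want these helpers to be idempotent; one possible sketch (error handling still omitted) probes for the resource before creating it:

create_project() {
    # Create the project only if it doesn't already exist
    if ! gcloud projects describe "$GOOGLE_CLOUD_PROJECT" >/dev/null 2>&1; then
        gcloud projects create "$GOOGLE_CLOUD_PROJECT"
    fi
}

create_terraform_tfstate_bucket() {
    # Create the state bucket only if it doesn't already exist
    if ! gcloud storage buckets describe "gs://$GCS_TFSTATE_BUCKET" >/dev/null 2>&1; then
        gcloud storage buckets create \
            "gs://$GCS_TFSTATE_BUCKET" \
            --project "$GOOGLE_CLOUD_PROJECT"
    fi
}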
An organization with additional security requirements would likely have additional steps after creating a project or bucket, such as applying additional firewall rules for networks in development projects to allow ingress only from allow-listed VPN address ranges.
Terraform State Management¶
Whenever the build agent invokes terraform, it has to retrieve the backend bucket path from the runtime context to determine the tfstate storage location.
If the agent is run on a short-lived container (for instance after a VCS commit triggers a build), there is no local terraform state file to consider, and the remote state will be queried.
However, if you allow developers to modify state directly during development, either by assigning role permissions or impersonating a service account, it's important to ensure that only the remote statefile is used as the source of truth and that the correct branch state is queried.
To ensure a clean working state, we recommend the nuclear option of discarding any local configuration each time the terraform build script is invoked, using terraform init -reconfigure [5].
The run_terraform_init() function in the snippet above should look like the following, where $GCS_TFSTATE_BUCKET has been set in the environment (by export_env()) prior to invocation:
run_terraform_init() {
# Initialize the local terraform environment with -reconfigure,
# which will use the remote statefile and not attempt to migrate
# any local existing state, which is desired when switching
# between git branches with branch-namespaced backend state buckets.
terraform init -reconfigure \
-backend-config="bucket=$GCS_TFSTATE_BUCKET"
}
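Putting the pieces together, a developer's day-to-day interaction with the wrapper script (hypothetically named deploy.sh here) reduces to something like:

# Start a new namespaced environment from a fresh feature branch
git checkout -b msmith-101

# Plan and apply the branch's stack; the script derives the project and
# state bucket names, creates them if needed, and runs terraform init
./deploy.sh plan
./deploy.sh apply

# Tear the branch's environment down once testing is complete
./deploy.sh destroy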
Discussion¶
Applicability in Practice¶
We've presented branch namespaces as a technique that addresses common developer pain-points, namely:
Allow concurrent deployment for multiple developers
Eliminate configuration forking in a single codebase
Enable developers to create and prototype infrastructure as needed
It's straightforward to see we've hit point #1: since each deployed project is guaranteed to be isolated, developers are able to create environments that are identical to production except for their revisions. However, this itself merits discussion with respect to point #2: when is it desirable to have identical environments, and how should we fork when needed?
Having certain kinds of forked configurations for prod vs staging environments is indeed desirable.
Suppose you have a resource-intensive application in production, perhaps some sexy machine learning service with a fleet of expensive GPU instances.
To verify a revision to the application code, it will likely be sufficient and much cheaper to deploy the revision's code on a stack with fewer, smaller instances.
This use case is certainly possible, and we recommend using file-based tfvars to provide input variables for any resource-related attribute (e.g. compute instance family or number of instances) as follows:
Maintain multiple environment-specific configurations as is often done, e.g. main.tfvars, staging.tfvars
Add conditional logic to determine which tfvars file to use in the shell script, and pass the result to terraform using the --var-file switch
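A minimal sketch of that conditional logic in the build script might look as follows (the file names and the staging.tfvars default for non-production branches are assumptions for illustration):

# Pick the environment-specific variable file based on the branch namespace
if [[ "$BRANCH_NAME" == "main" ]]; then
    TFVARS_FILE="main.tfvars"
else
    TFVARS_FILE="staging.tfvars"
fi

terraform apply \
    --var-file "$TFVARS_FILE" \
    --var "project_id=$GOOGLE_CLOUD_PROJECT" \
    --var "branch_name=$BRANCH_NAME"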
Keeping logical branching outside of the terraform modules has the advantage of maintaining the separation of responsibilities between the application owners who maintain the build process and the application developers who use it. While having multiple variable files still creates forked configurations in the codebase, it's an acceptable fork lite ™ since the tfvars files can be structured as simple dictionary payloads to be accessed by the modules, which are structurally identical regardless of the source file for the input variables.
With regards to point #3, we’ll be blunt: whether branch namespacing will enable developers to experiment and prototype really depends on the particularities of your organization, rather than on the technical feasibility.
Enabling sandboxed experimentation by allowing developers to spin up stacks and applications will likely require a little up-front philosophical discussion, some planning, and then development time to set up the appropriate build agents and project creation mechanism in a large organization.
At the end of the day, it might be a tough sell for an existing application with a large codebase.
However, if you're in the process of doing the math on the cost of developer salary × hours spent waiting in staging environment queues, or perhaps the opportunity cost of your engineers not experimenting, it may be worth prototyping a branch-namespaced deployment mechanism.
Target Architectures¶
There are many different kinds of enterprise applications out there, and branch-namespaced deployment as we've implemented it for this site isn't necessarily a good fit for all application architectures.
We’re painting with a broad brush, but this method is best applied to:
monolithic database-backed applications
microservice architectures (with either monorepos or separate repos per service) [6]
In these types of applications, staging code revisions with a fully mirrored environment is desired due to the interconnected nature of the resources: if you want to run a suite of integration tests properly, you actually do need to spin up the resources exactly as they’ll be in production.
Conversely, this technique is less suitable for applications running on shared-resource computing architectures, such as a Spark or Hadoop cluster used by multiple teams that run independent jobs. In this setting, in which the cluster users treat the cluster resources as a service provided by the data platform, it's relatively rare that the data job developers will be modifying the cluster configuration, and it's unnecessary to spin up a new cluster on a per-branch basis. In this case it's preferable to have a persistent staging cluster that's dedicated to resourcing ad hoc jobs submitted by the (data) developers. Note however that the cluster infrastructure itself could still be versioned and deployed with a branch namespace, enabling the data platform maintainers to spin up new clusters as needed to test new configurations.
Implementation Gotchas¶
There are always gotchas that are easy to walk into without advance notice. If you’re interested in branch namespacing your deployment, watch out for the following:
Branch name length & allowed characters.
Cloud providers have restrictions on the allowed characters and maximum length of bucket and project names
Make sure that branch names meet all the criteria for the resources in your cloud, and err on the side of caution by using only alphanumeric characters and hyphens (a defensive sanitization sketch follows after the tip below)
Tip
Name VCS branches systematically, ideally by concatenating:
an alphanumeric ticket or issue identifier from your organization's issue tracking system
the developer's unique username in your organization
E.g. a branch named msmith-101 for the author msmith working on a branch that addresses Issue #101.
For most feature development, name projects by concatenating:
a short prefix that's mnemonic for your company/application
the branch name
E.g. a project named exdc-msmith-101 for work at example.com on the branch msmith-101.
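If you can't fully control how branches are named, one defensive option is to sanitize the branch name before deriving any resource names from it; a rough sketch (the 20-character cap is an arbitrary assumption, not a specific provider limit):

# Lowercase the branch name, replace disallowed characters with hyphens,
# and truncate so that derived project & bucket names stay within length limits
BRANCH_NAME="$(git branch --show-current \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c 'a-z0-9\n' '-' \
    | cut -c1-20)"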
Add unique suffixes to problematic resources.
Some resources have additional practical restrictions due to the difficulty of destroying & recreating them during development (for instance, you cannot delete a Google CloudSQL database and subsequently create a new one with the same name in any timeframe you're likely willing to stomach)
To simplify resource destruction during iterative testing, add random suffixes or (more user-friendly) pet names to these problematic resources in your modules
Conclusion¶
This post was a deep dive into using VCS branch names as deployment namespaces within a software project, with the goal of managing an IaC stack elegantly and improving developer workflows.
If you found these techniques interesting and would like to see a working example, we’ve made a snapshot of our terraform modules available on github for reference. While our stack is relatively simple, it’s still typical of many web applications, and we hope you find the presentation and examples helpful in your own work.
If you have any feedback or questions, don't hesitate to contact us and we'll be happy to get in touch.
Footnotes