Published: Nov. 23, 2022

Git-Namespaced Deployment¶

If you do any work in software testing you’re going to hear a lot about the three environments in which code is run: development, staging, and production.

Separating environments is necessary, but there are some pitfalls in the way many organizations set up persistent infrastructure for staging. These pitfalls surface as teams grow and many developers need concurrent access to staging or user-acceptance-testing (UAT) environments.

As part of our series on our site stack, this post presents some simple techniques we use on this site for managing and deploying cloud environments.

We’ll cover the following:

How to tightly couple a branch name in a git-based Version Control System (VCS) to the deployment namespace used by Infrastructure-as-Code (IaC) tooling, with a specific example for Terraform
How this method allows many developers to deploy and stage their work in parallel without collisions, with either persistent or ephemeral environments
Example code snippets and a working example from our site (available on GitHub)

Background & Motivation¶

Application deployment is often more computer art than computer science. Even if the application software looks good on a localhost, testing and shipping it can be a daily slog for developers without some consideration for their workflows.

When assessing a deployment mechanism, it’s helpful to pose questions like the following:

Can we automatically (or at least easily) create a fully-featured staging environment that replicates production?
Can our developers rapidly prototype changes to an application that require changes to the stack, such as adding a new datastore and network configuration?
Have we eliminated or minimized per-environment branching logic in our application code?

Far from being rhetorical, the answer to at least one of these questions is often no. Poorly devised deployment mechanisms manifest in several common pain-points for developers:

Multiple developers are unable to concurrently deploy their changes to a shared persistent staging environment, since they are blocked by other revisions in the queue
The codebase has tripartite (prod/staging/dev) logical branches or configurations that developers need to edit during their development but later revert before merging to production [1]
To experiment or prototype features, developers are constantly dependent on other infrastructure or security teams to make changes outside their control [2]

An elegant deployment mechanism will ideally avoid all of these problems and systematize processes for developers of different stripes in an organization of any size.

Method Overview¶

Branch-namespaced deployment is done by using VCS branch names as a fundamental input parameter to determine both the deployment context and the resultant resource names created in the build environment.

The process is illustrated in the diagram below, which shows how the branches of a source code repository (left) are manifested into namespaced cloud projects during the build process, coupling the branch names to the built resources:

With respect to the above diagram, we may note that:

Any branch in the VCS is used to derive the namespace for a buildable environment using a single Terraform module codebase [3] in the repository:
- We consider our production branch as simply the main (formerly master) branch
- All other feature branches are generally considered non-production or development branches
The build agent (i.e. an IAM service agent or privileged admin user) that runs the Terraform CLI inherits the branch name $BRANCH from the build process context, and performs the following:
- Create a cloud project project-prefix-$BRANCH for the given branch, if it does not exist
- Create a storage bucket gs://tfstate-$BRANCH/ within the project for use as the Terraform storage backend, if it does not exist
- Initialize Terraform using the state from gs://tfstate-$BRANCH/, and build the module-defined cloud resources within project-prefix-$BRANCH

The projects created for most development branches are ephemeral by design, and can be discarded by the branch author after use (e.g. after staging and integration tests). However, the namespacing method is equally applicable for managing persistent resources, if desired.
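As a rough sketch of what discarding an ephemeral environment might look like, the helper below destroys the Terraform-managed resources and then deletes the enclosing project, while refusing to touch persistent branches. The function names and the PROTECTED_BRANCHES list are hypothetical and not part of our build script, and the sketch assumes Terraform has already been initialized against the branch's own state bucket.

```shell
#!/usr/bin/env bash
# Hypothetical teardown helper; names and the protected list are assumptions.

PROTECTED_BRANCHES="main uat"

# Succeeds if the given branch backs a persistent environment.
is_protected_branch() {
  case " $PROTECTED_BRANCHES " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# destroy_branch_environment <branch> <project-id>
destroy_branch_environment() {
  local branch="$1" project="$2"
  if is_protected_branch "$branch"; then
    echo "Refusing to destroy protected environment: $branch" >&2
    return 1
  fi
  # Destroy Terraform-managed resources first, then delete the project itself.
  terraform destroy -auto-approve \
    -var "project_id=$project" -var "branch_name=$branch" &&
    gcloud projects delete "$project" --quiet
}
```

A branch author would call this as destroy_branch_environment "$BRANCH_NAME" "$GOOGLE_CLOUD_PROJECT" once their tests are done.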

Suppose your organization or team still wants to maintain at least one always-up UAT environment into which revisions are required to be merged for a user acceptance review before shipping them into production. For this use case, it’s straightforward to maintain a persistent uat branch that is typically several commits ahead of the repository’s main branch, and into which any feature branch may be merged. When user acceptance testing is completed successfully, the commits from feature can then be merged into main, after which feature may be discarded.
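The UAT flow described above might be sketched with plain git commands as follows. The branch names uat and main come from the text; the function names and commit messages are hypothetical, and pushing to a remote (which is what would typically trigger the build agent) is left out.

```shell
#!/usr/bin/env bash
# Sketch of the UAT promotion flow; function names are hypothetical.

# Merge a feature branch into the persistent uat branch for acceptance review.
stage_for_uat() {
  local feature="$1"
  git checkout uat &&
    git merge --no-ff -m "Stage ${feature} for UAT" "$feature"
}

# After UAT sign-off, merge the same feature branch into main and discard it.
promote_to_main() {
  local feature="$1"
  git checkout main &&
    git merge --no-ff -m "Release ${feature}" "$feature" &&
    git branch -d "$feature"
}
```

Because each branch maps to its own project, merging into uat rebuilds only the UAT environment and leaves every other branch's environment untouched.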

Terraform Implementation¶

This section outlines an implementation of branch namespacing with Terraform, and covers:

Structuring the Terraform module and configuring the backend and inputs
Managing the shell environment to set up the branch namespace and other variables
Configuring the terraform state backend

Module Configuration¶

The setup required in the Terraform module is minimal, with only a few considerations for the backend configuration and required input variables:

Backend Configuration
- Terraform should be configured with a suitable non-local backend, such as gcs, that is only partially configured, so that the bucket name must be supplied at runtime
- For resource creation, it will also be necessary to create input variables without default values for the namespaced project name
- The Terraform state file for each branch will also be stored in its own (secure) namespaced remote storage bucket, separate from every other branch. Using distinct buckets rather than a single centralized bucket is a practical necessity, since the tfstate for the main branch is highly sensitive, but it’s reasonable to set up access rules so that developer groups can access the tfstate files for their own branches.

As an example, the main module and variables file should contain blocks like the following:

# ---- main.tf ----
terraform {
  # Partially configured backend; requires "bucket"
  backend "gcs" {
    prefix = "terraform/tfstate"
  }
}
...

# ---- variables.tf ----
variable "project_id" {
  type = string
}
variable "branch_name" {
  type = string
}
...

Shell Context & Namespace Setup¶

No matter which system you deploy Terraform with, you’re likely running the terraform CLI in a shell context with modifiable environment variables. We recommend wrapping all terraform calls in a dedicated shell script responsible for setting up the environment and passing structural arguments like the branch name using the input variable -var switch.

By structural we specifically mean inputs to terraform that dictate how an environment and resources should be set up, but not what resources are to be created. The structural parameters (and basically the entirety of the shell script) should be essentially unchanging, such that no resource-related edits are made in the build script; such edits are better made elsewhere, using .tfvars files. Examples of parameters that meet these criteria are the branch namespace string, billing accounts, and organization identifiers.

A build script resembles the following stripped-down snippet, adapted from our own with error checking removed for brevity. The salient function to note is export_env(), which exports the shell environment for all subcommands invoked in the script, and is responsible for retrieving the branch name here with git branch --show-current [4].

#!/usr/bin/env bash

SCRIPT_NAME=$(basename "$0")

export_env() {
  # Export all required environment variables

  # Directory containing this script (also correct when the script is sourced)
  ROOT_DIR=$(dirname "${BASH_SOURCE[0]}")

  # Define the variables for the app & organization
  export SITE_DOMAIN='example.com'
  export GOOGLE_CLOUD_PROJECT_PREFIX='appname'
  export GCS_TFSTATE_BUCKET_PREFIX='tfstate'

  # Set the namespaced variables to be used by terraform
  # (--show-current requires Git 2.22 or later)
  export BRANCH_NAME="$(git branch --show-current)"
  export GOOGLE_CLOUD_PROJECT="${GOOGLE_CLOUD_PROJECT_PREFIX}-${BRANCH_NAME}"
  export GCS_TFSTATE_BUCKET="${GCS_TFSTATE_BUCKET_PREFIX}-${BRANCH_NAME}.${SITE_DOMAIN}"
}

create_project() {
  ...
}

create_terraform_tfstate_bucket() {
  ...
}

run_terraform_init() {
  ...
}

# Only run & exit if this script (dev) is called directly; nothing below
# is executed if the script is being sourced with '.' or 'source'
if [ "${BASH_SOURCE[0]}" -ef "$0" ]; then
  export_env
  create_project
  create_terraform_tfstate_bucket
  run_terraform_init

  if (($# == 0)); then
    exit 0
  fi

  terraform_vars=$(cat <<eof
    --var project_id=$GOOGLE_CLOUD_PROJECT \
    --var branch_name=$BRANCH_NAME
eof
  )
  _cmd="terraform $*"
  if [[ "$1" == "refresh" ]] \
  || [[ "$1" == "apply" ]] \
  || [[ "$1" == "destroy" ]] \
  || [[ "$1" == "plan" ]]; then
   _cmd="$_cmd ${terraform_vars}"
  fi
  $_cmd
fi
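The string-based $_cmd above relies on shell word splitting, which is fine for simple values but becomes fragile once an argument contains spaces. One possible alternative, shown here only as a sketch, is to build the command as a bash array. The function name build_terraform_cmd and the array name TF_CMD are hypothetical and not part of our script.

```shell
#!/usr/bin/env bash
# Sketch: build the terraform command line as an array rather than a string.

# Populate the global array TF_CMD from a subcommand and any extra arguments.
build_terraform_cmd() {
  TF_CMD=(terraform "$@")
  case "${1:-}" in
    plan | apply | destroy | refresh)
      # Subcommands to which the script passes the structural variables.
      TF_CMD+=(-var "project_id=${GOOGLE_CLOUD_PROJECT}")
      TF_CMD+=(-var "branch_name=${BRANCH_NAME}")
      ;;
  esac
}
```

Inside the build script, the last step would then become build_terraform_cmd "$@" followed by "${TF_CMD[@]}", so each argument reaches terraform exactly as it was given.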

We’ve excluded the contents of create_project() and create_terraform_tfstate_bucket() above, as these will be specific to each organization. For us, it suffices to run several gcloud commands to create the project and bucket resources, e.g. with:

create_project() {
  gcloud projects create "$GOOGLE_CLOUD_PROJECT"
}

create_terraform_tfstate_bucket() {
  gcloud storage buckets create \
    "gs://$GCS_TFSTATE_BUCKET" \
    --project "$GOOGLE_CLOUD_PROJECT"
}
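Since the build script runs on every invocation, the creation steps need to tolerate resources that already exist. A minimal sketch of one way to do that is to probe with the corresponding describe command first. This assumes that describe exits non-zero when the resource is missing (it will also do so when the caller lacks permission to view it), so treat it as an illustration rather than our exact implementation.

```shell
#!/usr/bin/env bash
# Sketch only: assumes "describe" exits non-zero when the resource is missing
# or not visible to the caller.

create_project() {
  if ! gcloud projects describe "$GOOGLE_CLOUD_PROJECT" >/dev/null 2>&1; then
    gcloud projects create "$GOOGLE_CLOUD_PROJECT"
  fi
}

create_terraform_tfstate_bucket() {
  if ! gcloud storage buckets describe "gs://$GCS_TFSTATE_BUCKET" >/dev/null 2>&1; then
    gcloud storage buckets create \
      "gs://$GCS_TFSTATE_BUCKET" \
      --project "$GOOGLE_CLOUD_PROJECT"
  fi
}
```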

An organization with additional security requirements would likely have additional steps after creating a project or bucket, such as applying additional firewall rules for networks in development projects to allow ingress only from allow-listed VPN address ranges.

Terraform State Management¶

Whenever the build agent invokes terraform, it has to retrieve the backend bucket path from the runtime context to determine the tfstate storage location. If the agent is run on a short-lived container (for instance after a VCS commit triggers a build), there is no local terraform state file to consider, and the remote state will be queried. However, if you allow developers to directly modify state during development, either by assigning role permissions or impersonating a service account, it’s important to ensure that only the remote statefile is used as a source of truth and that the correct branch state is queried.

To ensure a clean working state, we recommend the nuclear option of discarding any local configuration each time the terraform build script is invoked, using terraform init -reconfigure [5]. The run_terraform_init() function in the snippet above should look like the following, where $GCS_TFSTATE_BUCKET has been set in the environment prior to invocation:

run_terraform_init() {
  # Initialize the local terraform environment with -reconfigure,
  # which will use the remote statefile and not attempt to migrate
  # any local existing state, which is desired when switching
  # between git branches with branch-namespaced backend state buckets.

  terraform init -reconfigure \
    -backend-config="bucket=$GCS_TFSTATE_BUCKET"
}

Discussion¶

Applicability in Practice¶

We’ve presented branch namespaces as a technique that addresses common developer pain points, namely:

Allow concurrent deployment for multiple developers
Eliminate configuration forking in a single codebase
Enable developers to create and prototype infrastructure as needed

It’s straightforward to see we’ve hit point #1; since each deployed project is guaranteed to be isolated, developers are able to create environments that are identical to production except for their revisions. However, this itself merits discussion with respect to point #2: when is it desirable to have identical environments, and how should we fork when needed?

Having certain kinds of forked configurations for prod vs staging environments is indeed desirable. Suppose you have a resource-intensive application in production, perhaps some sexy machine learning service with a fleet of expensive GPU instances. To verify a revision to the application code, it will likely be sufficient and much cheaper to deploy the revision’s code on a stack with fewer, smaller instances. This use case is certainly possible, and we recommend using file-based tfvars to provide input variables for any resource-related attribute (e.g. compute instance family or number of instances) as follows:

Maintain multiple environment-specific configurations as is often done, e.g. main.tfvars, staging.tfvars
Add conditional logic to determine which tfvars file to use in the shell script, and pass the result to terraform using the --var-file switch
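The conditional logic for choosing a tfvars file can stay very small. The sketch below assumes one convention among many: a branch with its own file (e.g. main.tfvars) uses it, and every other branch falls back to staging.tfvars. The function name select_tfvars_file and the fallback rule are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick a tfvars file by branch name, falling back to staging.tfvars.

# select_tfvars_file <branch> [directory]
select_tfvars_file() {
  local branch="$1" dir="${2:-.}"
  local candidate="${dir}/${branch}.tfvars"
  if [ -f "$candidate" ]; then
    printf '%s\n' "$candidate"
  else
    printf '%s\n' "${dir}/staging.tfvars"
  fi
}
```

The build script could then append -var-file "$(select_tfvars_file "$BRANCH_NAME")" to the plan, apply, and destroy invocations, keeping the modules themselves free of per-environment branching.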

Keeping logical branching outside of the terraform modules has the advantage of maintaining the separation of responsibilities between the application owners who maintain the build process and the application developers who use the build process. While having multiple variable files still creates forked configurations in the codebase, it’s an acceptable fork lite™, since the tfvars files can be structured as simple dictionary payloads to be accessed by the modules, which are structurally identical regardless of the source file for the input variables.

With regard to point #3, we’ll be blunt: whether branch namespacing will enable developers to experiment and prototype really depends on the particularities of your organization, rather than on the technical feasibility. Enabling sandboxed experimentation by allowing developers to spin up stacks and applications will likely require a little up-front philosophical discussion, some planning, and then development time to set up the appropriate build agents and project creation mechanism in a large organization. At the end of the day, it might be a tough sell for an existing application with a large codebase. However, if you’re in the process of doing the math on the cost of developer salary × hours spent waiting in staging environment queues, or perhaps the opportunity cost of your engineers not experimenting, it may be worth prototyping a branch-namespaced deployment mechanism.

Target Architectures¶

There are many different kinds of enterprise applications out there, and branch namespaced deployment as we’ve implemented for this site isn’t necessarily a good fit for all application architectures.

We’re painting with a broad brush, but this method is best applied to:

monolithic database-backed applications
microservice architectures (with either monorepos or separate repos per service) [6]

In these types of applications, staging code revisions with a fully mirrored environment is desired due to the interconnected nature of the resources: if you want to run a suite of integration tests properly, you actually do need to spin up the resources exactly as they’ll be in production.

Conversely, this technique is less suitable for applications running on shared-resource computing architectures, such as a Spark or Hadoop cluster used by multiple teams that run independent jobs. In this setting, where the cluster users treat the cluster resources as a service provided by the data platform, it’s relatively rare that the data job developers will be modifying the cluster configuration, and it’s unnecessary to spin up a new cluster on a per-branch basis. In this case it’s preferable to have a persistent staging cluster that’s dedicated to resourcing ad hoc jobs submitted by the (data) developers. Note, however, that the cluster infrastructure itself could still be versioned and deployed with a branch namespace, enabling the data platform maintainers to spin up new clusters as needed to test new configurations.

Implementation Gotchas¶

There are always gotchas that are easy to walk into without advance notice. If you’re interested in branch namespacing your deployment, watch out for the following:

Branch name length & allowed characters.
- Cloud providers have restrictions on the allowed characters and maximum length of bucket and project names
- Make sure that branch names meet all the criteria for the resources in your cloud, and err on the side of caution with only alphanumeric characters and hyphens
Tip

Name VCS branches systematically, ideally by concatenating:
- the developer’s unique username in your organization
- an alphanumeric ticket or issue identifier from your organization’s issue tracking system
E.g.: a branch named msmith-101 for the author msmith working on a branch that addresses Issue #101.

For most feature development, name projects by concatenating:
- a short prefix that’s mnemonic for your company/application
- the branch name
E.g.: a project named exdc-msmith-101 for work at example.com on the branch msmith-101.
Add unique suffixes to problematic resources.
- Some resources have additional practical restrictions due to the difficulty in destroying & recreating them during development (for instance, you cannot delete a Google Cloud SQL instance and subsequently create a new one with the same name in any timeframe you’re likely willing to stomach)
- To simplify resource destruction during iterative testing, add random suffixes or (more user-friendly) pet names to problematic resources specifically in your modules
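For the branch name gotcha above, a small guard in the build script can catch problems before any cloud call is made. The sketch below normalizes a branch name to lowercase alphanumerics and hyphens, and checks a candidate project ID against Google Cloud's documented rules (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). Both function names are hypothetical, and other providers and resource types have their own limits.

```shell
#!/usr/bin/env bash
# Sketch: normalize branch names and validate the resulting project ID.

# Lowercase the name and replace every other character run with one hyphen.
sanitize_branch_name() {
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9' '-' | tr -s '-')
  name="${name#-}"
  name="${name%-}"
  printf '%s\n' "$name"
}

# Succeeds only if the argument satisfies the project ID rules listed above.
is_valid_project_id() {
  local id="$1"
  if [ "${#id}" -lt 6 ] || [ "${#id}" -gt 30 ]; then
    return 1
  fi
  case "$id" in
    [!a-z]* | *- | *[!a-z0-9-]*) return 1 ;;
  esac
  return 0
}
```

In export_env(), BRANCH_NAME could be passed through sanitize_branch_name, with the script aborting early if is_valid_project_id "$GOOGLE_CLOUD_PROJECT" fails.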

Conclusion¶

This post was a deep dive into using VCS branch names as deployment namespaces within a software project, with the goal of managing an IaC stack elegantly and improving developer workflows.

If you found these techniques interesting and would like to see a working example, we’ve made a snapshot of our terraform modules available on github for reference. While our stack is relatively simple, it’s still typical of many web applications, and we hope you find the presentation and examples helpful in your own work.

If you have any feedback or questions, don’t hesitate to contact us and we’ll be happy to get in touch.

Footnotes

[1]

It’s not uncommon or necessarily problematic to have tripartite config files (e.g. prod.yml, staging.yml, dev.yml) all tracked in VCS. However, this practice can cross a line into an anti-pattern when config files balloon in size beyond a small variable payload, or when there are material differences in the stack that the configs define, such that discrepancies between staging and production can arise. The worst case we’ve seen out there in the wild is when all three of these configs require nearly-identical edits to deploy revisions in a staging environment, even if the deployment in staging is intended to be ephemeral, and later require further edits to amend the non-production configs. What a hassle! All code changes come with risk, so be wary of a workflow that requires VCS-tracked edits (especially soon-to-be-reverted edits) for a short-lived purpose. It’s not the end of the world, but it’s a symptom of a workflow that could use a little elbow grease.

[2]

Don’t take this too far out of context: security teams rightfully create friction for developers! We’re not arguing for the removal of oversight. Rather, we believe it’s important to encourage ideation and experimentation within an organization, and there are relatively straightforward tools to achieve this with proper oversight by information security. This is just yet another opinion on the internet, but we think information security in a lot of organizations could be closer to the development workflow on a supervisory basis, for instance acting as required code-owners to approve changes to Terraform module files. What can happen in large organizations is far from this: InfoSec is distant and clunky, with weirdly non-technical bureaucratic layers encapsulating the organization’s security experts. It feels eerily similar to calling an internet company to report a problem with your router and having to painfully walk through the rote debugging steps over the phone. Then, once you’ve finally reached the right person, you have a hell of a time speaking in a common language, because the stack is often obfuscated from developers and its details known only to InfoSec, so both parties end up talking in circles trying to solve respective XY Problems for each other. Achieving common understanding and sufficient background context is the first step to solving any problem, after all, and we posit that it’s possible to satisfy both security and developer teams with a little more up-front collaboration in an application’s workflow tooling.