Published: Nov. 23, 2022

Tags: git deployment terraform

Git-Namespaced Deployment

If you do any work in software testing you’re going to hear a lot about the three environments in which code is run: development, staging, and production.

Separating environments is necessary, but there are some pitfalls in the way many organizations set up persistent staging infrastructure. These pitfalls surface as teams grow and many developers need concurrent access to staging or user-acceptance-testing (UAT) environments.

As part of our series on our site stack, this post presents some simple techniques we use on this site for managing and deploying cloud environments.

Background & Motivation

Application deployment is often more computer art than computer science. Even if the application software looks good on a localhost, testing and shipping it can be a daily slog for developers without some consideration for their workflows.

When assessing a deployment mechanism, it’s helpful to pose questions like the following:

  • Can we automatically (or at least easily) create a fully-featured staging environment that replicates production?

  • Can our developers rapidly prototype changes to an application that require changes to the stack, such as adding a new datastore and network configuration?

  • Have we eliminated or minimized per-environment branching logic in our application code?

Far from being rhetorical, the answer to at least one of these questions is often no. Poorly devised deployment mechanisms manifest in several common pain-points for developers:

  • Multiple developers are unable to concurrently deploy their changes to a shared persistent staging environment, since they are blocked by other revisions in the queue

  • The codebase has tripartite (prod/staging/dev) logical branches or configurations that developers need to edit during their development but later revert before merging to production [1]

  • To experiment or prototype features, developers are constantly dependent on other infrastructure or security teams to make changes outside their control [2]

An elegant deployment mechanism will ideally avoid all of these problems and systematize processes for developers of different stripes in an organization of any size.

Method Overview

Branch-namespaced deployment is done by using VCS branch names as a fundamental input parameter to determine both the deployment context and the resultant resource names created in the build environment.

The process is illustrated in the diagram below, which shows how the branches of a source code repository (left) are manifested into namespaced cloud projects during the build process, coupling the branch names to the built resources:

[Figure: Git namespaced deployment]

With respect to the above diagram, we may note that:

  • Any branch in the VCS is used to derive the namespace for a buildable environment using a single Terraform module codebase [3] in the repository,

    • We consider our production branch as simply the main (formerly master) branch

    • All other feature branches are generally considered non-production or development branches

  • The build agent (i.e. an IAM service agent or privileged admin user) that runs the Terraform CLI inherits the branch name $BRANCH in the build process context, and performs the following:

    • Create a cloud project project-prefix-$BRANCH for the given branch, if it does not exist

    • Create a storage bucket gs://tfstate-$BRANCH/ within the project for use as the Terraform storage backend, if it does not exist

    • Initialize Terraform using the state from gs://tfstate-$BRANCH/, and build the module-defined cloud resources within project-prefix-$BRANCH

The projects created for most development branches are ephemeral by design, and can be discarded by the branch author after use (e.g. once staging and integration tests are complete). However, the namespacing method is equally applicable for managing persistent resources, if desired.

Suppose your organization or team still wants to maintain at least one always-up UAT environment into which revisions are required to be merged for a user acceptance review before shipping them into production. For this use case, it’s straightforward to maintain a persistent uat branch that is typically several commits ahead of the repository’s main branch, and into which any feature branch may be merged. When user acceptance testing is completed successfully, the commits from feature can then be merged into main, after which feature may be discarded.
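As a rough sketch, the UAT flow described above maps onto a handful of plain git commands. The branch names uat and main come from the text; the function names stage_for_uat and ship_to_main, and the choice of --no-ff merges, are our own assumptions for illustration rather than a prescribed workflow.

```shell
#!/usr/bin/env bash

# Hypothetical helper: merge a feature branch into the persistent uat
# branch so the UAT environment can be rebuilt with its changes.
stage_for_uat() {
  local feature="$1"
  git checkout -q uat &&
    git merge --no-ff --no-edit "$feature"
}

# Hypothetical helper: after acceptance testing succeeds, merge the same
# feature branch into main, then discard the feature branch.
ship_to_main() {
  local feature="$1"
  git checkout -q main &&
    git merge --no-ff --no-edit "$feature" &&
    git branch -d "$feature"
}
```

For a branch named msmith-101, this would be run as stage_for_uat msmith-101 before the review and ship_to_main msmith-101 once the review passes.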

Terraform Implementation

This section outlines an implementation of branch namespacing with Terraform, and covers:

  • Structuring the Terraform module and configuring the backend and inputs

  • Managing the shell environment to set up the branch namespace and other variables

  • Configuring the terraform state backend

Module Configuration

The setup required in the Terraform module is minimal, with only a few considerations for the backend configuration and required input variables:

  • Backend Configuration

    • Terraform should be configured with a suitable non-local backend, such as gcs, that is only partially configured, so that the bucket name must be supplied at runtime

    • For resource creation, it will also be necessary to create input variables without default values for the namespaced project name

    • The Terraform state file for each branch is stored in its own (secure) namespaced remote storage bucket, separate from every other branch. Using distinct buckets rather than a single centralized bucket is a practical necessity, since the tfstate for the main branch is highly sensitive, whereas it’s reasonable to set up access rules that let developer groups access the tfstate files for their own branches.

As an example, the main module and variables file should contain blocks like the following:

# ---- main.tf ----
terraform {
  # Partially configured backend; requires "bucket"
  backend "gcs" {
    prefix = "terraform/tfstate"
  }
}
...

# ---- variables.tf ----
variable "project_id" {
  type = string
}
variable "branch_name" {
  type = string
}
...

Shell Context & Namespace Setup

No matter which system you deploy Terraform with, you’re likely running the terraform CLI in a shell context with modifiable environment variables. We recommend wrapping all terraform calls in a dedicated shell script responsible for setting up the environment and passing structural arguments like the branch name using the input variable -var switch.

By structural we specifically mean inputs to terraform that dictate how an environment and its resources should be set up, but not what resources are to be created. The structural parameters (and basically the entirety of the shell script) should be essentially unchanging, such that no resource-related edits are made in the build script; such edits are better made elsewhere, using .tfvars files. Examples of parameters that meet these criteria are the branch namespace string, billing accounts, organization identifiers, etc.

A build script resembles the following stripped-down snippet, adapted from our own with error checking removed for brevity. The salient function to note is export_env(), which exports the shell environment for all subcommands invoked in the script, and is responsible for retrieving the branch name here with git branch --show-current [4].

#!/usr/bin/env bash

SCRIPT_NAME=$(basename "$0")

export_env() {
  # Export all required environment variables

  # Directory containing this script
  ROOT_DIR=$(dirname "$0")

  # Define the variables for the app & organization
  export SITE_DOMAIN='example.com'
  export GOOGLE_CLOUD_PROJECT_PREFIX='appname'
  export GCS_TFSTATE_BUCKET_PREFIX='tfstate'

  # Set the namespaced variables to be used by terraform
  export BRANCH_NAME="$(git branch --show-current)"
  export GOOGLE_CLOUD_PROJECT="${GOOGLE_CLOUD_PROJECT_PREFIX}-${BRANCH_NAME}"
  export GCS_TFSTATE_BUCKET="${GCS_TFSTATE_BUCKET_PREFIX}-${BRANCH_NAME}.${SITE_DOMAIN}"
}

create_project() {
  ...
}

create_terraform_tfstate_bucket() {
  ...
}

run_terraform_init() {
  ...
}

# Only run & exit if this script is called directly; nothing below
# runs if the script is being sourced with '.' or 'source'
if [ "${BASH_SOURCE[0]}" -ef "$0" ]; then
  export_env
  create_project
  create_terraform_tfstate_bucket
  run_terraform_init

  # With no arguments, stop after setting up the environment
  if (($# == 0)); then
    exit 0
  fi

  # Structural inputs, kept in an array so each value stays one argument
  terraform_vars=(
    -var "project_id=$GOOGLE_CLOUD_PROJECT"
    -var "branch_name=$BRANCH_NAME"
  )

  # Only these subcommands accept input variables
  case "$1" in
    refresh|apply|destroy|plan)
      terraform "$@" "${terraform_vars[@]}"
      ;;
    *)
      terraform "$@"
      ;;
  esac
fi
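One caveat with the branch lookup in export_env(): git branch --show-current prints nothing when HEAD is detached, which is a common checkout mode on build agents. A minimal sketch of a fallback is shown below. The function name resolve_branch_name and the CI_BRANCH_NAME variable are hypothetical; substitute whatever branch variable your build system actually provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the branch name, or fail loudly so that no
# project or bucket is ever created with an empty namespace.
resolve_branch_name() {
  local branch
  branch="$(git branch --show-current 2>/dev/null)"

  # Detached HEAD: fall back to a variable set by the build system.
  # CI_BRANCH_NAME is an assumed name, not a standard variable.
  if [ -z "$branch" ]; then
    branch="${CI_BRANCH_NAME:-}"
  fi

  if [ -z "$branch" ]; then
    echo "error: could not determine the branch name" >&2
    return 1
  fi
  printf '%s\n' "$branch"
}
```

Failing early here seems preferable to continuing, since an empty branch name would otherwise produce a project called appname- and a bucket called tfstate-.example.com.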

We’ve excluded the contents of create_project() and create_terraform_tfstate_bucket() above, as these will be specific to each organization. For us, it suffices to run several gcloud commands to create the project and bucket resources, e.g. with:

create_project() {
  gcloud projects create "$GOOGLE_CLOUD_PROJECT"
}

create_terraform_tfstate_bucket() {
  gcloud storage buckets create \
    "gs://$GCS_TFSTATE_BUCKET" \
    --project "$GOOGLE_CLOUD_PROJECT"
}

An organization with additional security requirements would likely have additional steps after creating a project or bucket, such as applying additional firewall rules for networks in development projects to allow ingress only from allow-listed VPN address ranges.
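Since the build script runs on every invocation, the create steps are friendlier if they skip resources that already exist. The following is a sketch of one way to do that with gcloud describe calls, using the same function and variable names as above. It assumes the build agent is allowed to describe the resources; a permissions error would look the same as a missing resource here, so real error checking would need to distinguish the two.

```shell
#!/usr/bin/env bash

create_project() {
  # Skip creation if the project is already visible to the build agent
  if gcloud projects describe "$GOOGLE_CLOUD_PROJECT" >/dev/null 2>&1; then
    return 0
  fi
  gcloud projects create "$GOOGLE_CLOUD_PROJECT"
}

create_terraform_tfstate_bucket() {
  # Skip creation if the branch's tfstate bucket already exists
  if gcloud storage buckets describe "gs://$GCS_TFSTATE_BUCKET" >/dev/null 2>&1; then
    return 0
  fi
  gcloud storage buckets create \
    "gs://$GCS_TFSTATE_BUCKET" \
    --project "$GOOGLE_CLOUD_PROJECT"
}
```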

Terraform State Management

Whenever the build agent invokes terraform, it has to retrieve the backend bucket path from the runtime context to determine the tfstate storage location. If the agent runs in a short-lived container (for instance after a VCS commit triggers a build), there is no local terraform state file to consider, and the remote state will be queried. However, if you allow developers to modify state directly during development, either by assigning role permissions or by impersonating a service account, it’s important to ensure that only the remote statefile is used as the source of truth and that the correct branch state is queried.

To ensure a clean working state, we recommend the nuclear option of discarding any local configuration each time the terraform build script is invoked, using terraform init -reconfigure [5]. The run_terraform_init() function in the snippet above should look like the following, where $GCS_TFSTATE_BUCKET has been set in the environment prior to invocation:

run_terraform_init() {
  # Initialize the local terraform environment with -reconfigure,
  # which will use the remote statefile and not attempt to migrate
  # any local existing state, which is desired when switching
  # between git branches with branch-namespaced backend state buckets.

  terraform init -reconfigure \
    -backend-config="bucket=$GCS_TFSTATE_BUCKET"
}

Discussion

Applicability in Practice

We’ve presented branch namespaces as a technique that addresses common developer pain points, namely:

  1. Allow concurrent deployment for multiple developers

  2. Eliminate configuration forking in a single codebase

  3. Enable developers to create and prototype infrastructure as needed

It’s straightforward to see we’ve hit point #1: since each deployed project is guaranteed to be isolated, developers are able to create environments that are identical to production except for their revisions. However, this itself merits discussion with respect to point #2: when is it desirable to have identical environments, and how should we fork when needed?

Having certain kinds of forked configurations for prod vs staging environments is indeed desirable. Suppose you have a resource-intensive application in production, perhaps some sexy machine learning service with a fleet of expensive GPU instances. To verify a revision to the application code, it will likely be sufficient and much cheaper to deploy the revision’s code on a stack with fewer, smaller instances. This use case is certainly possible, and we recommend using file-based tfvars to provide input variables for any resource-related attribute (e.g. compute instance family or number of instances) as follows:

  • Maintain multiple environment-specific configurations as is often done, e.g. main.tfvars, staging.tfvars

  • Add conditional logic to determine which tfvars file to use in the shell script, and pass the result to terraform using the --var-file switch
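A minimal sketch of that conditional logic follows. The file names main.tfvars and staging.tfvars are the ones mentioned above; the function name select_tfvars_file and the rule that every non-main branch uses the staging values are assumptions for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: choose the tfvars file for a given branch.
# Only main gets production-sized resources; every other branch
# (feature branches, uat, ...) falls back to the cheaper staging values.
select_tfvars_file() {
  local branch="$1"
  case "$branch" in
    main) printf '%s\n' "main.tfvars" ;;
    *)    printf '%s\n' "staging.tfvars" ;;
  esac
}
```

Inside the build script the result would be passed along with the structural variables, for example terraform plan -var-file="$(select_tfvars_file "$BRANCH_NAME")". A persistent uat branch could get its own case arm and file if it needs production-like sizing.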

Keeping logical branching outside of the terraform modules has the advantage of maintaining the separation of responsibilities between the application owners who maintain the build process and the application developers who use it. While having multiple variable files still creates forked configurations in the codebase, it’s an acceptable fork lite ™, since the tfvars files can be structured as simple dictionary payloads to be accessed by the modules, which are structurally identical regardless of the source file for the input variables.

With regard to point #3, we’ll be blunt: whether branch namespacing will enable developers to experiment and prototype really depends on the particularities of your organization, rather than on the technical feasibility. Enabling sandboxed experimentation by allowing developers to spin up stacks and applications will likely require a little up-front philosophical discussion, some planning, and then development time to set up the appropriate build agents and project creation mechanism in a large organization. At the end of the day, it might be a tough sell for an existing application with a large codebase. However, if you’re in the process of doing the math on the cost of developer salary x hours spent waiting in staging environment queues, or perhaps the opportunity cost of your engineers not experimenting, it may be worth prototyping a branch-namespaced deployment mechanism.

Target Architectures

There are many different kinds of enterprise applications out there, and branch namespaced deployment as we’ve implemented for this site isn’t necessarily a good fit for all application architectures.

We’re painting with a broad brush, but this method is best applied to:

  • monolithic database-backed applications

  • microservice architectures (with either monorepos or separate repos per service) [6]

In these types of applications, staging code revisions with a fully mirrored environment is desired due to the interconnected nature of the resources: if you want to run a suite of integration tests properly, you actually do need to spin up the resources exactly as they’ll be in production.

Conversely, this technique is less suitable for applications running on shared-resource computing architectures, such as a Spark or Hadoop cluster used by multiple teams that run independent jobs. In this setting, where the cluster users treat the cluster resources as a service provided by the data platform, it’s relatively rare that the data job developers will be modifying the cluster configuration, and it’s unnecessary to spin up a new cluster on a per-branch basis. In this case it’s preferable to have a persistent staging cluster that’s dedicated to resourcing ad hoc jobs submitted by the (data) developers. Note however that the cluster infrastructure itself could still be versioned and deployed with a branch namespace, enabling the data platform maintainers to spin up new clusters as needed to test new configurations.

Implementation Gotchas

There are always gotchas that are easy to walk into without advance notice. If you’re interested in branch namespacing your deployment, watch out for the following:

  1. Branch name length & allowed characters.

    • Cloud providers have restrictions on the allowed characters and maximum length of bucket and project names

    • Make sure that branch names meet all the criteria for the resources in your cloud, and err on the side of caution by using only alphanumeric characters and hyphens

    Tip

    Name VCS branches systematically, ideally by concatenating:

    1. the developer’s unique username in your organization

    2. an alphanumeric ticket or issue identifier from your organization’s issue tracking system

    E.g.: Branch named: msmith-101 for the author msmith working on a branch that addresses Issue #101.

    For most feature development, name projects by concatenating:

    1. a short prefix that’s mnemonic for your company/application

    2. the branch name

    E.g.: Project named: exdc-msmith-101 for work at example.com on the branch msmith-101.

  2. Add unique suffixes to problematic resources.

    • Some resources have additional practical restrictions due to the difficulty in destroying & recreating them during development (for instance you cannot delete a Google CloudSQL database and subsequently create a new one in any timeframe you’re likely willing to stomach)

    • To simplify resource destruction during iterative testing, add random suffixes or (more user-friendly) pet names to problematic resources specifically in your modules
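The first gotcha can be caught before any cloud call is made by validating the derived project name in the build script. The sketch below encodes the Google Cloud project ID rules as we understand them (6 to 30 characters; lowercase letters, digits and hyphens; starting with a letter; not ending with a hyphen). Treat the pattern as an assumption and check it against your provider’s current documentation; the function names are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the argument looks like a valid
# project ID under the rules assumed above.
is_valid_project_id() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Hypothetical helper: validate the project name that a given prefix and
# branch name would produce, and explain the failure if it is rejected.
check_branch_namespace() {
  local prefix="$1" branch="$2"
  local project_id="${prefix}-${branch}"
  if ! is_valid_project_id "$project_id"; then
    echo "error: '$project_id' is not a usable project ID; rename the branch" >&2
    return 1
  fi
}
```

With the naming scheme from the tip, check_branch_namespace exdc msmith-101 passes, while a branch such as Feature/New_Thing is rejected for its uppercase letters, slash and underscore.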

Conclusion

This post was a deep dive into using VCS branch names as deployment namespaces within a software project, with the goal of managing an IaC stack elegantly and improving developer workflows.

If you found these techniques interesting and would like to see a working example, we’ve made a snapshot of our terraform modules available on GitHub for reference. While our stack is relatively simple, it’s still typical of many web applications, and we hope you find the presentation and examples helpful in your own work.

If you have any feedback or questions, don’t hesitate to contact us and we’ll be happy to get in touch.


Footnotes