Published: Nov. 22, 2022
Tags: web-development google-cloud
Site Architecture
The hynescorp site is a Django application deployed in Google Cloud, using a mixture of serverless and persistent instances for serving our frontend traffic and running backend processing tasks.
At a high level, the primary functions of the site are to serve:
landing and blog pages for anonymous traffic
dynamically generated pages in the feedgrid news reader app for authenticated users
As part of our series on our site stack, this post gives a high-level overview of the application architecture and the development workflows we use to manage the system.
Application Components
Our site has a typical architecture for a cloud-based model-view-controller (MVC) web application that serves request/response HTTP traffic with content generated by backend processes:
(Serving) We run Django on Google App Engine instances for our frontend, connected to a Postgres database on a managed Cloud SQL instance.
(Scraping) Our web scrapers are deployed on isolated instances in the same virtual private cloud (VPC) as our database, decoupled from the frontend Django instances.
(VPC Topology) Our backend resources are hosted in a VPC whose firewall rules deny all network ingress except authenticated ssh tunnels to a dedicated bastion instance. The bastion is the single secured point of entry and provides port forwarding for administrative tasks such as database management.
A high-level view of this architecture is shown in the diagram below:
Components Reference Table
The application components described above are summarized in the table below for reference:
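Component | Component Parts | Summary |
---|---|---|
Web Server | Django on Google App Engine | Serves landing and blog pages and the feedgrid news reader |
Database | Postgres on a managed Cloud SQL instance | Application data store with no external IP address |
Web Scraper | Isolated instances in the VPC | Backend scraping, decoupled from the frontend |
Bastion-Proxied Access | Bastion instance running the Cloud SQL Auth proxy | Single secured entry point for ssh port forwarding |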
Database Access Workflow
During code development and system maintenance, software engineers need processes for:
local development, with short iterations that give fast feedback on changes and no dependence on network resources (not even the internet) [1]
staging development, in which code changes can be deployed and iterated on in an environment that mirrors production
production access, in which privileged users (or agents) can access production environments securely for inspection or administrative tasks
Our workflows for these three cases can blur together, since the main distinction we make is how we connect to the database that needs to be accessed. Removing external IP addresses from the database trades some convenience for increased security, but it's relatively straightforward to set up a workflow that uses bastion-mediated database access with ssh forwarding.
Local DB
When making small changes, we run the local Django development server with a SQLite database file.
There are only minor dialect differences between SQLite and Postgres, so this method works well when focusing primarily on template changes.
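A minimal sketch of how a `settings.py` might switch between a local SQLite file and a Postgres database reached on `127.0.0.1:5432`, the forwarded port used in the remote-database workflow below. The `DJANGO_DB_MODE`, `DB_NAME`, `DB_USER`, and `DB_PASSWORD` environment variables and the `database_settings` helper are hypothetical names chosen for illustration, not our actual configuration; only the `ENGINE` strings are Django's built-in backends.

```python
import os

# Hypothetical environment variable selecting the database backend:
#   "sqlite"      -> local SQLite file (template work, no network needed)
#   anything else -> Postgres on a locally forwarded port
DB_MODE = os.environ.get("DJANGO_DB_MODE", "sqlite")


def database_settings(mode: str, base_dir: str = ".") -> dict:
    """Return a Django DATABASES dict for the given mode (illustrative)."""
    if mode == "sqlite":
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": os.path.join(base_dir, "db.sqlite3"),
            }
        }
    # Postgres reached through an ssh tunnel bound to 127.0.0.1:5432 locally.
    # Credentials come from hypothetical environment variables.
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "HOST": "127.0.0.1",
            "PORT": "5432",
            "NAME": os.environ.get("DB_NAME", "app"),
            "USER": os.environ.get("DB_USER", "app"),
            "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        }
    }


DATABASES = database_settings(DB_MODE)
```

Keeping the switch in one function means the same settings module serves both local modes; only the environment changes between a SQLite session and a tunneled Postgres session.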
Bastion-Proxied Remote DB
When making larger changes to views or database models, the most common workflow we use is to connect directly to the database instance created for our development branch, mediated by the bastion host that runs the Cloud SQL Auth proxy as a systemd service.
We still run a local Django development server, but use ssh port forwarding to bind the Postgres port 5432 on our local machine, so that the local Django instance connects to whichever database is forwarded (production or development) at 127.0.0.1:5432.
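A minimal sketch of the forwarding command, built as an argv list so it can be handed to `subprocess`. It assumes the Cloud SQL Auth proxy on the bastion listens on the bastion's own `127.0.0.1:5432`; the bastion address `admin@bastion.example.internal` and the `ssh_forward_command` helper are hypothetical. The `-N` (no remote command) and `-L [bind_address:]port:host:hostport` options are standard OpenSSH client flags.

```python
import shlex


def ssh_forward_command(
    bastion: str,
    local_port: int = 5432,
    remote_host: str = "127.0.0.1",  # assumed: proxy listens on the bastion's loopback
    remote_port: int = 5432,
) -> list[str]:
    """Build an ssh argv that forwards a local port through the bastion.

    -N: open the connection for forwarding only; run no remote command.
    -L: bind local_port on this machine's loopback interface and forward
        connections to remote_host:remote_port as seen from the bastion.
    """
    forward = f"127.0.0.1:{local_port}:{remote_host}:{remote_port}"
    return ["ssh", "-N", "-L", forward, bastion]


if __name__ == "__main__":
    # Hypothetical bastion address, for illustration only.
    cmd = ssh_forward_command("admin@bastion.example.internal")
    print(shlex.join(cmd))
    # prints: ssh -N -L 127.0.0.1:5432:127.0.0.1:5432 admin@bastion.example.internal
```

Passing `cmd` to `subprocess.run` would keep the tunnel open until interrupted; binding explicitly to `127.0.0.1` keeps the forwarded port off other network interfaces on the developer machine.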
This process is illustrated in the following sequence diagram, which shows the order in which the bastion is used as a jump box to forward TCP traffic to the Postgres instance on our private subnet. Once port 5432 is forwarded, we can connect to the database with several tools: the Django shell, the development server, the psql CLI, and so on.
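Because every one of those tools simply opens a TCP connection to `127.0.0.1:5432`, a quick pre-flight check can confirm the tunnel is up before starting the development server. This is an illustrative sketch; the `tunnel_is_up` helper is hypothetical, and a successful connection only shows that something is listening on the port, not that Postgres authentication will succeed.

```python
import socket


def tunnel_is_up(host: str = "127.0.0.1", port: int = 5432, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only proves the forwarded port is listening; it does not
    authenticate against Postgres.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


if __name__ == "__main__":
    if tunnel_is_up():
        print("Port 5432 is forwarded; runserver, shell, or psql can connect.")
    else:
        print("Nothing is listening on 127.0.0.1:5432; is the ssh tunnel open?")
```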
This connection pattern lets us create a Postgres instance in a development project, develop locally against that remote database with the Django development server, and then deploy the revised code to App Engine, after which the GAE instance connects to the same database [2].
Access and administration for the production database may be performed analogously if needed. This is particularly convenient for querying or modifying the database directly within a psql session, which is often the simplest route for record inspection and bulk updates.
Next Up: Deployment
Interested in how we build and manage the architecture? Our next post in this series will dive deep into the Terraform code for our cloud infrastructure and how we create and manage staging environments.
Footnotes