tinybox

Introduction

Monorepo for an as-yet-unnamed renter data platform, working title "tinybox".

Melbourne - Annual Average Traffic Volume

CBD - Bike Paths and Docks

Victoria - Median Population Age by Local Government Area

Victoria - Crime Rate by Local Government Area

There are two main "applications" in this repo: the ETL pipeline (tools/etl), and the web app (tools/web). The ETL pipeline uses the Dagster framework; the web app uses the Django framework.

Currently, this README.md (i) sketches out the structure of the repo at a high level; (ii) describes how to set up a local development environment, which of necessity includes (iii) some instruction on how to use our Python package manager, Hatch; and (iv) establishes some development conventions. Note that it is entirely possible to work on the web app without touching the ETL pipeline, and vice versa - they are independent of each other (although they currently share a single project Python interpreter).

In due course, this README.md will also contain (i) a more complete introduction to the project and (ii) instructions for deploying the application to a production environment.

The ARCHITECTURE.md file will contain a high-level overview of the project architecture, including diagrams written in the D2 diagramming DSL, using the C4 paradigm (https://c4model.com/) for simplicity and clarity.

On the other hand, comprehensive development documentation will live under the docs folder. If you need to add documentation while developing, please do so there under the relevant application subfolder.

This repo is structured using src-layout: local library code is in src, and the main applications are in tools. Accordingly, if you need to write an importable Python module, please do so in a subfolder under src and import it into the relevant application in tools. This will allow us to keep the main applications as clean as possible. When invoked, Hatch automatically ensures that all library code under src is installed in development mode, rendering it callable from anywhere else in the project. There is no need to do so manually.

Contributions

This repo implements the 12 Factor App model.

Contributions should follow the GitHub Flow workflow model, with each feature or bugfix being developed on a separate branch (ideally called dev-<contributor_initials>-<descriptive_name>), and then merged into main via a pull request. The main branch is protected and requires pull requests to be reviewed and approved before merging. Note: please ensure that your development branch is up to date with main before creating the pull request.
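The branch naming convention above can be checked mechanically. The following is a minimal sketch, not part of the repo: the function name is hypothetical, and the exact character set (lowercase initials of two to four letters, lowercase words joined by hyphens or underscores) is an assumed reading of dev-<contributor_initials>-<descriptive_name>.

```python
import re

# Assumed reading of the convention: "dev-", then 2-4 lowercase initials,
# then a lowercase description whose words are joined by "-" or "_".
BRANCH_PATTERN = re.compile(r"dev-[a-z]{2,4}-[a-z0-9]+(?:[-_][a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if name looks like dev-<contributor_initials>-<descriptive_name>."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Illustrative branch names only.
    for candidate in ("dev-ac-add-tileset-loader", "feature/tilesets", "main"):
        print(candidate, is_valid_branch_name(candidate))
```

A check like this could run in a Git hook or in CI, but whether to enforce the convention at all is a project decision.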

This project uses semantic versioning and is currently in pre-release. Please use Conventional Commits for your commit messages. This will allow us to automatically generate the changelog and release notes. Please also follow the 50/72 convention (https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html) for commit messages, so that they remain readable in the terminal: a subject line of at most 50 characters, a blank line, then body text wrapped at 72 characters.
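To make the two commit message conventions concrete, here is a rough sketch of a checker. It is illustrative only: the function name is hypothetical, and the list of accepted types is an assumed, commonly used set (the Conventional Commits specification itself only requires feat and fix).

```python
import re

# Assumed set of commit types; adjust to whatever the project settles on.
TYPES = ("build", "chore", "ci", "docs", "feat", "fix",
         "perf", "refactor", "revert", "style", "test")

# <type>[(scope)][!]: <description>
HEADER = re.compile(r"^(?:%s)(?:\([\w./-]+\))?!?: \S.*$" % "|".join(TYPES))


def check_commit_message(message: str) -> list:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]
    if not HEADER.match(subject):
        problems.append("subject is not in Conventional Commits form")
    if len(subject) > 50:
        problems.append("subject is longer than 50 characters")
    if len(lines) > 1 and lines[1] != "":
        problems.append("subject is not followed by a blank line")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append("line %d is longer than 72 characters" % number)
    return problems


if __name__ == "__main__":
    example = "feat(etl): add traffic volume asset\n\nLoad the annual average traffic volume dataset."
    print(check_commit_message(example))
```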

Local development environment setup

For development purposes, the following assumes local installation on macOS (running on Apple Silicon), without containers. In due course we'll move to a local Docker development pipeline (using VS Code's devcontainer.json). MVP deployment will be using Docker on Digital Ocean VPS.

System dependencies

Ensure that the following required dependencies are installed and on path.

Postgres.app: Optionally, download and install the latest macOS Universal binary. Note that Postgres.app comes with PostGIS pre-installed. If you choose not to use Postgres.app, you will need to install Postgres and PostGIS separately and ensure that both are on path.
Homebrew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Ensure that you also follow instructions to add Homebrew to your path.
Git: brew install git
Git-LFS: brew install git-lfs
Hatch: brew install hatch
nvm: curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
Using nvm, the latest Node and NPM: nvm install node
GeoDjango binary dependencies: brew install postgis gdal libgeoip #TODO: Note that not installing postgis here appears to lead to a failure to find the gdal library; but presumably that library should already be available with the PostGIS binaries installed with Postgres.app, so is this a PATH issue? Need to look into this more because it will affect deployment too.
Tippecanoe fork, maintained by "Felt", for generating custom vector tilesets: brew install tippecanoe.
- Note that as of today, the brew formula for Tippecanoe is current (v2.24.0). Should that change, you can install from source using the instructions on the repo.
Maputnik for styling tilesets: brew install kevinschaul/homebrew-core/maputnik
Ngrok: brew install ngrok/ngrok/ngrok

Optionally, we also recommend installing Direnv (brew install direnv) to handle auto-loading of environment variables. Don't forget to hook Direnv into your (zsh) shell. Once installed, call direnv allow from the repo root to enable automatic sourcing of environment variables from the .envrc file.

Note that if at any point any of the above commands fail, try closing and re-opening your terminal to ensure that your path is up to date before trying again.

Finally, we need to tell Hatch to build its virtual environment in the repo itself, instead of elsewhere, which is the default. Having the virtual environment local to the repo (excluded, of course, from version control) will allow you to tell VS Code to use it as the interpreter for debugging purposes. Edit the config.toml file located at ~/Library/Preferences/hatch and add the following under [dirs.env]:

[dirs.env]
virtual = ".hatch"

Then, in your repo root directory, create a .env file and ensure that the following env is present: HATCH_CONFIG="/Users/<username>/Library/Preferences/hatch/config.toml"

Install Node development dependencies

All Node development dependencies (as opposed to the build dependencies that would need to ship with our web app) are specified in package.json. To install them, run npm install from the repo root. This will enable IDE tooling such as ESLint.

Setup PostgreSQL for use with Django

You now need to set up a project database. Use whatever interface you like to spin up a new database, ideally named tinybox. The following instructions are for Postgres.app.

Open the Postgres.app interface, click on any database to open the command line, then create the database with: CREATE DATABASE tinybox;.

Enable the relevant PostGIS extensions (they come pre-bundled with Postgres.app):

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;
CREATE EXTENSION postgis_topology;

Next, create an .envrc file from .envrc.sample. No changes are needed to this file at this time.

Next, create an .env file from .env.sample. Fill out all of the envs prefixed with PGDB_ with the details of your database.

Note that the .env file exports a PATH env pointing to Postgres.app. This is because the psycopg2 Python dependency requires a pg_config binary to be on path. Such a binary is included with the Postgres.app install. If you're not using Postgres.app for development (or if you're deploying to production), you'll need to change this accordingly.

You should also update the following envs in your .env file by replacing <repo-root> with the absolute path to this repo on your local machine.

WORKING_INPUT_DIR
WORKING_OUTPUT_DIR
DAGSTER_HOME
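A quick sanity check for the three path envs listed above might look like the following sketch. The function name is hypothetical and nothing like it is confirmed to exist in the repo; it only verifies that each value is set, no longer contains the <repo-root> placeholder, and is an absolute path.

```python
import os
from pathlib import Path

# Names taken from the list above.
PATH_ENVS = ("WORKING_INPUT_DIR", "WORKING_OUTPUT_DIR", "DAGSTER_HOME")


def find_path_env_problems(environ) -> list:
    """Return one human-readable problem per misconfigured path env."""
    problems = []
    for name in PATH_ENVS:
        value = environ.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif "<repo-root>" in value:
            problems.append(f"{name} still contains the <repo-root> placeholder")
        elif not Path(value).is_absolute():
            problems.append(f"{name} is not an absolute path")
    return problems


if __name__ == "__main__":
    # Run from a shell in which the .env file has already been loaded.
    for problem in find_path_env_problems(os.environ):
        print(problem)
```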

Project install

Python dependencies (including the required 3.9.x Python interpreter itself) are specified in a PEP518-compliant pyproject.toml file in the project root.

We now tell Hatch to build the development environment. Open a terminal then, from the repo root, call:

hatch shell

Since this is the first time we are calling Hatch, it installs the project Python interpreter and all Python dependencies as specified in the pyproject.toml file, ensures that the project Python interpreter is activated in the current shell session, and that the project root directory is on the Python path.

Next, we install the Tailwind CSS framework and its Node dependencies.

hatch run python tools/web/manage.py tailwind install

We now need to set a new DJANGO_SECRET_KEY env. Open a terminal then, from the repo root, call:

hatch run python tools/web/manage.py shell

This launches a Python interpreter. You can then generate the key with the following:

from django.core.management.utils import get_random_secret_key
get_random_secret_key()

Copy the output into the DJANGO_SECRET_KEY environment variable in your .env file.
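For context, a settings module can read that variable at start-up and fail fast if it is missing. This is a sketch under assumptions: the helper name is hypothetical, and how the project's actual settings.py reads the key is not shown in this README.

```python
import os


def read_secret_key(environ=os.environ) -> str:
    """Return DJANGO_SECRET_KEY from the environment, or fail with a clear message."""
    key = environ.get("DJANGO_SECRET_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DJANGO_SECRET_KEY is not set; generate one and add it to your .env file"
        )
    return key


# In a Django settings module this might be used as:
# SECRET_KEY = read_secret_key()
```

Failing at import time, rather than falling back to a default key, keeps a misconfigured environment from starting silently.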

Next, add yourself as a Django admin user: hatch run python tools/web/manage.py createsuperuser from the repo root, then follow the prompts.

Note that if you are using VS Code, you should also ensure that MallocNanoZone=1 (as specified in .env.sample file) is set in your .env file to prevent malloc errors on launching a Dagster development server.[^1] This should be the case by default.

Third Party API Keys

Open your .env file. You now need to add the required API keys for Stadia Maps and Mapbox, our mapping API providers. If you don't have an account for either, you'll need to create one and then generate the API key.

STADIA_MAPS_API_KEY
MAPBOX_API_KEY

Development

Your local development environment is now ready to go.

Ensure you are in the project root directory, then:

for the ETL pipeline server, call hatch run etldev.
for the web app development server, call hatch run webdev.

(Note that, to avoid conflicts with other Django projects, our entrypoint launches the development server on non-default port 8080, instead of 8000; accordingly, it can be accessed at: http://localhost:8080/).

Each of the etldev and webdev arguments passed to hatch run are known as "entrypoints", and they are defined in the pyproject.toml file (under [tool.hatch.envs.default.scripts]). Should you need to add new entrypoints, feel free to do so there. For instance, I've also added Django migration entrypoints as follows:

webmakemigrate = "python tools/web/manage.py makemigrations"
webmigrate = "python tools/web/manage.py migrate"
webshell = "python tools/web/manage.py shell"

You can then call these from the root directory to run the Django migrations for the web app.

Essentially, you should use hatch run to run any command that you would normally run from the command line, but which requires the project Python interpreter to be activated and the project root directory to be on the Python path. This includes running the Django development server, running the Dagster development server, running the Dagster CLI, running the Django CLI, running the Django test suite, etc.

Each time Hatch is called (in any capacity) from the command line, it syncs any changes to Python dependencies in the pyproject.toml file to the project Python interpreter. Accordingly, if you need to add Python dependencies - be they runtime or development dependencies - add them to pyproject.toml, under the [project] -> dependencies section for now (in due course we may end up specifying more than one interpreter for the project, but for now it's shared across tools for simplicity).

VS Code debugger setup

Interpreter

If you are using VS Code, open this project repo as a Workspace using the enviro.code-workspace file in the repo root. Hit Cmd + P, Python: Select Interpreter, then choose the newly-installed project Python interpreter (which should be at .hatch/tinybox/bin/python3.9). This will allow you to conveniently use the VS Code debugger, should you need it.

Python debugging with `launch.json` and the Python extension

On account of the way that string arguments containing spaces in launch.json are parsed before being passed to the integrated terminal, attempting to run the debugger in such circumstances will leave you tearing your hair out. Accordingly, all launch-type debug configurations presently use "console": "externalTerminal". Any new debug configurations added should also use this flag.

Updating pinned dependencies

If you are doing a fresh deploy, go here and create a new app. Select this GitHub repo. As our Django web app code is not located in the repo root, you'll need to specify the tools/web directory as the app root.

Principal Python dependencies, as specified in pyproject.toml, should be pinned wherever possible. Unfortunately, this is currently a manual process with Hatch. We'll need to do it every couple of weeks or so.

Use:

pip-compile --extra dev -o dev-requirements.txt pyproject.toml --resolver=backtracking

to resolve the dependencies declared in pyproject.toml and export the resulting versions to dev-requirements.txt, then adjust the pinned versions manually in pyproject.toml to match.
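The manual comparison can be made less error-prone with a small helper. The sketch below is an assumption-laden illustration, not project tooling: both function names are hypothetical, the version numbers in the example are made up, and it assumes plain name==version lines as produced by pip-compile without hashes.

```python
def parse_pins(requirements_text: str) -> dict:
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments such as "# via ..."
        if "==" not in line or line.startswith("-"):
            continue  # skip blank lines, options and anything unpinned
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip().lower().replace("_", "-")  # drop extras
        pins[name] = version.split(";", 1)[0].strip()  # drop environment markers
    return pins


def find_outdated_pins(declared: dict, compiled: dict) -> dict:
    """Return {name: (declared_version, compiled_version)} where the two differ."""
    return {
        name: (declared[name], compiled[name])
        for name in sorted(declared)
        if name in compiled and declared[name] != compiled[name]
    }


if __name__ == "__main__":
    # Illustrative data only; real pins come from pyproject.toml and dev-requirements.txt.
    compiled = parse_pins("django==4.2.1\n    # via tinybox (pyproject.toml)\ndagster==1.3.5\n")
    print(find_outdated_pins({"django": "4.1.0", "dagster": "1.3.5"}, compiled))
```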

Deployment

Django web app

To deploy our Django web app to Digital Ocean's App Platform, I followed the official tutorial.

The complete Digital Ocean App Platform Python buildpack documentation is also handy.

From the repo root:

pip freeze > tools/web/requirements.txt

Note that this file must be called requirements.txt for the buildpack to work.

You must then delete the log entries at the top of that file, along with the reference to the development install of this project itself, which should look something like this:

-e git+https://github.com/alexclaydon/tinybox.git@0018991471fd73c7b1e2fc953e61ddfd06e42253#egg=tinybox
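The clean-up described above could be scripted along these lines. This is only a sketch: the function name is hypothetical, and it assumes that every line which is not a name==version or name @ url requirement is a log line that can be dropped, which may not hold for every possible log format.

```python
def clean_freeze_output(text: str) -> str:
    """Keep only requirement lines from raw pip freeze output."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments
        if stripped.startswith("-e "):
            continue  # editable (development) install of this project itself
        if "==" not in stripped and " @ " not in stripped:
            continue  # assumed to be a log line rather than a requirement
        kept.append(stripped)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    import sys

    # Example: pip freeze | python clean_freeze.py > tools/web/requirements.txt
    # (clean_freeze.py is a hypothetical file name for this sketch.)
    sys.stdout.write(clean_freeze_output(sys.stdin.read()))
```

It is still worth reading the resulting requirements.txt before committing it.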

Ensure that in the tools/web directory (i.e., the Django project root), the runtime.txt file - which tells the buildpack which version of Python to use - specifies the same Python version as the project Python interpreter (i.e., the one specified in pyproject.toml). In our case, this is python-3.9.16. See available runtimes here - they appear to be using some part of the Heroku stack.
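A small check can catch a mismatch between runtime.txt and the interpreter before you deploy. The sketch below uses a hypothetical function name and assumes the python-<major>.<minor>.<patch> format shown above.

```python
import re
import sys


def runtime_matches(runtime_txt: str, version_info=sys.version_info) -> bool:
    """Return True if runtime.txt content names exactly the given Python version."""
    match = re.fullmatch(r"python-(\d+)\.(\d+)\.(\d+)", runtime_txt.strip())
    if match is None:
        return False  # not in the expected python-X.Y.Z form
    return tuple(int(part) for part in match.groups()) == tuple(version_info[:3])


if __name__ == "__main__":
    # Run with the project interpreter from the repo root, e.g. via hatch run.
    with open("tools/web/runtime.txt") as handle:
        print(runtime_matches(handle.read()))
```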

Hit create. On the next page, ensure the "Name" is set to "tinybox-app". Next, edit the "Run Command" to append web/wsgi.py, which is the entrypoint for the Django web app, as follows:

gunicorn --worker-tmp-dir /dev/shm web/wsgi.py

Next, hit the "Back" button so that we can add resources.

Select the Plan first - I'm using "Basic" for our current test deploy.

Click on "Add Resource (Optional)", then add a dev database and call it the default db.

Click next and then set up the environment variables - as "Global" envs (I think) - as follows. Ensure that for each one, you enable the "Encrypt" checkbox.

DJANGO_ALLOWED_HOSTS -> ${APP_DOMAIN}
DATABASE_URL -> ${<NAME_OF_YOUR_DATABASE>.DATABASE_URL} (Note that as we named our database db above, this should be ${db.DATABASE_URL})
DEBUG -> True
DJANGO_SECRET_KEY -> create and store one in Bitwarden.
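For orientation, this is roughly how a settings module could turn those environment variables into Django settings values. It is a standard-library sketch under assumptions: the function name is hypothetical, the hostnames and credentials in the usage example are invented, and the project's real settings.py may instead use a helper library such as dj-database-url. The database ENGINE is deliberately left out.

```python
import os
from urllib.parse import urlparse


def settings_from_env(environ=os.environ) -> dict:
    """Derive ALLOWED_HOSTS, DEBUG and database connection details from envs."""
    hosts = [
        host.strip()
        for host in environ.get("DJANGO_ALLOWED_HOSTS", "").split(",")
        if host.strip()
    ]
    debug = environ.get("DEBUG", "False").strip().lower() in ("1", "true", "yes")
    url = urlparse(environ["DATABASE_URL"])  # KeyError if the env is missing
    database = {
        "NAME": url.path.lstrip("/"),
        "USER": url.username or "",
        "PASSWORD": url.password or "",
        "HOST": url.hostname or "",
        "PORT": str(url.port or ""),
    }
    return {"ALLOWED_HOSTS": hosts, "DEBUG": debug, "DATABASE": database}


if __name__ == "__main__":
    # Invented values, for illustration only.
    example = {
        "DJANGO_ALLOWED_HOSTS": "tinybox-app.example.com",
        "DEBUG": "True",
        "DATABASE_URL": "postgresql://user:secret@db.example.com:25060/defaultdb?sslmode=require",
    }
    print(settings_from_env(example))
```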

Keep clicking the "Next" button until you reach the "Review" page.

Click "Create Resources". At this stage, even if the build fails, as far as I understand it, the app is live, so you can make whatever changes you need to the code and push them, and it will try the build again.

Assuming the build process completes successfully, we next need to perform Django first-time setup. Go to the "Console" tab and run the following commands:

python manage.py migrate
python manage.py createsuperuser

Create at least one superuser.

That's it for now, but once everything is up and running, come back here and follow the next part of the tutorial, which deals with serving static files.

Footnotes

[^1] Regarding the malloc error, see this helpful comment (requires Slack account 🙄) and this.
