This is a data warehouse and data lake system for social and civic analysis from Siege Analytics.
Built with:
- PostgreSQL + PostGIS
- Python
- R (NOT YET)
- Geoserver (NOT YET)
- Zeppelin Notebook
- GeoTrellis (NOT YET)
- Ubuntu
- dbt
- Spark (FIXING)
The data warehouse is built to enable longitudinal analysis of data from the Census Bureau and the Bureau of Labor Statistics. Intended growth:
- FEC information
- Election results
- Media markets
- Officials and jurisdictions
It's recommended to use `make` to run the `docker compose` commands below because we are auto-generating the `docker-compose.yml` and `.env` files. Because of the magic of `make`, changes to the source includes will automatically trigger a re-make of the compose files. See below for more on how to work with the auto-generation.
As always, `compose.override.yml` can be used if you need changes to the auto-generated configs.
Here are the `make` commands that wrap `docker compose`:
- `down` - this will terminate the containers, volumes, and networks and remove them. It's a last-resort command.
- `up` - this will start containers, networks, and volumes from rest and run them in detached mode.
- `build` - this will build the containers, networks, and volumes.
- `rebuild` - this will build the containers, networks, and volumes from nothing, not relying on cached resources. [`docker compose build --no-cache`]
- `clean` - this will terminate the containers, volumes, and networks, and remove them.
- `prune` - this will remove all stopped containers, without removing running containers.
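The wrapper targets above might look roughly like this in the Makefile (a hypothetical sketch; the target names come from the list above, but the exact recipes are assumptions — the real targets also depend on the auto-generated files, as described below):

```make
# Hypothetical sketch only -- the real recipes may differ.
up: docker-compose.yml .env
	docker compose up --detach

down:
	docker compose down --volumes --remove-orphans

build: docker-compose.yml .env
	docker compose build

rebuild: docker-compose.yml .env
	docker compose build --no-cache

prune:
	docker container prune --force
```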
Here are some of the important `make` commands for working with the containers:
- `pg_shell` - this will create an `ssh` connection to the `PostgreSQL` server container.
- `python_term` - this will create an `ssh` connection to the `Python` container.
- `fetch_jars` - this uses `maven` to get `jar` files that are used by `Spark` to operate. It will save them in the default location and copy them to the `jars` directory in the project.
By default, all `.yml` files in the `docker/` directory are used to generate the `docker-compose.yml` file. The `.env` file is generated from `.env` files in the `conf/` directory.
All of the wrapped compose commands declare `.env` and `docker-compose.yml` as dependencies, so they will be re-generated if any of the `.yml` files or `.env` files change.
Note that a service may be defined across multiple files, and the order of the files is important. The `docker-compose.yml` file is generated by concatenating the files in the order they are listed in the `COMPOSE_FILES` variable. The auto-generated file list sorts `.profile.yml` files to the end of the list, so that they take precedence over the plain `.yml` files.
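The ordering rule can be sketched like this (a minimal illustration of sorting `.profile.yml` files to the end; the actual Makefile logic may differ):

```python
# Sketch: build a compose-file list with .profile.yml files sorted last,
# so their service definitions win when the files are concatenated.
def order_compose_files(files):
    plain = [f for f in files if not f.endswith(".profile.yml")]
    profiles = [f for f in files if f.endswith(".profile.yml")]
    return sorted(plain) + sorted(profiles)

files = [
    "docker/spark.profile.yml",
    "docker/spark-build-image.yml",
    "docker/postgres.yml",
]
print(order_compose_files(files))
# -> ['docker/postgres.yml', 'docker/spark-build-image.yml', 'docker/spark.profile.yml']
```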
You can explicitly select the include files by using the `COMPOSE_FILES` and `COMPOSE_ENV_FILES` variables when you run `make`. You might add such overrides to new targets in the Makefile, so you can define custom stacks for different environments or purposes.
```make
mycustom:
	# use -B to force a rebuild of docker-compose.yml
	$(MAKE) -B COMPOSE_FILES="docker/mycustom.yml docker/mycustom.profile.yml" up
```
If you don't want your additional configs auto-included in default runs, just use a different naming convention.
To compile the compose-file snippets from the `docker/` sub-directory, we set `--project-directory` to the repo root. Beware, when defining any paths in the compose snippets, that the current working directory is the repo root. Remember, though, that Dockerfile paths are relative to the build context you set.
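For example, a snippet in `docker/` might reference paths like this (an illustrative fragment; the service and path names are made up):

```yaml
# Illustrative only: with --project-directory set to the repo root,
# relative host paths resolve from the root, not from docker/.
services:
  example:
    build:
      context: ./docker/example    # resolved from the repo root
      dockerfile: Dockerfile       # resolved inside the build context
    volumes:
      - ./conf:/opt/conf           # host path resolved from the repo root
```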
It is recommended that you create a generalized `.yml` file for each image you build. Then put additional configurations into a `.profile.yml` file. The `.profile.yml` files will always have precedence over service descriptions in plain `.yml` files. You might group related services into a profile, or add project-specific or environment-specific configurations, such as volumes, networks, or environment variables.
For instance, we define image-building configurations in `docker/spark-build-image.yml` and define our integration of the image into the various services in `docker/spark.profile.yml`. You can see how we use one image in multiple services with different environment variables and volumes.
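A hypothetical sketch of that split (file, service, and key names below are made up, not the project's actual files):

```yaml
# docker/example.yml: generalized image definition (illustrative)
services:
  example:
    build:
      context: ./docker/example
    image: example:latest
---
# docker/example.profile.yml: sorted later in the concatenation,
# so its keys take precedence for the same service
services:
  example:
    environment:
      - EXAMPLE_ENV=dev
    volumes:
      - ./data:/opt/data
```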