ISI Datamart
This git repository contains the ISI Datamart using REST endpoints.
The content of the Datamart is a set of datasets, which in turn consists of one or more variables. The Dataset Metadata Schema and the Variable Metadata schema are described here: Metadata Schema
The canonical data format used by the Datamart is the text delimited file (CSV). Details of the canonical data format and examples are here: Canonical Data Format
Using the default configuration the Datamart REST URL is http://localhost:14080/
. The details of the individual REST endpoints are described here: Datamart REST API
If for some reason your are running the development version of the Datamart, the URL is http://localhost:5000/
See examples in the Datamart Demo Jupyter notebook for sample usage: Datamart Data API Demo
Installation
Edit the docker/docker_config.py
file to change the Postgres user password.
Change to the docker
directory and build the docker container.
cd docker
docker-compose build
This will build the backend container. It may take a while the first time you do it, as there are a lot of Python packages that need to be installed. Every time you change the source you should build the container again. Subsequent building runs will be faster.
Running the System
From the docker
directory and run
docker-compose up -d
The docker compose yaml file, docker-compose.yml
, uses docker compose version 3.7.
On start up Postgres checks if the postgres
volume exists. If it does not exist, the volume is created using the contents of the dev-env/data/postgres/datamart.sql.gz
file.
The ISI Datamart REST endpoints is http://localhost:14080/
.
Once database is up and running, run this script (first time only) to create SQL views for variable search
python script/create_search_views.py
After adding more data to the database, please run,
python script/refresh_search_views.py
IMPORTANT: refreshing SQL views is vital to ensure country and admin level search working
Datasets
The Datamart comes with a few datasets pre-loaded. They include data from OECD, FSI, UAZ and WGI.
Managing the Datamart Database
Backing up the existing database
To backup the current Postgres database, run
docker exec -it datamart-postgres /bin/bash
# From inside the docker container
pg_dump --user postgres wikidata | gzip > /backup/datamart-backup.sql.gz
The backup file is place in the dev-env/data/postgres
directory.
Adding datasets directly to the database
To add additional datasets in the form of a TSV file, switch to the scripts directory and run
python import_tsv_postgres.py <filepath/to/tsv/file>
The script assumes the default Postgres username and password in config.py
.
Shutting down the dockers
To bring the docker down, from the docker
folder, run
docker-compose down
This command safely shut downs the dockers, saving the database. Next time dockers are brought up, the data will still be there and the load time is considerably faster than the first time.
Wiping existing database and updating with new content
To delete the existing database use the --volumes
option to bring down docker compose. This command destroys the postgres
volume.
docker-compose down --volumes
IMPORTANT: this command will wipe out the database. To simply shutdown the docker, run docker-compose down
instead
Replace the dev-env/data/postgres/datamart.sql.gz
file with the new .sql.gz
file.
And, restart the system.
docker-compose up