open-contracting / notebooks-oc4ids

A collection of notebooks used to store and query OC4IDS data in a database

Documentation in Redmine

jpmckinney opened this issue

I am pasting here the documentation from CRM-6335, in case there is no other copy (passwords omitted).


We now have a central database for OC4IDS data and a set of queries to assess data quality.

To load data into the database, use the OC4IDS Database - Import Data notebook. The database follows similar conventions to Kingfisher Process and Views, but with fewer features. It has the following tables:

  • collection - equivalent to the collection table in Kingfisher Process, with one row per collection.
  • projects - equivalent to the join of the release and data tables in Kingfisher Process, with one row per project.
  • collection_check - similar to release_check in Kingfisher Process, but with one row per collection.
  • field_counts - equivalent to field_counts in Kingfisher Views.
  • oc4ids_schema - a flattened version of the OC4IDS schema, for use in coverage queries.
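As a rough illustration of what a coverage query over field_counts and oc4ids_schema might look like, here is a minimal sketch using an in-memory SQLite database rather than the real PostgreSQL server. The column names (path, collection_id, project_count), the sample rows, and the missing_paths helper are all assumptions made for the example; the actual tables may be structured differently.

```python
import sqlite3

# Stand-in tables with HYPOTHETICAL, simplified columns.
# The real oc4ids_schema and field_counts tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE oc4ids_schema (path TEXT PRIMARY KEY);
    CREATE TABLE field_counts (
        collection_id INTEGER,
        path TEXT,
        project_count INTEGER
    );
    INSERT INTO oc4ids_schema VALUES
        ('id'), ('title'), ('sector'), ('budget/amount/amount');
    INSERT INTO field_counts VALUES
        (1, 'id', 10), (1, 'title', 8);
    """
)


def missing_paths(conn, collection_id):
    """Return schema paths that never appear in the given collection."""
    rows = conn.execute(
        """
        SELECT s.path
        FROM oc4ids_schema AS s
        LEFT JOIN field_counts AS f
            ON f.path = s.path AND f.collection_id = ?
        WHERE f.path IS NULL
        ORDER BY s.path
        """,
        (collection_id,),
    ).fetchall()
    return [path for (path,) in rows]


print(missing_paths(conn, 1))
```

The left join keeps every schema path and filters to those with no matching field_counts row, which is one common way to express "fields the publisher never uses".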

To analyze data and prepare feedback, use the OC4IDS Data Feedback Notebook. The queries in the notebook cover:

  • Scope
  • Structure
  • Format
  • Conformance
  • Coherence
  • Basic usability checks

To query the database, use the following details for the read-only user:

Host: oc4ids-database-2.cuujgua4wses.us-east-1.rds.amazonaws.com
Port: 5432 (default)
User: readonly
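
One way to turn these details into something a PostgreSQL client can use is to assemble a connection URI. The sketch below uses only the Python standard library. The database name and password are placeholders (neither is given here), and build_uri is a hypothetical helper, not part of the notebooks.

```python
from urllib.parse import quote

HOST = "oc4ids-database-2.cuujgua4wses.us-east-1.rds.amazonaws.com"


def build_uri(user, password, dbname, host=HOST, port=5432):
    """Build a PostgreSQL connection URI, percent-encoding credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


# "example_db" and "example-password" are placeholders, not real values.
uri = build_uri("readonly", "example-password", "example_db")
print(uri)
```

The resulting string can then be handed to whichever client library is in use. Percent-encoding the password matters when it contains characters such as "@" or "/", which would otherwise break the URI.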

Things to note:

  • The Data Feedback Notebook is a work in progress: some queries require testing, and most queries still need documentation.
  • Don't rely on this database for anything mission-critical. It's a hack rather than a proper piece of software, and we don't have any backups in place.
  • The postgres user used in the Import Data notebook has full permissions. Use the following connection details if you need them:

Host: oc4ids-database-2.cuujgua4wses.us-east-1.rds.amazonaws.com
Port: 5432 (default)
User: postgres

In the event we need to build a fresh copy of the database on a new server, I've documented the steps in the OC4IDS Database - Setup notebook.