open-contracting / notebooks-oc4ids

A collection of notebooks used to store and query OC4IDS data in a database

Document data model, tables and structure

duncandewhurst opened this issue

Some rough notes:


The structure is similar to Kingfisher Process, but a bit simpler. The main table is the `projects` table, which has one project per row, stored as jsonb in the `data` column. You can use the `collection` table to identify the collection you want to work with and then filter the `projects` table using the `collection_id` column.
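A minimal sketch of that collection-then-projects lookup. It uses an in-memory SQLite database as a stand-in so it runs anywhere; the real database stores `data` as jsonb, which implies PostgreSQL. Only `projects.data` and `projects.collection_id` come from the notes above: the `id` and `source_id` columns, the sample rows and the `projects_in_collection` helper are assumptions for illustration.

```python
import json
import sqlite3

# Stand-in database (assumption): SQLite with JSON stored as text.
# Column names other than collection_id and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY, source_id TEXT)")
conn.execute(
    "CREATE TABLE projects ("
    "id INTEGER PRIMARY KEY, collection_id INTEGER, data TEXT)"
)

# Made-up sample rows: two collections, three projects.
conn.executemany(
    "INSERT INTO collection (id, source_id) VALUES (?, ?)",
    [(1, "example_source_a"), (2, "example_source_b")],
)
conn.executemany(
    "INSERT INTO projects (collection_id, data) VALUES (?, ?)",
    [
        (1, json.dumps({"id": "example-0001", "title": "Bridge repair"})),
        (1, json.dumps({"id": "example-0002", "title": "School building"})),
        (2, json.dumps({"id": "example-0003", "title": "Water treatment plant"})),
    ],
)


def projects_in_collection(conn, collection_id):
    """Return the parsed project JSON for every row in one collection."""
    rows = conn.execute(
        "SELECT data FROM projects WHERE collection_id = ? ORDER BY id",
        (collection_id,),
    )
    return [json.loads(data) for (data,) in rows]


# Pick a collection from the collection table, then filter projects by it.
selected = projects_in_collection(conn, 1)
titles = [project["title"] for project in selected]
```

Against PostgreSQL, the same filter could read fields directly from the jsonb column, for example `SELECT data ->> 'title' FROM projects WHERE collection_id = 1`.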

You can find the data review results for each collection in `collection_check`, the coverage for each collection in `field_counts` (same format as field_counts in Kingfisher Summarize) and the coverage for each project in `project_fields` (same format as the new field_list columns in Kingfisher Summarize).
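To illustrate the difference between per-project and per-collection coverage, here is a rough sketch that derives both from project JSON. It does not reproduce the actual columns or path format of `project_fields` or `field_counts`, which these notes do not specify. The slash-separated paths, the helper names and the sample projects are all assumptions.

```python
from collections import Counter


def field_paths(value, prefix=""):
    """Yield a slash-separated path for each field, ignoring array indices."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}/{key}" if prefix else key
            yield path
            yield from field_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            yield from field_paths(item, prefix)


def project_field_list(project):
    """Per-project coverage: how many times each path occurs in one project."""
    return dict(Counter(field_paths(project)))


def collection_field_counts(projects):
    """Per-collection coverage: how many projects contain each path."""
    counts = Counter()
    for project in projects:
        counts.update(set(field_paths(project)))
    return dict(counts)


# Made-up sample projects (hypothetical data).
projects = [
    {"id": "p1", "title": "Bridge", "parties": [{"id": "a", "name": "A"}, {"id": "b"}]},
    {"id": "p2", "parties": [{"id": "c"}]},
]

per_project = project_field_list(projects[0])
per_collection = collection_field_counts(projects)
```

Here `per_project["parties/id"]` counts occurrences within a single project, while `per_collection["title"]` counts how many projects in the collection have a title at all.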

There is also a tabular copy of the OC4IDS schema in `oc4ids_schema`.

  • `collection` - equivalent to the collection table in Kingfisher Process with one row per collection.
    
  • `projects` - equivalent to the join of the release and data tables in Kingfisher Process with one row per project.
    
  • `collection_check` - similar to release_check in Kingfisher Process, but with one row per collection.
    
  • `field_counts` - equivalent to field_counts in Kingfisher Summarize (formerly Kingfisher Views).
    
  • `oc4ids_schema` - a flattened version of the OC4IDS schema, for use in coverage queries.