kws / rtof-data-model-tools

The specification parser and datamodel generator for the Refugee Transitions Outcomes Fund (RTOF) Datamodel

Data Model Tools for the Refugee Transitions Outcomes Fund (RTOF)

This is the source repository for the RTOF data model tools for generating different outputs from the core specification.

Several modules handle parsing and validating the specification, compiling the documentation website and associated outputs, and creating sample data and file validators.

The repositories are:

  1. Data Model
  2. Data Model Tools (this repo)

The [Data Model][rtof-data-model] contains the actual data specification that is used by the tools to validate and transform any received data.

The tools are mostly written in Python, and the website is powered by Jekyll and GitHub Pages.

Python Documentation Generators

The documentation generator uses Poetry for dependency management. Make sure you have Poetry installed, then install the project dependencies:

poetry install

Once installed, you can either run the generator from the command line by typing:

poetry run python main.py

or you can launch VS Code with:

poetry shell
code .

Unit Tests

Unit tests can be found in /tests and can be run from the project root with:

poetry run python -m unittest discover

Individual tests can also be run by specifying the module:

poetry run python -m unittest tests.test_stream_insert

Linting

To check your code style, run:

poetry run flake8

ERD output

The Entity Relationship Diagram (ERD) generator (erd.py) uses Graphviz to produce an ERD showing the relationships between the records.
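
As a rough illustration of the kind of input Graphviz consumes, the sketch below renders a list of record relationships as DOT source. It is not the code in erd.py: the `to_dot` helper and the record names `person` and `employment` are hypothetical, and the DOT text is built by hand so that the example has no dependencies.

```python
def to_dot(relationships):
    """Render (parent, child) record pairs as Graphviz DOT source."""
    lines = ["digraph erd {", "  node [shape=box];"]
    for parent, child in relationships:
        # One directed edge per relationship between two records.
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical record names, used only for illustration.
dot_source = to_dot([("person", "employment")])
print(dot_source)
```

The resulting text can be saved to a file and passed to the Graphviz `dot` command to produce an image.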

Word output

The Word output module uses docxtpl to provide a Word version of the specification, including the ERD.

Excel output

Whilst the Word documentation is intended for human consumption, the Excel version is better for importing categories into a CMS. It simply uses [openpyxl](https://openpyxl.readthedocs.io/en/stable/) to create tables for import into other data systems.

Jekyll output

Finally, the Jekyll output module generates a set of Jekyll pages to produce the website. The Jekyll configuration can be found in website.

Transfer formats

So far this documentation has discussed the "core" data model but not representation and information exchange. To make the transfer of data from providers to the central reporting function as streamlined as possible, we provide a number of transfer formats that can be used for upload and reporting.

The most commonly used data interchange formats in this sector are Excel (xlsx), CSV, JSON and XML. Excel and CSV are tabular formats, whilst JSON and XML are hierarchical. We will therefore focus on these two abstract representations.

The core data model is relational but does include many-to-one relationships that are not easily represented in pure tabular formats without duplicating either rows or columns.

So for tabular formats, records may be provided either one-by-one (separate file / worksheet for each type) or as a long format. All column names are unique, so if column names from multiple record types appear in the same file, then all records will be added / updated. Omitted records will retain their previous values if they already existed.
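
Because column names are unique across record types, a row that mixes columns from several record types can be split back into individual records. The following is a minimal sketch of that idea, not the project's actual loader: the `RECORD_COLUMNS` mapping and every column name in it are hypothetical, and treating blank cells as omitted is an assumption.

```python
# Hypothetical mapping of record type -> column names; the real names
# come from the RTOF specification, not from this sketch.
RECORD_COLUMNS = {
    "person": {"unique_id", "year_of_birth"},
    "employment": {"employment_start_date", "employment_status"},
}


def split_row(row):
    """Split one tabular row into per-record-type dictionaries.

    Blank or missing cells are skipped (an assumption), so a record
    type with no values in the row is left out of the result.
    """
    records = {}
    for record_type, columns in RECORD_COLUMNS.items():
        values = {
            name: value
            for name, value in row.items()
            if name in columns and value not in (None, "")
        }
        if values:
            records[record_type] = values
    return records


row = {
    "unique_id": "A1",
    "year_of_birth": "1990",
    "employment_status": "full-time",
    "employment_start_date": "",
}
records = split_row(row)
print(records)
```

Record types that do not appear in the result would simply not be touched, which matches the rule that omitted records retain their previous values.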

For many-to-one relationships, columns can contain a suffix, e.g. integration_outcome_type_1. All columns belonging to that record must have the same suffix, i.e.:

  • integration_outcome_type_1
  • integration_outcome_achieved_date_1

etc.

Alternatively, multiple rows can be provided, one for each record.
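
The suffix convention can be sketched as a small grouping step that collects all columns sharing a numeric suffix into one record. This is an illustrative sketch rather than the project's parser: the `group_suffixed_columns` helper and the sample values are hypothetical, and it assumes that any column ending in an underscore followed by digits is a suffixed column.

```python
import re
from collections import defaultdict

# Matches "<field>_<n>", where <n> is the numeric suffix.
SUFFIXED = re.compile(r"^(?P<field>.+)_(?P<index>\d+)$")


def group_suffixed_columns(row):
    """Group suffixed columns into one dictionary per suffix."""
    grouped = defaultdict(dict)
    for name, value in row.items():
        match = SUFFIXED.match(name)
        if match:
            index = int(match.group("index"))
            grouped[index][match.group("field")] = value
    # Return the records ordered by their suffix number.
    return [grouped[index] for index in sorted(grouped)]


# Sample values are made up for illustration.
row = {
    "unique_id": "A1",
    "integration_outcome_type_1": "housing",
    "integration_outcome_achieved_date_1": "2022-01-15",
    "integration_outcome_type_2": "language",
    "integration_outcome_achieved_date_2": "2022-03-01",
}
records = group_suffixed_columns(row)
print(records)
```

Each dictionary in the result corresponds to one row in the alternative multiple-row layout. Columns without a numeric suffix, such as `unique_id` here, are ignored by this helper.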

Releasing this project

Published releases of this project are hosted on PyPI.

Before building and releasing a new version, make sure all unit tests pass.

We use semantic versioning, so update the project version in pyproject.toml accordingly, commit, and create a PR. Once the release version is on GitHub, create a GitHub release. Name the release with the current release name, e.g. 1.0, and name the tag with the release name prefixed with a v, i.e. v1.0. Alpha and beta releases can be flagged by appending -alpha.<number> and -beta.<number>.
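
The naming convention can be expressed as a small helper that derives the release name and tag from a version string. This is only a sketch: `release_names` is a hypothetical function that is not part of the project, and it assumes the alpha or beta marker is appended to the release name before the v prefix is added.

```python
def release_names(version, stage=None, number=None):
    """Return (release_name, tag) for a version string.

    stage is None for a normal release, or "alpha" / "beta" for a
    pre-release, in which case number is required.
    """
    name = version
    if stage is not None:
        if stage not in ("alpha", "beta"):
            raise ValueError("stage must be 'alpha' or 'beta'")
        if number is None:
            raise ValueError("a pre-release needs a number")
        name = f"{version}-{stage}.{number}"
    # The tag is the release name prefixed with "v".
    return name, f"v{name}"


print(release_names("1.0"))
print(release_names("1.0", "beta", 2))
```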
