minvws / nl-covid19-data-backend-processing

Calculations behind the corona dashboard, which provides information on the outbreak and prevalence of COVID-19 in The Netherlands.

Home Page:https://coronadashboard.rijksoverheid.nl/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

NL CORONAVIRUS DASHBOARD


The Coronadashboard provides information on the breakout and prevalence of the Coronavirus in The Netherlands. It combines measured and modelled indicators from various sources to give a broad perspective on the subject. See documentation for more information.

This repository includes all the business logic for the data-processing from source data to .json files in the form of the archive protos.zip. This protos.zip archive is ingested by the front-end application to display all the data on the Coronadashboard. The code for the front-end application can be found here.

The Coronadashboard was taken offline on 2024-04-02 and now redirects to the RIVM website containing information about Corona.

Getting Started


The CoronaDashboard aggregates data from various sources into a single database, which it then provides to a frontend solution. This aggregation happens in three stages:

  • Staging, the raw data from the source
  • Intermediate, basic type conversions applied on the raw data
  • Destination, data has been processed into a form that is ready for consumption

All calculations and data-manipulations (usually constrained to one specific workflow) are described as stored procedures in Microsoft SQL (i.e., MSSQL) scripts. An orchestrator is in charge of running the correct stored procedures based on the configuration tables located in the DATATINO_ORCHESTRATOR_1 schema. For some exceptional flows, Azure Data Factory is used as an orchestrator instead (see the ./src/adf-flows/pipeline directory for the definitions of the pipelines to understand which stored procedures belong together).

The configuration tables located in the DATATINO_PROTO_1 schema can be used to map the views from the VWSDEST schema into separate json files that are ready for consumption by a frontend.

Relational Databases

A Microsoft SQL Server 2019 is used to ingest and digest the various sources. Container images - using Docker, minikube, Podman or Docker runtimes that support Docker CLI) can be used used when on-premise or cloud solution are not available during development (Docker Hub). It is also possible to run a local version of MSSQL for development purposes.

  • Deploy the SQL project in ./src/sln/CoronaDashboard.BusinessLogic/CoronaDashboard.BusinessLogic.Database to the created database.
  • Then fill the configuration tables via the PowerShell load-protos-script.ps1 and load-workflow-configs.ps1 scripts located in ./.devops/scripts directory.
    • The scripts make use of the configurations as stored in .json files in the ./.devops/configs directory.
  • All input source files as of the archiving of the application have been provided in the ./src/input-sources directory in a zip-file named ArchivingCDBsources.7z.

Troubleshooting


Name Description Why?
PowerShell 7+ Install the latest version of PowerShell. Older version of PowerShell do not support chained commands (i.e., &&) and Null coalescing operators (i.e., ??).

NOTE! Every time the code has been modified, publish the changes to your local database.

Extract Load Transform

To schedule, and automate, the ingestion and digestion of the indicators, a custom application (i.e Datatino Orchestrator) and PowerApp are used. However a different schedule and automation solution (or manually) can be used, for example Azure Data Factory.

After the indicators are calculated, Datatino Protocols (i.e., Proto) are used to package the results in predefined and desired (human readable) JSON documents.

Powershell

# Create protos (i.e. JSON formatted documents)
$body = @{
    protoPath = "datatino/test"
    zipPath = "datatino/testzip"
    zipFileName = "protos.zip"
    refreshProtos = $true
};

$parameters = @{
    Method = "POST"
    Uri =  "https://<azure-function>.azurewebsites.net/api/a_proto_zipprotos"
    Body = ($body | ConvertTo-Json) 
    ContentType = "application/json"
};

Invoke-RestMethod @parameters;

CI/CD PIPELINES

Azure DevOps CI/CD Pipelines are used to build executable Transact-SQL (i.e. T-SQL) scripts and release these scripts to their respective environment (i.e. Development, Acceptance and Production).

Troubleshooting


When running into CI/CD issues (e.g. Ip Address restrictions), use the following command to start listening for Azure DevOps jobs with self-hosted agents - on a machine with the right permissions:

Powershell

# Set variables
$TAG="local-agent"
$DEVOPS_SERVER_INSTANCE_URL="https://dev.azure.com/VWSCoronaDashboard"
$DEVOPS_AGENT_TOKEN="<Personal Access Token>"

# Create local image
$CONTEXT_DIR="./.agents"
docker build -t $TAG -f "$CONTEXT_DIR/Dockerfile" $CONTEXT_DIR

# Run local image as container with the required PAT and URL.
docker run `
    -e DEVOPS_SERVER_INSTANCE_URL=$DEVOPS_SERVER_INSTANCE_URL `
    -e DEVOPS_AGENT_TOKEN=$DEVOPS_AGENT_TOKEN `
    --restart unless-stopped `
    -d `
    $TAG

See Self-Hosted Agent Container and Micosoft Docs for more information.

DEVELOPMENT PROCESS


The core team aims to define and calculate various related indicators which are ultimately presented on Corona Dashboard. Some indicators are calculated using a single data source, others require a combination of data sources. The calculation of indicators is logically split in separate workflows (see HOW TO IMPLEMENT WORKFLOWS).

Supplementary information regarding the dashboard can be found here.

About

Calculations behind the corona dashboard, which provides information on the outbreak and prevalence of COVID-19 in The Netherlands.

https://coronadashboard.rijksoverheid.nl/

License:Other


Languages

Language:TSQL 92.7%Language:PowerShell 6.7%Language:Shell 0.5%Language:Dockerfile 0.1%