SchweizerischeBundesbahnen / BCNFStar

Interactive Schema Management for Data Warehouses

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

BCNFStar

BCNFStar is a tool for interactive database schema management. It helps you to

  • normalize or denormalize your schema (by splitting or joining tables)
  • create star schemas with a dedicated optimized mode
  • replace primary and foreign keys by automatically generated surrogate keys
  • find valid foreign key candidates (based on inclusion dependencies) present in the data
    • and investigate why others are invalid
    • same for functional dependencies, which are the basis for normalising (splitting) tables
  • integrate new tables into an existing schema (coming soon!)

It ensures that all transformations are valid for a given data instance and generates SQL downloads for transforming

For a more comprehensive overview, please consult our demo paper and video from the SIGMOD 2023 conference.

Example image of BCNFStar

Setup

You can either setup BCNFStar manually with nodejs or use our docker image. For most users, we recommend Docker as it is easier to setup.

Using Docker

The docker image does not include a database as BCNFStar is intended to run on your existing data. You can configure the database connection in docker-compose.yml. By default we use a standard postgres database configuration. The available options for DB_TYPE are postgres and mssql (for Microsoft SQL Server based databases)

- DB_HOST=host.docker.internal
- DB_PORT=5432
- DB_DATABASE=postgres
- DB_USER=postgres
- DB_TYPE=postgres
- DB_PASSWORD=

After that, you can use the following command to launch the app

docker-compose up

to create docker containers for redis and BCNFStar.

You can open BCNFStar on http://localhost/#/.

Manual

Java

This project uses metanome, which requires java. We recommend Java 8-11.

Node

This project requires nodejs. After installing node, you need to execute

npm install

from the project root folder, which will also install all dependencies in the server and frontend projects.

Database

BCNFStar currently works on Postgres and Microsoft SQL Server databases. You need to set an environment variable called DB_TYPE to either postgres, mssql (or sqledge). Regardless of the database type, a .pgpass-like is needed. Its path needs to be in an environment variable called DB_PASSFILE. Environment variables can be placed in a file called .env.local in the project root like this:

DB_TYPE="postgres"
DB_PASSFILE="~/.pgpass"

Redis

Since metanome jobs might take a lot of time and resources, we queue them. This requires running a Redis instance for storing the queue through server restarts, which can be obtained from many standard package managers on Unix or from a tarball. On Windows, you can use this download. If you host Redis on a different machine or change its config, you may pass REDIS_HOST and REDIS_PORT env variables.

Deploying

To build the production app, run

npm run build

from the project root, which will build both the server and the frontend. After that, you can start the server by invoking

npm run start

Configure FD-Ranking

To better assess the subject-specific correctness of functional dependencies, the functional dependencies can be evaluated using various ranking approaches. The defaultRankingWeights constant in the FdScore.ts file can be used to specify whether and to what extent a ranking approach is included in the overall ranking calculation. By default, only the keyValue ranking is used. The attributes of defaultRankingWeights may only have values between 0 and 1 and the sum of all attributes must be 1.

Troubleshooting

Something doesn't work? Always try to run npm install && npm run build first.

Documentation

There are two types of documentation for this project.

  1. A documentation close to the source code that is auto-generated for the frontend and the backend respectively.
  2. A documentation for all the features the tool implements. This can be found here.

About

Interactive Schema Management for Data Warehouses

License:MIT License


Languages

Language:TypeScript 87.0%Language:HTML 8.0%Language:CSS 2.8%Language:JavaScript 2.1%Language:Dockerfile 0.1%Language:Shell 0.1%