minkymorgan / data-quality-profiler-and-rules-engine

Data Quality Profiler and Rules Engine

The project provides the following:

  • Data Profilers for large volume data profiling in Spark
  • Assertion rule definitions and checking
  • Reference data loading and joining
  • Excel and CSV reference data parsing
  • JSON output enriched with data quality markers/profilers
  • Metrics and summary dataframe output
  • Dimensional tagging of profiler outputs (additional identifiers)
  • JSON flattener
  • JSON and CSV loader, extensible to other formats
  • Custom key pre-processor and custom parquet row reader functionality
  • Comprehensive built-in assertion rules modules, extensible
  • Built-in set of field-level profile masks
  • Compound assertion rule definition (i.e. a set of sub-rules must all pass)
  • Human-readable Data Quality and Assertion Rule Compliance report output
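
Two of the ideas in the list above, field-level profile masks and compound assertion rules, can be illustrated with a small sketch in plain Scala. This is not the project's API: `DqSketch`, `mask`, `Rule`, and `allOf` are hypothetical names, and the `A` / `a` / `9` character-class convention is an assumption, since the built-in mask set is not described here. The sketch also works on a single in-memory record rather than a Spark dataframe.

```scala
// Illustrative sketch only: none of these names come from the project's API.
object DqSketch {

  // Field-level profile mask: collapse each character to a class symbol.
  // The A / a / 9 convention is an assumption, not the project's mask set.
  def mask(value: String): String = value.map {
    case c if c.isUpper => 'A'
    case c if c.isLower => 'a'
    case c if c.isDigit => '9'
    case c              => c // punctuation and whitespace pass through unchanged
  }

  // An assertion rule is modelled as a named predicate over a flat record.
  final case class Rule(name: String, check: Map[String, String] => Boolean)

  // Compound rule: passes only if every sub-rule passes.
  def allOf(name: String, subRules: Rule*): Rule =
    Rule(name, record => subRules.forall(_.check(record)))

  def main(args: Array[String]): Unit = {
    val record = Map("postcode" -> "SW1A 1AA", "country" -> "GB")

    val postcodeShape = Rule(
      "postcode-shape",
      r => r.get("postcode").exists(p => mask(p) == "AA9A 9AA")
    )
    val countryIsGb = Rule("country-is-gb", r => r.get("country").contains("GB"))
    val ukAddress   = allOf("uk-address", postcodeShape, countryIsGb)

    println(mask(record("postcode"))) // prints AA9A 9AA
    println(ukAddress.check(record))  // prints true
  }
}
```

Masking values before counting them is what makes profiling cheap to read: many distinct raw values collapse into a handful of shapes, and unexpected shapes stand out. Building the compound rule from ordinary rules keeps each sub-rule independently testable.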

Repository Layout

Licence

Licensed under the MIT License. See LICENSE.

Languages

Scala 97.9%, Mustache 2.1%