apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Home Page: https://datafusion.apache.org/ballista

Data quality framework

explicite opened this issue

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

(This section helps Arrow developers understand the context and why for this feature, in addition to the what)

Describe the solution you'd like
When building a DAG of transformations, I want to be able to define tests that verify data correctness. At the end of the DAG, I should be able to review the data quality results and provide context to the end user if required.

As in Deequ, I should be able to check, for example, that all IDs are unique or that the values in a given column are in the correct format. Other approaches to look at are AWS Glue, dbt tests, and Great Expectations.
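To make the request more concrete, here is a rough sketch of the kind of checks I have in mind, written in plain Python over a list of rows. None of this is an existing Ballista, DataFusion, or Deequ API; the names `CheckResult`, `is_unique`, `matches_pattern`, and `run_checks` are hypothetical and only illustrate the idea of declaring checks and collecting a report at the end of the DAG.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Mapping

# A row is modelled as a simple mapping from column name to value (assumption).
Row = Mapping[str, object]


@dataclass
class CheckResult:
    """Outcome of one data quality check (hypothetical type)."""
    name: str
    passed: bool
    failing_rows: int


Check = Callable[[List[Row]], CheckResult]


def is_unique(column: str) -> Check:
    """Build a check that counts rows whose value repeats an earlier one."""
    def check(rows: List[Row]) -> CheckResult:
        seen = set()
        duplicates = 0
        for row in rows:
            value = row[column]
            if value in seen:
                duplicates += 1
            seen.add(value)
        return CheckResult(f"is_unique({column})", duplicates == 0, duplicates)
    return check


def matches_pattern(column: str, pattern: str) -> Check:
    """Build a check that counts rows whose value does not fully match the regex."""
    regex = re.compile(pattern)

    def check(rows: List[Row]) -> CheckResult:
        bad = sum(1 for row in rows if not regex.fullmatch(str(row[column])))
        return CheckResult(f"matches_pattern({column})", bad == 0, bad)
    return check


def run_checks(rows: Iterable[Row], checks: Iterable[Check]) -> List[CheckResult]:
    """Run every check against the same materialised rows and return a report."""
    materialised = list(rows)
    return [check(materialised) for check in checks]


# Example data standing in for the output of the last DAG stage.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "not-an-email"},
    {"id": 2, "email": "c@example.com"},
]

report = run_checks(
    rows,
    [
        is_unique("id"),
        # Deliberately loose email pattern, for illustration only.
        matches_pattern("email", r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    ],
)

for result in report:
    status = "PASS" if result.passed else "FAIL"
    print(f"{status} {result.name}: {result.failing_rows} failing row(s)")
```

In a real implementation the checks would presumably be pushed down as queries against the distributed plan rather than run over rows collected in memory; the sketch only shows the user-facing shape of declaring checks and reading a report.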

Describe alternatives you've considered
Instead of building a new framework, it may be possible to extend Great Expectations.

Additional context
Add any other context or screenshots about the feature request here.