unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

Home Page: https://www.union.ai/pandera

How to load a schema from a PySpark struct or Avro format from a schema registry?

pthalasta opened this issue

Question about pandera

How do I create a DataFrameSchema from an Avro schema? What are our options? When I try, the resulting DataFrameSchema object has an empty columns field. Could this be added as a feature that pulls the schema from the most widely used schema registries?
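A minimal sketch of the registry-fetching half of this request, assuming a Confluent-compatible Schema Registry. Its REST API serves the latest version of a subject at /subjects/{subject}/versions/latest and returns the Avro schema as a JSON-encoded string in the "schema" field. The registry URL, the subject name users-value, and all function names below are hypothetical, and nothing here is part of pandera.

```python
import json
import urllib.parse
import urllib.request


def latest_schema_url(registry_url: str, subject: str) -> str:
    """URL of the latest schema version for `subject` (Confluent-style REST path)."""
    quoted = urllib.parse.quote(subject, safe="")
    return f"{registry_url.rstrip('/')}/subjects/{quoted}/versions/latest"


def schema_from_registry_response(body: str) -> dict:
    """Decode the Avro schema, which the registry embeds as a JSON string."""
    return json.loads(json.loads(body)["schema"])


def fetch_latest_avro_schema(registry_url: str, subject: str, timeout: float = 10.0) -> dict:
    """Fetch and parse the latest Avro schema registered under `subject`."""
    url = latest_schema_url(registry_url, subject)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return schema_from_registry_response(resp.read().decode("utf-8"))


# Hypothetical usage against a local registry (requires network access):
# schema = fetch_latest_avro_schema("http://localhost:8081", "users-value")
```

The returned dict is the parsed Avro schema; a translation layer would still be needed to turn it into a DataFrameSchema.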

Hi @pthalasta, looking at the Avro schema docs, it looks like we'll need to write a translation layer from Avro to pandera, similar to the frictionless integration: https://pandera.readthedocs.io/en/stable/frictionless.html?highlight=frictionless#frictionless-data-schema
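A minimal sketch of what such a translation layer might look like, assuming only flat Avro records whose fields are primitive types or ["null", T] unions. The Avro-to-pandas dtype mapping in _AVRO_TO_PANDAS and every function name here are hypothetical choices for illustration, not an existing pandera API. Nested records, arrays, maps, enums, and logical types are rejected rather than guessed.

```python
from typing import Any, Dict, Tuple

# Hypothetical Avro -> pandas dtype mapping as (non-nullable, nullable) variants.
# Nullable ints/bools use pandas extension dtypes so missing values are allowed.
_AVRO_TO_PANDAS = {
    "boolean": ("bool", "boolean"),
    "int": ("int32", "Int32"),
    "long": ("int64", "Int64"),
    "float": ("float32", "float32"),
    "double": ("float64", "float64"),
    "string": ("str", "str"),
    "bytes": ("object", "object"),
}


def avro_field_to_spec(field_type: Any) -> Tuple[str, bool]:
    """Return (pandas dtype string, nullable) for a single Avro field type."""
    nullable = False
    if isinstance(field_type, list):  # Avro union, e.g. ["null", "long"]
        non_null = [t for t in field_type if t != "null"]
        nullable = len(non_null) < len(field_type)
        if len(non_null) != 1:
            raise NotImplementedError(f"unsupported union: {field_type!r}")
        field_type = non_null[0]
    if not isinstance(field_type, str) or field_type not in _AVRO_TO_PANDAS:
        # Records, arrays, maps, enums and logical types are out of scope here.
        raise NotImplementedError(f"unsupported Avro type: {field_type!r}")
    strict, lenient = _AVRO_TO_PANDAS[field_type]
    return (lenient if nullable else strict), nullable


def avro_record_to_specs(schema: Dict[str, Any]) -> Dict[str, Tuple[str, bool]]:
    """Translate a parsed Avro record schema into {column: (dtype, nullable)}."""
    if schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")
    return {f["name"]: avro_field_to_spec(f["type"]) for f in schema["fields"]}


def avro_record_to_pandera(schema: Dict[str, Any]):
    """Build a pandera DataFrameSchema from the translated column specs."""
    import pandera as pa  # deferred so the pure translation above needs no pandera

    return pa.DataFrameSchema(
        {
            name: pa.Column(dtype, nullable=nullable)
            for name, (dtype, nullable) in avro_record_to_specs(schema).items()
        },
        name=schema.get("name"),
    )


USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"]},
        {"name": "score", "type": "double"},
    ],
}
print(avro_record_to_specs(USER_SCHEMA))
# {'id': ('int64', False), 'email': ('str', True), 'score': ('float64', False)}
```

With pandera installed, avro_record_to_pandera(USER_SCHEMA) would return a DataFrameSchema with those three columns. Raising on unsupported types, instead of silently falling back to object, keeps a partial translation from producing a schema that checks less than the Avro contract promises.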

Feel free to change the label of this issue to enhancement and rewrite the title as a feature request.

Happy to review a PR contribution from you or someone in the community!