Can you use Pydantic Field Aliasing with Pandera / PydanticModel schema definitions?
mcmasty opened this issue · comments
How to use Pydantic Field Alias with pandera
I am processing a CSV and I am trying to use Pandera to validate the data. The names in the CSV header row are not what I want the names in my model to be. I haven't figured out how to achieve field aliasing. Any suggestions?
Here is a snippet that reproduces the error I am getting.
import io

import pydantic
import pandas as pd
import pandera as pa
from pandera.engines.pandas_engine import PydanticModel


class AliasedRecord(pydantic.BaseModel):
    name: str = pydantic.Field(alias="Name")
    amt_in_local: float = pydantic.Field(alias="Amount in local currency")


class AliasDFSchema(pa.DataFrameModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(AliasedRecord)
        strict = True
        coerce = True  # this is required, otherwise a SchemaInitError is raised


# Direct Pydantic Model Validation
ar_m = AliasedRecord.model_validate({"Name": "Foo", "Amount in local currency": 1.32})
print(f"My Model is: {ar_m}")

# Now try validating a DataFrame
# Generate data similar to the source CSV
f = io.StringIO('Name,Amount in local currency\nfoo,1.32\nbar,3.34')
df1 = pd.read_csv(f)
validated_df = AliasDFSchema(df1)
Output
The successful Model:
My Model is: name='Foo' amt_in_local=1.32
The DataFrame / Pandera error ...
... bunch of stuff removed for brevity
SchemaError: column 'Name' not in DataFrameSchema {}
df1 is correctly created
Looks like PydanticModel doesn't interact well with strict=True. This works:
class AliasDFSchema(pa.DataFrameModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(AliasedRecord)
        coerce = True  # this is required, otherwise a SchemaInitError is raised
One potential fix would be to update the DataFrameSchema.__init__ method to special-case dtype = PydanticModel. Basically, just pull out the column names/aliases from the pydantic model and create a column dictionary.
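As a rough illustration of the "pull out the column names/aliases" step, here is a minimal sketch that assumes pydantic v2 (where `model_fields` exposes each field's alias). The helper name `columns_from_model` is hypothetical and is not part of pandera; turning the result into actual pandera columns inside `DataFrameSchema.__init__` is left out.

```python
from typing import Any

import pydantic


class AliasedRecord(pydantic.BaseModel):
    name: str = pydantic.Field(alias="Name")
    amt_in_local: float = pydantic.Field(alias="Amount in local currency")


def columns_from_model(model: type[pydantic.BaseModel]) -> dict[str, Any]:
    """Hypothetical helper: map each expected column name to its annotation.

    The alias is used when one is set; otherwise the field name is used.
    """
    return {
        (info.alias or field_name): info.annotation
        for field_name, info in model.model_fields.items()
    }


# Keys are the names a strict check would expect to find in the DataFrame.
columns = columns_from_model(AliasedRecord)
print(columns)
```

With a mapping like this, a strict check could compare the DataFrame's columns against the aliases rather than against an empty schema.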
Turning this into a bug issue in case anyone wants to open a PR!
I would like to have a crack at this please