[BUG]: using `typing.Optional` on a column in `DataFrameModel` with `polars`
jjfantini opened this issue
Describe the bug
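Marking a column as `Optional` in a `pandera.polars` `DataFrameModel` does not work: when the optional column is absent from the `polars` LazyFrame being validated, validation fails with `ColumnNotFoundError` instead of skipping that column.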
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
Code Sample, a copy-pastable example
```python
import datetime
from typing import Optional

import pandera.polars as pa
import polars as pl


class ModelWithChecks(pa.DataFrameModel):
    a: int
    b: str
    c: Optional[float]  # should make column "c" not required
    date: pl.Date


valid_lf = pl.LazyFrame(
    {
        "a": pl.Series([1, 2, 3], dtype=pl.Int64),
        "b": ["d", "e", "f"],
        # "c": [0.0, 0.6, 1],  # optional column intentionally omitted
        "date": pl.date_range(
            start=datetime.date(2021, 1, 1),
            end=datetime.date(2021, 1, 3),
            eager=True,
        ),
    }
)

ModelWithChecks.validate(valid_lf, lazy=True).collect()
```
Error I got:
```
---------------------------------------------------------------------------
ColumnNotFoundError                       Traceback (most recent call last)
Cell In[83], line 27
     10     date: pl.Date
     13 valid_lf = pl.LazyFrame(
     14     {
     15         "a": pl.Series([1, 2, 3], dtype=pl.Int64),
   (...)
     23     }
     24 )
---> 27 ModelWithChecks.validate(valid_lf, lazy=True).collect()

File c:\Users\jjfan\github\humblFINANCE-org\humbldata\menv\Lib\site-packages\pandera\api\dataframe\model.py:270, in DataFrameModel.validate(cls, check_obj, head, tail, sample, random_state, lazy, inplace)
    255 @classmethod
    256 @docstring_substitution(validate_doc=BaseSchema.validate.__doc__)
    257 def validate(
   (...)
    265     inplace: bool = False,
    266 ) -> DataFrameBase[TDataFrameModel]:
    267     """%(validate_doc)s"""
    268     return cast(
    269         DataFrameBase[TDataFrameModel],
--> 270         cls.to_schema().validate(
    271             check_obj, head, tail, sample, random_state, lazy, inplace
    272         ),
...
ColumnNotFoundError: c
```
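Expected behavior
Validation passes with no error: column `c` is annotated `Optional[float]`, so it is not required and its absence from the LazyFrame should be allowed.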
Desktop:
Windows 11, Cursor IDE, Python 3.11.7
Good catch! PR ^^ should fix this
awesome! can't wait for the PR, I have been waiting for pandera.polars for so long, thanks for everything!! truly!