unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

Home Page: https://www.union.ai/pandera

[BUG]: using `typing.Optional` on a column in `DataFrameModel` with `polars`

jjfantini opened this issue

Describe the bug
Annotating a column with `typing.Optional` in a `DataFrameModel` raises an error when validating a polars `LazyFrame` that does not contain that column.

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.

Code Sample, a copy-pastable example

import datetime
from typing import Optional

import pandera.polars as pa
import polars as pl


class ModelWithChecks(pa.DataFrameModel):
    a: int
    b: str
    c: Optional[float]
    date: pl.Date


valid_lf = pl.LazyFrame(
    {
        "a": pl.Series([1, 2, 3], dtype=pl.Int64),
        "b": ["d", "e", "f"],
        # "c" is deliberately left out: the model declares it Optional
        # "c": [0.0, 0.6, 1],
        "date": pl.date_range(
            start=datetime.date(2021, 1, 1),
            end=datetime.date(2021, 1, 3),
            eager=True,
        ),
    }
)


ModelWithChecks.validate(valid_lf, lazy=True).collect()

Error I got:

---------------------------------------------------------------------------
ColumnNotFoundError                       Traceback (most recent call last)
Cell In[83], line 27
     10     date: pl.Date
     13 valid_lf = pl.LazyFrame(
     14     {
     15         "a": pl.Series([1, 2, 3], dtype=pl.Int64),
   (...)
     23     }
     24 )
---> 27 ModelWithChecks.validate(valid_lf, lazy=True).collect()

File c:\Users\jjfan\github\humblFINANCE-org\humbldata\menv\Lib\site-packages\pandera\api\dataframe\model.py:270, in DataFrameModel.validate(cls, check_obj, head, tail, sample, random_state, lazy, inplace)
    255 @classmethod
    256 @docstring_substitution(validate_doc=BaseSchema.validate.__doc__)
    257 def validate(
   (...)
    265     inplace: bool = False,
    266 ) -> DataFrameBase[TDataFrameModel]:
    267     """%(validate_doc)s"""
    268     return cast(
    269         DataFrameBase[TDataFrameModel],
--> 270         cls.to_schema().validate(
    271             check_obj, head, tail, sample, random_state, lazy, inplace
    272         ),
...

ColumnNotFoundError: c

Expected behavior

Validation should pass with no error, since column `c` is annotated as `Optional` and is allowed to be absent.
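
To make the expected behavior concrete, here is a minimal standard-library sketch of the semantics I am expecting: a column annotated with `Optional[...]` is treated as not required, so its absence is not an error. This is not pandera's implementation. `ExampleModel`, `is_optional`, and `missing_required` are hypothetical names used only for illustration, and the real library may detect optional columns differently.

```python
from typing import Optional, Union, get_args, get_origin, get_type_hints


def is_optional(annotation) -> bool:
    """Return True if the annotation is Optional[X], i.e. Union[X, None]."""
    return get_origin(annotation) is Union and type(None) in get_args(annotation)


class ExampleModel:  # hypothetical stand-in, not a pandera DataFrameModel
    a: int
    b: str
    c: Optional[float]


def missing_required(model, present_columns):
    """List annotated columns that are absent and not marked Optional."""
    hints = get_type_hints(model)
    return [
        name
        for name, annotation in hints.items()
        if name not in present_columns and not is_optional(annotation)
    ]


# "c" is absent but Optional, so nothing is reported as missing.
print(missing_required(ExampleModel, {"a", "b"}))
# "b" is absent and required, so it is reported; "c" is still ignored.
print(missing_required(ExampleModel, {"a"}))
```

Under these assumed semantics, the frame in the code sample above (columns `a`, `b`, `date`, no `c`) would have no missing required columns.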

Desktop

Windows 11, Cursor IDE, Python 3.11.7

Good catch! PR ^^ should fix this

awesome! can't wait for the PR, I have been waiting for pandera.polars for so long, thanks for everything!! truly!