unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

Home Page: https://www.union.ai/pandera

Inconsistent support of type hints with mypy

miguel-mi-silva opened this issue

Describe the bug
I have read through the documentation and a few related issues here (example), but could not determine whether this is a bug or a feature that is not yet supported. Judging from the discussion in the issue linked above, it seems that it should be supported.

The problems are:

  • Type hints on function arguments and return values are not handled correctly by mypy.
  • The schema argument of check_input and check_output does not accept a SchemaModel as a valid type.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Code Sample, a copy-pastable example

from pandera import SchemaModel, check_input, check_output
from pandas import DataFrame
from pandera.typing import DataFrame as TypeDataFrame


class InputSchema(SchemaModel):
    a: int
    b: bool

class OutputSchema(SchemaModel):
    a: int
    b: bool
    c: str

df = DataFrame({"a": [1], "b": [True]})

@check_input(schema=InputSchema)
@check_output(schema=OutputSchema)
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c")

Expected behavior

Running mypy on this code should not raise any errors (this is how I understand it, based on this reply to a previous issue).
However, the following errors are raised:

  • Argument "schema" to "check_input" has incompatible type "type[InputSchema]"; expected "DataFrameSchema | SeriesSchema" [arg-type]
  • Argument "schema" to "check_output" has incompatible type "type[OutputSchema]"; expected "DataFrameSchema | SeriesSchema" [arg-type]
  • Incompatible return value type (got "pandas.core.frame.DataFrame", expected "pandera.typing.pandas.DataFrame[OutputSchema]") [return-value]
  • Argument 1 to "process" has incompatible type "pandas.core.frame.DataFrame"; expected "pandera.typing.pandas.DataFrame[InputSchema]" [arg-type]

Additional context

Pandera version: 0.17.2
Mypy version: 1.6.0
Pandas version: 2.1.1

Hi @miguel-mi-silva, if you're using DataFrameModels you should use pa.check_types:

@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c")

check_input and check_output are for DataFrameSchemas.

Thanks for that. However, even with those changes, mypy still reports the return-type error: Incompatible return value type (got "pandas.core.frame.DataFrame", expected "pandera.typing.pandas.DataFrame[OutputSchema]") [return-value]. Is that a known issue, or am I doing something wrong?

Here's the updated code for reference:

from pandera import SchemaModel, check_types
from pandas import DataFrame
from pandera.typing import DataFrame as TypeDataFrame


class InputSchema(SchemaModel):
    a: int
    b: bool

class OutputSchema(SchemaModel):
    a: int
    b: bool
    c: str

df = DataFrame({"a": [1], "b": [True]})

@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c")

What does your mypy config look like?

See the docs for a discussion of mypy support. If you care about type linting, there's some guidance there on how to make your mypy checks pass, e.g.

@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c").pipe(TypeDataFrame[OutputSchema])

or

from typing import cast

@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return cast(TypeDataFrame[OutputSchema], df.assign(c="c"))
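One thing worth keeping in mind about the cast variant: typing.cast only informs the type checker and returns its argument unchanged at runtime, so any actual validation still has to come from check_types. The sketch below illustrates this with the standard library only. Frame, TypedFrame, and the empty OutputSchema are hypothetical stand-ins for pandas.DataFrame, pandera.typing.DataFrame, and a schema class; they are not pandera's API.

```python
from typing import Generic, TypeVar, cast

T = TypeVar("T")


class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, columns):
        self.columns = list(columns)


class TypedFrame(Frame, Generic[T]):
    """Hypothetical stand-in for pandera.typing.DataFrame[...]."""


class OutputSchema:
    """Hypothetical stand-in for a schema class."""


def process(frame: Frame) -> TypedFrame[OutputSchema]:
    # cast() satisfies the type checker only: at runtime it returns
    # `frame` unchanged and performs no validation or conversion.
    return cast(TypedFrame[OutputSchema], frame)


plain = Frame(["a", "b"])
result = process(plain)

# The object is the very same instance, still of the base class.
print(result is plain)
print(type(result).__name__)
```

In other words, cast is a way to tell mypy "trust me"; it is the decorator that checks the data against the schema when the function runs.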