Inconsistent support of type hints with mypy
miguel-mi-silva opened this issue · comments
Describe the bug
I have read through the documentation and a few issues here (example), but could not determine whether this is a bug or a feature that is not yet supported. From the discussion in the linked issue, it seems that it should be supported.
The bugs are:
- type hints to function arguments or return values are not correctly handled by mypy.
- schema argument of check_input or check_output does not recognize SchemaModel as a valid type.
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandera.
- (optional) I have confirmed this bug exists on the master branch of pandera.
Code Sample, a copy-pastable example
from pandera import SchemaModel, check_input, check_output
from pandas import DataFrame
from pandera.typing import DataFrame as TypeDataFrame

class InputSchema(SchemaModel):
    a: int
    b: bool

class OutputSchema(SchemaModel):
    a: int
    b: bool
    c: str

df = DataFrame({"a": [1], "b": [True]})

@check_input(schema=InputSchema)
@check_output(schema=OutputSchema)
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c")
Expected behavior
Running mypy on this code should not raise any errors (this is how I understand it, based on this reply to a previous issue).
However, the following errors are raised:
Argument "schema" to "check_input" has incompatible type "type[InputSchema]"; expected "DataFrameSchema | SeriesSchema" [arg-type]
Argument "schema" to "check_output" has incompatible type "type[OutputSchema]"; expected "DataFrameSchema | SeriesSchema" [arg-type]
Incompatible return value type (got "pandas.core.frame.DataFrame", expected "pandera.typing.pandas.DataFrame[OutputSchema]") [return-value]
Argument 1 to "process" has incompatible type "pandas.core.frame.DataFrame"; expected "pandera.typing.pandas.DataFrame[InputSchema]" [arg-type]
Additional context
Pandera version: 0.17.2
Mypy version: 1.6.0
Pandas version: 2.1.1
Hi @miguel-mi-silva, if you're using DataFrameModels you should use pa.check_types:
@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c")
check_input and check_output are for DataFrameSchemas.
Thanks for that. However, even with those changes, the error on the return type still occurs:
Incompatible return value type (got "pandas.core.frame.DataFrame", expected "pandera.typing.pandas.DataFrame[OutputSchema]") [return-value]
Is that a known issue or am I doing something wrong?
Here's the updated code for reference:
from pandera import SchemaModel, check_types
from pandas import DataFrame
from pandera.typing import DataFrame as TypeDataFrame

class InputSchema(SchemaModel):
    a: int
    b: bool

class OutputSchema(SchemaModel):
    a: int
    b: bool
    c: str

df = DataFrame({"a": [1], "b": [True]})

@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c")
What does your mypy config look like?
See the docs for a discussion on mypy support. If you care about type lints, there's some guidance there on how to make your mypy linting pass, e.g.
@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return df.assign(c="c").pipe(TypeDataFrame[OutputSchema])
or
from typing import cast

@check_types
def process(df: TypeDataFrame[InputSchema]) -> TypeDataFrame[OutputSchema]:
    return cast(TypeDataFrame[OutputSchema], df.assign(c="c"))
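To see why the cast variant satisfies mypy without changing runtime behaviour, here is a library-free sketch. Frame, TypedFrame and the empty OutputSchema class are hypothetical stand-ins for pandas.DataFrame, pandera.typing.DataFrame and a schema model; they are not part of either library and only mimic the subclass relationship implied by the error message.

```python
from typing import Generic, TypeVar, cast

T = TypeVar("T")

class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, data: dict) -> None:
        self.data = data

    def assign(self, **cols) -> "Frame":
        # The annotation promises only the plain base class.
        return Frame({**self.data, **cols})

class TypedFrame(Frame, Generic[T]):
    """Hypothetical stand-in for pandera.typing.DataFrame[Schema]."""

class OutputSchema:
    """Hypothetical schema marker."""

def process(df: Frame) -> TypedFrame[OutputSchema]:
    # Without the cast, a type checker reports an incompatible return value,
    # because Frame is not a subtype of TypedFrame[OutputSchema].
    return cast(TypedFrame[OutputSchema], df.assign(c="c"))

result = process(Frame({"a": 1, "b": True}))
# cast() returns its argument unchanged, so the runtime type is still Frame.
print(type(result).__name__)
```

The cast only informs the type checker; any actual validation still has to come from the runtime decorator (check_types in the snippets above).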