unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

Home Page: https://www.union.ai/pandera

TypeError("Subscripted generics cannot be used with class and instance checks")

cjthorley opened this issue

Question about pandera

I am running the code below in two environments:

ENV1 = Databricks Spark cluster driver node
ENV2 = local standard venv

Both environments have the same pandas and pandera versions.

The dataframe validates locally (i.e. ENV2) with no schema errors, but on the Databricks Spark cluster driver node (i.e. ENV1) I get TypeError("Subscripted generics cannot be used with class and instance checks").

Why am I getting this failure and what does it mean? It is specific to the pa.Check.isin.

import pandas as pd
import pandera as pa
from IPython.display import HTML

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
data_df = pd.DataFrame(data)

list_age = [20, 21, 19, 18]

# define pandera custom schema
validation_schema = pa.DataFrameSchema(
    columns = {
        'Name': pa.Column(str),
        'Age': pa.Column(int, pa.Check.isin(list_age), nullable=False)
    }
)

try:
    validated_df = validation_schema(data_df, lazy=True)
    print('VALID')
except pa.errors.SchemaErrors as err:
    err_sum_df = err.failure_cases[['column', 'check', 'failure_case']].value_counts().reset_index(name='no_rows_with_errors')
    err_sum_df = err_sum_df.rename(columns={'column': 'affected_column', 'check': 'error_failure'})
    err_sum_html = err_sum_df.to_html()
    display(HTML(err_sum_html))

We are also getting this issue, please see below:

import pandas as pd
import pandera as pa
from pandera.typing import Series

class ScanOutput(pa.DataFrameModel):
    rating: Series[int] = pa.Field(ge=0, le=4)

def validate_df_against_schema(df: pd.DataFrame, schema: pa.DataFrameModel):
    null_schema = pa.DataFrameSchema({
        "rating": pa.Column(int, nullable=False)
    })

    null_schema.validate(df)  # this does not fail
    # this raises TypeError("Subscripted generics cannot be used with class and instance checks")
    validation = schema.validate(df)

# df is the caller's DataFrame (not shown in the original report)
validate_df_against_schema(df, schema=ScanOutput)

I encountered a similar problem, which appeared to stem from the multimethod package.

I resolved it by downgrading multimethod to version 1.11.
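
To check which multimethod version an environment actually has, something like the following may help. It is only a sketch: it uses the standard-library importlib.metadata, and REPORTED_WORKING, release_tuple, installed_version, and matches_reported_working are hypothetical names introduced here. The only fact it encodes is that 1.11 was reported as working in this thread; it makes no claim about which other versions are affected.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the version reported as working in this thread.
REPORTED_WORKING = "1.11"


def release_tuple(version_string):
    """Parse the leading numeric parts, e.g. '1.11.0' -> (1, 11).

    Trailing zeros are dropped so that '1.11' and '1.11.0' compare equal.
    """
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_reported_working(version_string):
    """True if the version equals the one reported as working above."""
    return release_tuple(version_string) == release_tuple(REPORTED_WORKING)


# Print the versions of the packages discussed in this thread.
for package in ("pandas", "pandera", "multimethod"):
    print(package, installed_version(package))
```

Running this on both the working and the failing environment should show whether multimethod differs even when pandas and pandera match.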

I also encountered the same issue. I am using Python 3.9.5 on Databricks runtime 12.2 (pandera 0.18.0).
It was working fine until yesterday.
When I run it on runtime 10.4 (Python 3.8.1), it still works fine today.
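
When the same pandera version behaves differently across two runtimes, comparing the full set of installed packages may help locate the dependency that changed. A minimal sketch, assuming only the standard-library importlib.metadata; snapshot_environment and diff_snapshots are hypothetical helper names, and the versions in the example dictionaries are made up for illustration:

```python
from importlib.metadata import distributions


def snapshot_environment():
    """Map each installed distribution name (lowercased) to its version."""
    versions = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata has no name
            versions[name.lower()] = dist.version
    return versions


def diff_snapshots(left, right):
    """Return {package: (left_version, right_version)} for every mismatch.

    A package missing on one side is reported with None for that side.
    """
    changed = {}
    for name in sorted(set(left) | set(right)):
        if left.get(name) != right.get(name):
            changed[name] = (left.get(name), right.get(name))
    return changed


# Illustrative snapshots with invented version numbers.
working = {"pandera": "0.18.0", "somepkg": "1.0"}
failing = {"pandera": "0.18.0", "somepkg": "2.0"}
print(diff_snapshots(working, failing))
```

In practice, snapshot_environment() would be run once in each runtime, with the results saved (for example as JSON) and then passed to diff_snapshots.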

@PierreC1024, I have multimethod version 1.11.1. Did you downgrade it to 1.11.0?

I can confirm that rolling back to multimethod==1.11, the version from before the February 19th update, has stopped the TypeError("Subscripted generics cannot be used with class and instance checks") schema error.

I am so pleased because I love the pandera library.

Thanks, everyone, this solved my issue on AWS Glue. I just pinned multimethod==1.11 using Terraform when deploying (see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html if anyone gets stuck doing it).

I was able to fix the issue in Databricks by rolling back the multimethod library to 1.11.
Thank you very much everyone!!!