unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

Home Page: https://www.union.ai/pandera

'drop_invalid_rows' always False with from_json() and to_json()

Nico-VC opened this issue

Describe the bug
The 'drop_invalid_rows' argument at the DataFrameSchema level is always set to False when a schema is loaded with .from_json().
Going the other way, serializing a schema defined in Python with 'drop_invalid_rows=True' via .to_json() does not preserve the value either.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Calling .to_json() on this inferred schema ignores the 'drop_invalid_rows=True' argument:

from pandera import DataFrameSchema, Column, Check, Index, MultiIndex
import numpy as np
import pandas as pd

schema = DataFrameSchema(
    columns={
        "Model": Column(
            dtype=np.int32,
            checks=None,
            nullable=False,
            unique=False,
            coerce=True,
            required=True,
            regex=True,
            description=None,
            title=None,
        ),
        "ID": Column(
            dtype=np.int32,
            checks=None,
            nullable=False,
            unique=False,
            coerce=False,
            required=True,
            regex=False,
            description=None,
            title=None,
        ),
    },
    checks=None,
    index=Index(
        dtype="int64",
        checks=[],
        nullable=False,
        coerce=False,
        name=None,
        description=None,
        title=None,
    ),
    dtype=None,
    coerce=True,
    strict=True,
    name=None,
    ordered=False,
    unique=None,
    report_duplicates="all",
    unique_column_names=False,
    add_missing_columns=False,
    title=None,
    description=None,
    drop_invalid_rows=True
)

# Print the serialized schema so the missing drop_invalid_rows=True is visible
print(schema.to_json())

The same behavior is observed when the argument is set at the Column level.
As a workaround, I end up having to set this value to True manually on the schema object.
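
The check and the manual workaround described above could be sketched roughly as follows. This is only an illustration, not pandera's own API: `serialized_flag` and `reapply_flag` are hypothetical helper names, the sample JSON strings are hand-written stand-ins rather than real pandera output, and the sketch assumes that the flag, when serialized at all, appears as a top-level key and that the loaded schema exposes it as a plain attribute.

```python
import json
from typing import Any


def serialized_flag(schema_json: str, key: str = "drop_invalid_rows") -> Any:
    """Return the top-level value of `key` in a JSON-serialized schema.

    Returns None when the key is absent. Assumes the JSON document is an
    object (dict) at the top level.
    """
    return json.loads(schema_json).get(key)


def reapply_flag(schema: Any, key: str = "drop_invalid_rows", value: bool = True) -> Any:
    """Set the flag by hand on an already-loaded schema object and return it.

    Assumes the schema stores the flag as an ordinary attribute.
    """
    setattr(schema, key, value)
    return schema


# Hand-written stand-ins (NOT real pandera output), for illustration only.
without_flag = '{"schema_type": "dataframe", "columns": {}}'
with_flag = '{"schema_type": "dataframe", "columns": {}, "drop_invalid_rows": true}'

print(serialized_flag(without_flag))
print(serialized_flag(with_flag))

# With pandera installed, usage might look like this (untested here):
#   text = schema.to_json()
#   print(serialized_flag(text))            # what the serialized form carries
#   loaded = DataFrameSchema.from_json(text)
#   loaded = reapply_flag(loaded)           # the manual workaround
```

Inspecting the serialized text first helps separate the two symptoms: whether .to_json() drops the value, or whether .from_json() discards a value that is present.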