Pandas backend: `check_nullable` is inefficient when a column schema has `nullable=True`
smarie opened this issue
Sylvain Marié commented
`check_nullable` in the pandas backend seems to compute the null values mask by calling `isna()` even when not needed.

    isna = check_obj.isna()
    passed = schema.nullable or not isna.any()

We see that even if `schema.nullable=True`, `isna` is still computed. This can lead to a performance issue on dataframes with millions of rows.
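A minimal sketch of the idea: return early when nulls are allowed, so `isna()` is never called in that case. The helper name `passes_nullable_check` and its signature are hypothetical, chosen for illustration. This is not pandera's actual `check_nullable` code and not necessarily the change made in the PR.

```python
import pandas as pd


def passes_nullable_check(check_obj: pd.Series, nullable: bool) -> bool:
    """Hypothetical helper illustrating the short-circuit (not pandera's API)."""
    if nullable:
        # Nulls are allowed, so the check passes without scanning the data.
        return True
    # Only build the null mask when the result actually depends on it.
    return not check_obj.isna().any()


series = pd.Series([1.0, None, 3.0])
print(passes_nullable_check(series, nullable=True))
print(passes_nullable_check(series, nullable=False))
```

With `nullable=True` the function returns before touching the data, so the cost no longer grows with the number of rows.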
Niels Bantilan commented
@smarie feel free to make a PR!
Sylvain Marié commented
Done #1538