Cannot execute a `sum` on a `DataFrame` created with `readParquet`
maxime-petitjean opened this issue
If I try to execute this code:
```javascript
const { DataFrame } = require('@rapidsai/cudf');
const frame = DataFrame.readParquet({ sourceType: 'files', sources: ['data.parquet'] });
const result = frame.sum(); // throws!
```
I get the error `sum operation requires dataframe to be entirely of dtype FloatingPoint OR Integral.`, but the Parquet file contains only Float64 columns.
If I explicitly cast the columns to Float64, it works:
```javascript
const { DataFrame, Float64 } = require('@rapidsai/cudf');
const frame = DataFrame.readParquet({ sourceType: 'files', sources: ['data.parquet'] });
const casted = frame.cast({ col1: new Float64(), col2: new Float64() });
const result = casted.sum(); // OK
```
If I log the frame's column types, I get:
- before cast:
  `{ col1: { typeId: 3, precision: 2 }, col2: { typeId: 3, precision: 2 } }`
- after cast:
  `{ col1: Float64 [Float] { precision: 2 }, col2: Float64 [Float] { precision: 2 } }`
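The plain objects logged before the cast can be decoded by hand. In Apache Arrow's JavaScript enums, `Type.Float` is `3` and `Precision` is `HALF = 0`, `SINGLE = 1`, `DOUBLE = 2`. That means `{ typeId: 3, precision: 2 }` already describes a Float64 column, and only the class instance is missing. The helper below is a hypothetical sketch. It hard-codes those enum values rather than importing `apache-arrow`, and `describeFloatType` is not part of any library:

```javascript
// Enum values as defined by Apache Arrow JS: Type.Float = 3; Precision HALF/SINGLE/DOUBLE = 0/1/2.
const TYPE_FLOAT = 3;
const FLOAT_NAMES = { 0: 'Float16', 1: 'Float32', 2: 'Float64' };

// Hypothetical helper: name the Arrow float type described by a plain { typeId, precision } object.
function describeFloatType({ typeId, precision }) {
  if (typeId !== TYPE_FLOAT) return `non-float type (typeId ${typeId})`;
  return FLOAT_NAMES[precision] ?? `unknown float precision ${precision}`;
}

// The "before cast" log from this issue.
const logged = { col1: { typeId: 3, precision: 2 }, col2: { typeId: 3, precision: 2 } };
for (const [name, type] of Object.entries(logged)) {
  console.log(name, describeFloatType(type)); // prints "col1 Float64", then "col2 Float64"
}
```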
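It seems the column types lose their class (e.g. the `Float64` instance) somewhere in `readParquet`, so they come back as plain objects carrying only `typeId` and `precision`. Maybe a type serialisation issue?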
@maxime-petitjean thanks for the bug report! That sounds like we're not fixing the types coming from C++ after loading the parquet file. I'll make a PR real quick with a fix.
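The maintainer's diagnosis suggests the fix is to turn the plain `{ typeId, precision }` objects coming back from C++ into proper type instances again. The sketch below is illustrative only and is not the actual PR:
- It does not use `@rapidsai/cudf`. It defines stand-in `DataType`/`Float`/`Float32`/`Float64` classes.
- It assumes, without confirmation from this thread, that the dtype check behind `sum` is class-based. The hypothetical `isNumeric` plays that role.
- `rehydrateType` is a hypothetical name.

It shows why the raw objects fail such a check and how rebuilding instances from `typeId`/`precision` makes it pass:

```javascript
// Stand-in classes; in the real library these would be the Arrow/cudf DataType classes.
class DataType {}
class Float extends DataType {
  constructor(precision) {
    super();
    this.typeId = 3; // Arrow's Type.Float
    this.precision = precision;
  }
}
class Float32 extends Float { constructor() { super(1); } }
class Float64 extends Float { constructor() { super(2); } }

// Hypothetical class-based dtype check (assumed, not taken from cudf's source).
const isNumeric = (type) => type instanceof Float;

// Hypothetical rehydration step: map plain { typeId, precision } objects back to class instances.
function rehydrateType(raw) {
  if (raw instanceof DataType) return raw; // already an instance, keep it as-is
  if (raw.typeId === 3) {
    if (raw.precision === 1) return new Float32();
    if (raw.precision === 2) return new Float64();
  }
  throw new Error(`unsupported type: ${JSON.stringify(raw)}`);
}

// Plain objects, as logged before the cast in this issue.
const rawTypes = { col1: { typeId: 3, precision: 2 }, col2: { typeId: 3, precision: 2 } };
console.log(Object.values(rawTypes).every(isNumeric)); // false: plain objects fail the check

const fixedTypes = Object.fromEntries(
  Object.entries(rawTypes).map(([name, raw]) => [name, rehydrateType(raw)]),
);
console.log(Object.values(fixedTypes).every(isNumeric)); // true: rehydrated instances pass
```

In a real fix, the mapping would presumably cover every Arrow type id, not just floats. The explicit `cast` in the workaround above achieves the same effect by constructing `Float64` instances directly.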