cube2222 / octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

Decoding Error when Reading Parquet File - "RLE: Decoded Run-Length Block"

youen opened this issue

Description:
I get an error when querying a Parquet file with OctoSQL. The file comes from the "TLC Trip Record Data" dataset (available here), which contains NYC taxi trip records.

The error message I'm encountering is as follows:

Error: couldn't run query: couldn't run source: couldn't read row: decoding page 0 of column "VendorID": decoding definition levels of data page v1: RLE: decoded run-length block cannot have more than 1048576 values

Judging by the message, the failure occurs while decoding the definition levels of the "VendorID" column in the first data page. I can read the same file successfully with other tools such as dsq, and it also opens in online Parquet viewers like parquetreader.com and tablab.app.
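To illustrate what the message may be describing, here is a minimal Python sketch of how an RLE run header is read in Parquet's RLE/bit-packed hybrid encoding, with a cap on the run length. The 1048576 limit is taken from the error text above. Everything else (the function names, the `max_run_values` parameter, and the idea that the reader rejects a run purely because of its length) is an assumption for illustration, not OctoSQL's actual code. The sketch handles RLE runs only and omits the 4-byte length prefix that precedes the levels in a v1 data page.

```python
MAX_RUN_VALUES = 1 << 20  # 1048576, the limit quoted in the error message


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 varint; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def write_uleb128(value):
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_rle_run(count, value, bit_width):
    """Encode one RLE run: varint(count << 1) followed by the repeated value."""
    value_bytes = (bit_width + 7) // 8
    return write_uleb128(count << 1) + value.to_bytes(value_bytes, "little")


def decode_rle_runs(buf, bit_width, max_run_values=MAX_RUN_VALUES):
    """Decode a sequence of RLE runs into (count, value) pairs.

    Hypothetical helper: raises ValueError when a run is longer than
    max_run_values, mimicking the kind of check the error suggests.
    """
    value_bytes = (bit_width + 7) // 8
    pos = 0
    runs = []
    while pos < len(buf):
        header, pos = read_uleb128(buf, pos)
        if header & 1:
            # Lowest bit set means a bit-packed run, which this sketch skips.
            raise NotImplementedError("bit-packed runs are not handled here")
        count = header >> 1
        if count > max_run_values:
            raise ValueError(
                f"RLE run of {count} values exceeds limit of {max_run_values}"
            )
        value = int.from_bytes(buf[pos:pos + value_bytes], "little")
        pos += value_bytes
        runs.append((count, value))
    return runs
```

If this reading is right, a page in which every row has the same definition level (for example, a column with no nulls) could be written as a single run covering the whole page, and a reader with such a cap would reject it even though the data is valid. For instance, `decode_rle_runs(encode_rle_run(3_000_000, 1, 1), 1)` raises `ValueError` in this sketch, while a run of 1000 values decodes normally.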

Steps to Reproduce:

  1. Install OctoSQL version 0.12.2.
  2. Attempt to query the Parquet file using OctoSQL.

Expected Behavior:
OctoSQL should be able to successfully query the Parquet file without encountering decoding errors.

Actual Behavior:
OctoSQL encounters a decoding error related to the "VendorID" column, as mentioned above.

Additional Information: