multiprocessio / dsq

Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Parquet is missing rows

mariussoutier opened this issue

I have a Parquet file that should have 30,000+ rows, but SELECT COUNT(*) FROM {} returns 7000. Another one with more than 40,000 rows returns exactly 8000. Converting the same data to JSON works fine.

Thanks for the report! Can you share a parquet file that has this issue?

Unfortunately no, it's business-related. But nothing special, 30 or so columns with mostly UTF8 and two INT32 types.

One of the columns does contain very large values, but other than that, it's normal stuff.

Sounds like @Sajuno reproduced the issue and is thinking of a fix. Thanks @Sajuno!

Closed in #82, now available in dsq 0.21.0.
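For anyone who wants to check whether they are affected, the comparison from the original report (COUNT(*) on the Parquet file versus the same data converted to JSON) can be sketched in Python roughly as follows. This is only a sketch under assumptions: it assumes dsq is on the PATH and prints its default JSON output, and the helper names and file paths are made up for illustration.

```python
import json
import subprocess


def parse_count(dsq_stdout: str) -> int:
    """Pull the single value out of dsq's JSON output, e.g. [{"COUNT(*)": 7000}]."""
    rows = json.loads(dsq_stdout)
    # Take the first column of the first row, whatever key dsq gives it.
    return int(next(iter(rows[0].values())))


def dsq_count(path: str) -> int:
    """Run dsq on `path` and return the row count it reports (assumes dsq is installed)."""
    result = subprocess.run(
        ["dsq", path, "SELECT COUNT(*) FROM {}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_count(result.stdout)


def counts_match(parquet_path: str, json_path: str) -> bool:
    """Compare dsq's row count for a Parquet file against its JSON conversion."""
    return dsq_count(parquet_path) == dsq_count(json_path)


# Hypothetical usage (the file names are placeholders):
#   counts_match("data.parquet", "data.json")
```

If the two counts differ on a version older than 0.21.0, that would be consistent with the behavior described in this issue.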