multiprocessio / dsq

Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

deprecated INT96 timestamp not supported for older Parquet files

andyEllixson opened this issue

Describe the bug and expected behavior
I have a large number of customer Parquet files that store timestamps in INT96 format. The timestamp is being interpreted as a string rather than a timestamp type. It should display in ISO 8601 format.
Reproduction steps
file/metadata version:
############ file meta data ############
created_by: parquet-mr version 1.10.1-databricks6 (build bd2ebc87e42b3936ac673e1556fa10fb8358307a)
num_columns: 3
num_rows: 376504
num_row_groups: 1
format_version: 1.0
serialized_size: 863

############ Columns ############
event_time
key
value

############ Column(event_time) ############
name: event_time
path: event_time
max_definition_level: 1
max_repetition_level: 0
physical_type: INT96
logical_type: None
converted_type (legacy): NONE

command:
dsq c:/tmp/easy-path.parquet "select event_time from {} limit 4"

Versions
windows 10: cmd.exe

  • dsq version: [e.g. 0.20.1]

Additional context, screenshots
c:\tmp>dsq --pretty c:/tmp/easy-path.parquet "select event_time from {} limit 2"
+------------+
| Event_time |
+------------+
♣�% |
| ��►►)6 �% |
+------------+
(2 rows)

Attachment: easy-path.parquet.txt

Note: the attachment's extension was changed to ".txt" strictly so it could be attached to this issue. Remove ".txt" before testing with it.
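
For reference, here is a rough sketch of how a legacy INT96 timestamp is usually decoded. The 12-byte value is little-endian: the first 8 bytes hold the nanoseconds elapsed within the day and the last 4 bytes hold the Julian day number. The function name `int96_to_iso8601` is made up for illustration and is not part of dsq. The sketch also assumes the writer stored the value in UTC, which may not hold for every file, and it truncates nanoseconds to microseconds because Python's `datetime` has no finer resolution.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_iso8601(raw: bytes) -> str:
    """Decode one 12-byte Parquet INT96 timestamp into an ISO 8601 string.

    Hypothetical helper for illustration; assumes the value is in UTC.
    """
    if len(raw) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(raw)}")

    # Little-endian: 8 bytes of nanoseconds within the day, then a
    # 4-byte Julian day number.
    nanos_of_day, julian_day = struct.unpack("<qI", raw)

    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

    # datetime only resolves microseconds, so sub-microsecond digits are dropped.
    ts = epoch + timedelta(days=days_since_epoch,
                           microseconds=nanos_of_day // 1000)
    return ts.isoformat()


if __name__ == "__main__":
    # One hour into the day after the Unix epoch.
    sample = struct.pack("<qI", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    print(int96_to_iso8601(sample))
```

Something along these lines applied to the `event_time` column would give the ISO 8601 output I am expecting instead of the raw bytes shown in the table above.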