[Parquet] TimestampPrecision / TimestampUnit mismatch in read / write files, particularly for unit tests
zuyu opened this issue · comments
Bug description
Lines 33 to 37 in 473902a
velox/velox/vector/arrow/Bridge.h
Lines 28 to 33 in 473902a
- TimestampUnit::kSecond does not have a match in TimestampPrecision
- default values mismatch: TimestampPrecision::kMilliseconds is used in read vs. TimestampUnit::kNano in write; most importantly, unit tests (i.e., timestamp int96) do not set the values.
- read path:
velox/velox/dwio/common/Options.h
Line 426 in 473902a
velox/velox/connectors/hive/HiveConnectorUtil.cpp
Lines 624 to 625 in 473902a
velox/velox/connectors/hive/HiveConfig.cpp
Lines 264 to 267 in 473902a
- write path:
velox/velox/vector/arrow/Bridge.h
Line 38 in 473902a
velox/velox/dwio/parquet/writer/Writer.h
Line 107 in 473902a
velox/velox/dwio/parquet/writer/Writer.cpp
Lines 241 to 242 in 473902a
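To make the default mismatch concrete, here is a minimal, self-contained sketch. The enums are stand-ins, not the Velox definitions: only the names kSecond, kNano and kMilliseconds come from this issue, while the other enumerators, the numeric values (taken here to mean "number of fractional digits"), and the helper truncateNanos are assumptions for illustration. It shows what a test would observe if a file written with the write-side default were read back with the read-side default, assuming the reader truncates to the configured precision.

```cpp
#include <cstdint>

// Stand-in enums for illustration only. Enumerator values are ASSUMED to be
// the number of fractional-second digits; they are not copied from Velox.
enum class SketchTimestampUnit : uint8_t {
  kSecond = 0, kMilli = 3, kMicro = 6, kNano = 9
};
enum class SketchTimestampPrecision : int8_t {
  kMilliseconds = 3, kMicroseconds = 6, kNanoseconds = 9
};

// Defaults as described in this issue: nanoseconds on write,
// milliseconds on read.
constexpr SketchTimestampUnit kDefaultWriteUnit = SketchTimestampUnit::kNano;
constexpr SketchTimestampPrecision kDefaultReadPrecision =
    SketchTimestampPrecision::kMilliseconds;

// Hypothetical helper: truncates a nanosecond fraction (0..999'999'999)
// to the given precision by zeroing the digits below it.
constexpr int64_t truncateNanos(
    int64_t nanos, SketchTimestampPrecision precision) {
  int64_t divisor = 1;
  for (int digits = static_cast<int>(precision); digits < 9; ++digits) {
    divisor *= 10;
  }
  return nanos / divisor * divisor;
}
```

Under these assumptions, a value written with 123456789 ns in its fractional part comes back as 123000000 ns when read with the default precision, so a round-trip test that sets neither option would compare unequal values.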
Proposed Fixes
- Introduce a new kNotSet as the default value, and require setting both TimestampPrecision and TimestampUnit when reading / writing a timestamp column. Otherwise, a VELOX_UNREACHABLE() assertion would trigger.
- For timestamp-related unit tests, align the values of TimestampPrecision and TimestampUnit.
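The first proposal could look roughly like the sketch below. Everything here is hypothetical: SketchTimestampUnit, SketchWriterOptions and requireTimestampUnit are illustrative names, and a std::logic_error stands in for VELOX_UNREACHABLE() so that the snippet compiles without Velox.

```cpp
#include <stdexcept>

// Hypothetical enum with a sentinel default. Only kNotSet, kSecond and kNano
// are named in this issue; the other enumerators are assumptions.
enum class SketchTimestampUnit { kNotSet, kSecond, kMilli, kMicro, kNano };

// Hypothetical options struct: the unit starts out unset.
struct SketchWriterOptions {
  SketchTimestampUnit timestampUnit = SketchTimestampUnit::kNotSet;
};

// Called on the timestamp write path. Throws if the caller never chose a
// unit (stand-in for the VELOX_UNREACHABLE() check proposed above).
inline SketchTimestampUnit requireTimestampUnit(
    const SketchWriterOptions& options) {
  if (options.timestampUnit == SketchTimestampUnit::kNotSet) {
    throw std::logic_error("timestamp unit must be set explicitly");
  }
  return options.timestampUnit;
}
```

The same pattern would apply on the read side with a kNotSet value in the precision enum. The cost is that every caller touching a timestamp column has to set the option.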
I would say just align them. Adding kNotSet will make things unnecessarily complicated.
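As a rough sketch of what "just align them" could mean in a test helper: derive the read-side precision from the write-side unit in one place, and surface the one unit that has no counterpart. The enums are stand-ins rather than the Velox definitions (only kSecond, kNano and kMilliseconds are named in this issue), and toPrecision is a hypothetical helper.

```cpp
#include <optional>

// Stand-in enums for illustration only; enumerators other than kSecond,
// kNano and kMilliseconds are assumptions.
enum class SketchTimestampUnit { kSecond, kMilli, kMicro, kNano };
enum class SketchTimestampPrecision {
  kMilliseconds, kMicroseconds, kNanoseconds
};

// Hypothetical helper: returns the read precision that matches a write unit,
// or std::nullopt for kSecond, which has no matching precision.
inline std::optional<SketchTimestampPrecision> toPrecision(
    SketchTimestampUnit unit) {
  switch (unit) {
    case SketchTimestampUnit::kMilli:
      return SketchTimestampPrecision::kMilliseconds;
    case SketchTimestampUnit::kMicro:
      return SketchTimestampPrecision::kMicroseconds;
    case SketchTimestampUnit::kNano:
      return SketchTimestampPrecision::kNanoseconds;
    case SketchTimestampUnit::kSecond:
      return std::nullopt;
  }
  return std::nullopt;
}
```

A timestamp test could then pick a single unit for the writer and pass the result of toPrecision to the reader, so the two settings cannot drift apart.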