[Bug]: Doris returns incorrect values when querying Decimal data on HDFS
wgzhao opened this issue · comments
wgzhao commented
What happened?
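- Install the latest Doris, then create a catalog that connects to `hive`.
- Use the latest version of Addax to write an ORC file containing a Decimal column to HDFS.
- Query the table through Doris; the Decimal column is displayed incorrectly, as shown below: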
mysql> switch hive;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from `default`.addax_test ;
+------+---------------+
| id | fee |
+------+---------------+
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | -80444563.314 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
| 10 | 123.120 |
+------+---------------+
20 rows in set (0.12 sec)
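The first 10 rows in the table were written by Addax. They are returned correctly by both the Hive CLI and Trino, but incorrectly by Doris.
The last 10 rows were written from the Hive CLI with `insert into addax_test select * from addax_test`, and these 10 rows are returned correctly.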
Version
4.1.3 (Default)
OS Type
Linux (Default)
Java JDK Version
Oracle JDK 1.8.0
Relevant log output
No response
wgzhao commented
The metadata of the ORC file that reads correctly is as follows:
File Version: 0.12 with ORC_135
Rows: 10
Compression: ZLIB
Compression size: 262144
Type: struct<id:int,fee:decimal(20,3)>
Stripe Statistics:
Stripe 1:
Column 0: count: 10 hasNull: false
Column 1: count: 10 hasNull: false bytesOnDisk: 5 min: 10 max: 10 sum: 100
Column 2: count: 10 hasNull: false bytesOnDisk: 16 min: 123.12 max: 123.12 sum: 1231.2
File Statistics:
Column 0: count: 10 hasNull: false
Column 1: count: 10 hasNull: false bytesOnDisk: 5 min: 10 max: 10 sum: 100
Column 2: count: 10 hasNull: false bytesOnDisk: 16 min: 123.12 max: 123.12 sum: 1231.2
Stripes:
Stripe: offset: 3 data: 21 rows: 10 tail: 44 index: 71
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 25
Stream: column 2 section ROW_INDEX start: 39 length 35
Stream: column 1 section DATA start: 74 length 5
Stream: column 2 section DATA start: 79 length 11
Stream: column 2 section SECONDARY start: 90 length 5
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
Encoding column 2: DIRECT_V2
File length: 324 bytes
Padding length: 0 bytes
Padding ratio: 0%
The metadata of the ORC file that reads incorrectly is as follows:
File Version: 0.12 with FUTURE
Rows: 10
Compression: LZ4
Compression size: 262144
Type: struct<id:int,fee:decimal(38,18)>
Stripe Statistics:
Stripe 1:
Column 0: count: 10 hasNull: false
Column 1: count: 10 hasNull: false bytesOnDisk: 5 min: 10 max: 10 sum: 100
Column 2: count: 10 hasNull: false bytesOnDisk: 21 min: 123.12 max: 123.12 sum: 1231.2
File Statistics:
Column 0: count: 10 hasNull: false
Column 1: count: 10 hasNull: false bytesOnDisk: 5 min: 10 max: 10 sum: 100
Column 2: count: 10 hasNull: false bytesOnDisk: 21 min: 123.12 max: 123.12 sum: 1231.2
Stripes:
Stripe: offset: 3 data: 26 rows: 10 tail: 59 index: 78
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 25
Stream: column 2 section ROW_INDEX start: 39 length 42
Stream: column 1 section DATA start: 81 length 5
Stream: column 2 section DATA start: 86 length 16
Stream: column 2 section SECONDARY start: 102 length 5
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
Encoding column 2: DIRECT_V2
File length: 371 bytes
Padding length: 0 bytes
Padding ratio: 0%
wgzhao commented
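Further testing suggests the cause may be a precision mismatch. For Doris to read the column correctly, the precision in the column definition must match the precision declared for that column in the ORC file; otherwise the values come back wrong.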
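To spot affected files before querying them, one option is to compare the `Type:` line of each ORC metadata dump against the type the table expects. The sketch below is only an illustration: `decimal_columns` and `find_mismatches` are hypothetical helpers, not part of Doris, Addax, or ORC, and the expected type `decimal(20,3)` is an assumption taken from the Hive-written file, since the table DDL is not shown in this issue.

```python
import re

# Matches fields such as "fee:decimal(38,18)" inside an ORC "Type:" line.
DECIMAL_FIELD = re.compile(r"(\w+):decimal\((\d+),(\d+)\)")


def decimal_columns(orc_type):
    """Map each decimal column name to its (precision, scale)."""
    return {
        name: (int(precision), int(scale))
        for name, precision, scale in DECIMAL_FIELD.findall(orc_type)
    }


def find_mismatches(file_type, expected_type):
    """Return decimal columns whose (precision, scale) differ between the two types."""
    file_cols = decimal_columns(file_type)
    expected_cols = decimal_columns(expected_type)
    return {
        name: (file_cols[name], expected_cols[name])
        for name in file_cols
        if name in expected_cols and file_cols[name] != expected_cols[name]
    }


# "Type:" lines copied from the two metadata dumps above.
addax_file = "struct<id:int,fee:decimal(38,18)>"  # written by Addax, reads incorrectly
hive_file = "struct<id:int,fee:decimal(20,3)>"    # written by Hive, reads correctly

# Assumption: the table is declared with the same type as the Hive-written file.
print(find_mismatches(addax_file, hive_file))
# {'fee': ((38, 18), (20, 3))}
```

An empty result means the file's decimal declarations agree with the expected type. This check only reports the mismatch; it does not explain how Doris arrives at the wrong value.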