wrong result returned when charVarcharAsString is true
loudongfeng opened this issue
When spark.sql.legacy.charVarcharAsString is true, Spark SQL treats char columns as strings and does not pad char literals.
As a result, an equality filter on a char column returns the wrong result. To reproduce:
create table my_char(name char(20)) using orc;
insert into my_char values ('Nemon');
select count(*) from my_char where name = 'Nemon';
+--------+
|count(1)|
+--------+
|1       |
+--------+
set spark.sql.legacy.charVarcharAsString=true;
select count(*) from my_char where name = 'Nemon';
+--------+
|count(1)|
+--------+
|0       |
+--------+
The plan:
+- ^(10) HashAggregateTransformer(keys=[], functions=[count(1)], output=[count(1)#139])
   +- CoalesceBatches
      +- ColumnarExchangeExec SinglePartition, ENSURE_REQUIREMENTS, false, [plan_id=320], [id=#320], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
         +- ^(9) HashAggregateTransformer(keys=[], functions=[partial_count(1)], output=[count#142L])
            +- ^(9) ProjectExecTransformer
               +- ^(9) FilterExecTransformer (isnotnull(name#7) AND (name#7 = Nemon))
                  +- NativeFileScan orc tpcds_parquet.my_char[name#7] Batched: true, DataFilters: [isnotnull(name#7), (name#7 = Nemon)], Format: ORC
Note that the DataFilters entry in NativeFileScan is (name#7 = Nemon): the literal is not padded to the char(20) length.
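
To illustrate why the unpadded literal might miss the row, here is a minimal sketch in plain Python (not Spark or Gluten code). It assumes the row was written while the flag was still false, so the stored value is space-padded to 20 characters. The helper names pad_char and count_matches are hypothetical and exist only for this illustration.

```python
CHAR_LENGTH = 20  # taken from the column definition char(20)


def pad_char(value: str, length: int = CHAR_LENGTH) -> str:
    """Right-pad a value with spaces to the declared CHAR length."""
    return value.ljust(length)


def count_matches(rows, literal: str, pad_literal: bool) -> int:
    """Count rows equal to the literal, optionally padding the literal first."""
    target = pad_char(literal) if pad_literal else literal
    return sum(1 for row in rows if row == target)


# Assumption: the insert ran with the flag set to false, so the stored
# value is 'Nemon' followed by 15 spaces.
stored = pad_char("Nemon")

padded_count = count_matches([stored], "Nemon", pad_literal=True)
unpadded_count = count_matches([stored], "Nemon", pad_literal=False)

print(padded_count, unpadded_count)  # prints "1 0"
```

Under that assumption, the padded comparison corresponds to the first query (count 1) and the unpadded comparison to the second query (count 0).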