confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

Hive table does not match column names present in the parquet data

swathimocharla opened this issue.

Hi,
We are creating an HDFS sink connector with partition.field.name set. The partition field sits in the middle of the Avro record. The HDFS connector creates the Hive table with a "PARTITIONED BY" clause on the field named in partition.field.name.

The problem is that the position of the partition column in the Parquet data files does not match its position in the Hive table, so reading that column returns wrong data.
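A minimal illustration of how a column-position mismatch could shift values, assuming the reader resolves Parquet columns by ordinal position rather than by name. This is one plausible explanation, not a confirmed diagnosis. The field names (`id`, `region`, `amount`) and the helpers `read_by_position` and `read_by_name` are hypothetical.

```python
# Hypothetical record: the partition field "region" sits in the middle of the data.
parquet_columns = ["id", "region", "amount"]
row = {"id": 1, "region": "eu", "amount": 9.5}

# Hive keeps the partition column out of the data columns (its value comes
# from the partition directory), so the table's data columns skip "region".
partition_field = "region"
hive_data_columns = [c for c in parquet_columns if c != partition_field]


def read_by_position(file_columns, table_columns, record):
    """Assumed failure mode: map file values onto table columns by ordinal position."""
    values = [record[c] for c in file_columns]
    # zip stops at the shorter list, so trailing file columns are silently dropped
    return dict(zip(table_columns, values))


def read_by_name(table_columns, record):
    """Expected behaviour: look each table column up by name in the file."""
    return {c: record[c] for c in table_columns}


print(read_by_position(parquet_columns, hive_data_columns, row))
print(read_by_name(hive_data_columns, row))
```

Under the positional assumption, `amount` picks up the partition value `"eu"` instead of `9.5`, which matches the symptom of wrong data being read for a column.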

This worked correctly in an older version of the HDFS connector (5.5.2). Was anything changed recently? We are seeing this issue on 10.1.1.