CODAIT / stocator

Stocator is a high-performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics.

Catalog table doesn't work with HIVE-style partitioning if there is no empty object ending with '/'

thoangtrvn opened this issue

I created a Hive-style partitioned layout with multiple partition columns on IBM COS:

cos://<endpoint>/bucket/Prefix/Col1=a/Col2=b/Col3=d/
cos://<endpoint>/bucket/Prefix/Col1=a/Col2=b/Col3=d/file1.parquet
cos://<endpoint>/bucket/Prefix/Col1=a/Col2=a/Col3=d1/file1.parquet

Using Cloud SQL Query, I can create a catalog table that uses all 3 columns (Col1, Col2, Col3) as partition columns. Querying this table returns data.

However, when I create a catalog table using only 2 columns (Col1, Col2), the table is created, but querying it returns no data. After a discussion with Daniel Pittner from the Cloud SQL Query team, we suspect this is a bug: the catalog table only appears to work when an empty object ending with '/' exists.
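To narrow down where the empty objects are missing, something like the following could be run over a listing of the bucket. This is only a sketch: the function name `missing_directory_markers` is made up for illustration, the keys are the three objects from the listing above with the `cos://<endpoint>/bucket/` part dropped, and it assumes the suspected cause (missing '/' marker objects) is actually what matters here.

```python
from typing import Iterable, List


def missing_directory_markers(keys: Iterable[str], root: str) -> List[str]:
    """Return the prefixes under `root` that have no empty marker object.

    A marker is an object whose key is exactly the prefix, ending with '/'.
    (Hypothetical helper, not part of Stocator or SQL Query.)
    """
    key_set = set(keys)
    prefixes = set()
    for key in key_set:
        if not key.startswith(root):
            continue
        # Walk every intermediate "directory" level below the root.
        parts = key[len(root):].split("/")[:-1]
        current = root
        for part in parts:
            current += part + "/"
            prefixes.add(current)
    # A prefix is "missing" when no object with exactly that key exists.
    return sorted(p for p in prefixes if p not in key_set)


# Keys taken from the listing above, relative to the bucket.
keys = [
    "Prefix/Col1=a/Col2=b/Col3=d/",
    "Prefix/Col1=a/Col2=b/Col3=d/file1.parquet",
    "Prefix/Col1=a/Col2=a/Col3=d1/file1.parquet",
]
missing = missing_directory_markers(keys, "Prefix/")
for prefix in missing:
    print(prefix)
```

For this listing, only `Prefix/Col1=a/Col2=b/Col3=d/` has a marker; the Col1 and Col2 levels and the `Col3=d1` level do not.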

@thoangtrvn can you specify where you had to add objects to confirm that it works when those objects are present?

The data looks like this

image

Creating the catalog table works when it uses all partition columns, up to the HOUR field.

It doesn't work if the catalog table is created using only the first few columns.
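To make the expectation concrete, here is a rough sketch of which partitions a table declared on only the first two columns should see for the example keys from the top of this issue. The helper `partition_values` is hypothetical and is not how SQL Query or Stocator discovers partitions; it only parses the `name=value` path segments.

```python
from typing import Dict, List, Optional


def partition_values(key: str, root: str, columns: List[str]) -> Optional[Dict[str, str]]:
    """Parse the leading `columns` of a Hive-style key, or None if it does not match.

    (Hypothetical helper for illustration only.)
    """
    if not key.startswith(root):
        return None
    segments = key[len(root):].split("/")
    # There must be something below the partition columns (deeper levels or a file).
    if len(segments) <= len(columns):
        return None
    values = {}
    for column, segment in zip(columns, segments):
        name, sep, value = segment.partition("=")
        if sep != "=" or name != column:
            return None
        values[column] = value
    return values


# Example keys from the issue, relative to the bucket.
keys = [
    "Prefix/Col1=a/Col2=b/Col3=d/",
    "Prefix/Col1=a/Col2=b/Col3=d/file1.parquet",
    "Prefix/Col1=a/Col2=a/Col3=d1/file1.parquet",
]

# Distinct (Col1, Col2) partitions a two-column table would be expected to have.
expected = set()
for key in keys:
    parsed = partition_values(key, "Prefix/", ["Col1", "Col2"])
    if parsed is not None:
        expected.add((parsed["Col1"], parsed["Col2"]))
```

With a two-column table, the data files sit one level below each partition (under `Col3=...`), which is the case that returns no data.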