duckdb / duckdb_iceberg

Table schema evolution support

harel-e opened this issue

First of all, thank you for this useful extension.
I can use the extension to read iceberg tables just fine.
However, as soon as the table schema changes, the extension throws an error.

Use case:

Using Trino to manage Iceberg on AWS/Glue/S3, I issued the following:

create table test(a int);
insert into test values(1);
alter table test add column b int;
insert into test values(2,5);

select * from test;
a | b
---+------
1 | NULL
2 | 5

Using the latest metadata file, I issued the following in DuckDB (after loading the aws and iceberg extensions):

select * from ICEBERG_SCAN('s3:///test-0fc6feb5c66b4915b39dd2d0511105b7/metadata/00005-5a37e1af-dbf3-48c7-b7c7-11309ecc6279.metadata.json');

Error: IO Error: Failed to read file "s3:///test-0fc6feb5c66b4915b39dd2d0511105b7/data/20231204_085221_00020_8b7ws-30eb76d3-2f3e-4a01-a030-976f7640d26a.parquet": schema mismatch in glob: column "b" was read from the original file "s3:///test-0fc6feb5c66b4915b39dd2d0511105b7/data/20231204_085314_00023_8b7ws-1e49b4ae-9d23-469b-8e00-6dd48da1d0a0.parquet", but could not be found in file "s3:///test-0fc6feb5c66b4915b39dd2d0511105b7/data/20231204_085221_00020_8b7ws-30eb76d3-2f3e-4a01-a030-976f7640d26a.parquet".
Candidate names: a
If you are trying to read files with different schemas, try setting union_by_name=True

As I mentioned above, reading the data before the schema change worked just fine.

Thank you,
Harel

Hey @harel-e, this was actually just added in #30. It will be available in the next release of DuckDB, which is scheduled for the end of January.
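
For readers hitting the same error, here is a minimal sketch of the behavior the report expects: rows from data files written before the `add column` should come back with NULL for the new column. This is not the extension's implementation. The in-memory "files" and the `read_with_schema` helper are hypothetical stand-ins, and the sketch matches columns by name for simplicity, whereas Iceberg itself resolves columns by field ID.

```python
# Hypothetical in-memory stand-ins for the two Parquet data files in the report:
# the first was written before `alter table test add column b int`, the second after.
old_file = [{"a": 1}]
new_file = [{"a": 2, "b": 5}]


def read_with_schema(files, schema):
    """Project every row onto `schema`, filling columns a file lacks with None."""
    rows = []
    for file_rows in files:
        for row in file_rows:
            # dict.get returns None when the file predates the column.
            rows.append(tuple(row.get(col) for col in schema))
    return rows


# The table's current schema after the ALTER TABLE (assumed: columns a and b).
current_schema = ["a", "b"]
result = read_with_schema([old_file, new_file], current_schema)

for row in result:
    print(row)
```

The two rows produced correspond to the `1 | NULL` and `2 | 5` rows that Trino returns above, with Python's `None` standing in for NULL.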