apache / iceberg-python

Apache PyIceberg

Home Page: https://py.iceberg.apache.org/

AWS Glue error when appending data

apersilva opened this issue

Apache Iceberg version

0.6.0 (latest release)

Please describe the bug 🐞

I started using PyIceberg with the Glue catalog and ran into an error.
The table in the Glue catalog has comments on its columns.
Is it possible to ignore the column comments when appending data to the table?

Hello @apersilva, can you give us the error stack trace and a minimal code example that can reproduce this error?

from datetime import datetime

import pyarrow as pa
from pyiceberg.catalog import load_catalog


def update_table(database_target, table_target, database_name, table_name, partition_by, size, process_date, custom_partition):
    catalog = load_catalog('glue', **{'type': 'glue', 'verify': False})

    tabela = catalog.load_table(f"{database_target}.{table_target}")

    # Collect each column's comment (doc), keyed by column name.
    metadata = {}
    for doc in tabela.metadata.schemas[0].columns:
        metadata.update({doc.name: f"({doc.doc})"})

    # metadata= attaches schema-level metadata to the Arrow table.
    df = pa.Table.from_pylist(
        [
            {
                "nome_tabela": table_name,
                "nome_base_dados": database_name,
                "particao": partition_by,
                "numero_registro": size,
                "process_date": process_date,
                "particao_customizada": custom_partition,
                "data_criacao": datetime.now().date(),
            }
        ],
        metadata=metadata,
    )

    tabela.append(df)

Traceback (most recent call last):
  File "c:\great_teste\update_table.py", line 45, in update_table
    tabela.append(df)
  File "C:\Users\9001329\AppData\Roaming\Python\Python310\site-packages\pyiceberg\table\__init__.py", line 1057, in append
    _check_schema_compatible(self.schema(), other_schema=df.schema)
  File "C:\Users\9001329\AppData\Roaming\Python\Python310\site-packages\pyiceberg\table\__init__.py", line 175, in _check_schema_compatible
    raise ValueError(f"Mismatch in fields:\n{console.export_text()}")
ValueError: Mismatch in fields:
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Table field                                                               ┃ Dataframe field                          ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ❌ │ 1: nome_tabela: optional string (Nome data Tabela Processada)             │ 1: nome_tabela: optional string          │
│ ❌ │ 2: nome_base_dados: optional string (Nome do Banco de dados que pertence  │ 2: nome_base_dados: optional string      │
│    │ a tabela)                                                                 │                                          │
│ ❌ │ 3: particao: optional string (Nome da particao)                           │ 3: particao: optional string             │
│ ❌ │ 4: numero_registro: optional long (Quantidade de registros)               │ 4: numero_registro: optional long        │
│ ❌ │ 5: process_date: optional string (parametro quando é enviado e passo para │ 5: process_date: optional string         │
│    │ a funcao de escrita para particao)                                        │                                          │
│ ❌ │ 6: particao_customizada: optional string (Indica que a partição é         │ 6: particao_customizada: optional string │
│    │ diferente do padrão)                                                      │                                          │
│ ❌ │ 7: data_criacao: optional date (Data em que foi inserido o registro)      │ 7: data_criacao: optional date           │
└────┴───────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘

@Fokko, can you help with clarifying the expected behavior? I believe we should compare the representations (repr) of the objects. Currently, the doc attribute is not included in the __repr__, so changing the comparison to be between repr objects might solve this problem. What do you think?

Sorry, I double-checked against the Java implementation, and the current behavior on the Python side is correct.
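To make the comparison discussed above concrete, here is a minimal sketch of why a field with a comment does not compare equal to the same field without one. The `Field` dataclass and the `same_ignoring_doc` helper are hypothetical stand-ins written for illustration; they are not PyIceberg's actual classes or its comparison logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not PyIceberg's NestedField)."""

    field_id: int
    name: str
    type_name: str
    required: bool = False
    doc: Optional[str] = None


def same_ignoring_doc(left: Field, right: Field) -> bool:
    """Compare every attribute except the doc (comment)."""
    return (left.field_id, left.name, left.type_name, left.required) == (
        right.field_id,
        right.name,
        right.type_name,
        right.required,
    )


# The table field carries a comment; the dataframe field does not.
table_field = Field(1, "nome_tabela", "string", doc="Nome data Tabela Processada")
dataframe_field = Field(1, "nome_tabela", "string")

# Dataclass equality includes doc, so the strict comparison fails.
strict_match = table_field == dataframe_field
# Leaving doc out of the comparison makes the two fields match.
loose_match = same_ignoring_doc(table_field, dataframe_field)

print(strict_match, loose_match)
```

In this sketch the strict comparison reports a mismatch even though name, type, and nullability agree, which mirrors the rows marked ❌ in the error table.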

@apersilva, for your case, I believe you need to do something like this:

from pyiceberg.io.pyarrow import schema_to_pyarrow

schema = schema_to_pyarrow(tabela.schema())

df = pa.Table.from_pylist(
    [
        {
            "nome_tabela": table_name,
            "nome_base_dados": database_name,
            "particao": partition_by,
            "numero_registro": size,
            "process_date": process_date,
            "particao_customizada": custom_partition,
            "data_criacao": datetime.now().date()
        }
    ],
    schema=schema
)

tabela.append(df)

In a future release, there will be a function in the Schema object to return the Arrow schema, so it would look like this: schema = tabela.schema().as_arrow()

It works, thanks a lot.

@apersilva, it looks like your issue is resolved. Can we close it?