AWS Glue error when appending data
apersilva opened this issue Β· comments
Hello @apersilva, can you give us the error stack trace and a minimal code example that can reproduce this error?
```python
from datetime import datetime

import pyarrow as pa
from pyiceberg.catalog import load_catalog


def update_table(database_target, table_target, database_name, table_name,
                 partition_by, size, process_date, custom_partion):
    catalog = load_catalog('glue', **{'type': 'glue', 'verify': False})
    tabela = catalog.load_table(f"{database_target}.{table_target}")

    metadata = {}
    for doc in tabela.metadata.schemas[0].columns:
        metadata.update({doc.name: f"({doc.doc})"})

    df = pa.Table.from_pylist(
        [
            {
                "nome_tabela": table_name,
                "nome_base_dados": database_name,
                "particao": partition_by,
                "numero_registro": size,
                "process_date": process_date,
                "particao_customizada": custom_partion,
                "data_criacao": datetime.now().date(),
            }
        ],
        metadata=metadata,
    )
    tabela.append(df)
```
```
Traceback (most recent call last):
  File "c:\great_teste\update_table.py", line 45, in update_table
    tabela.append(df)
  File "C:\Users\9001329\AppData\Roaming\Python\Python310\site-packages\pyiceberg\table\__init__.py", line 1057, in append
    _check_schema_compatible(self.schema(), other_schema=df.schema)
  File "C:\Users\9001329\AppData\Roaming\Python\Python310\site-packages\pyiceberg\table\__init__.py", line 175, in _check_schema_compatible
    raise ValueError(f"Mismatch in fields:\n{console.export_text()}")
ValueError: Mismatch in fields:
```
| Table field | Dataframe field |
| --- | --- |
| 1: nome_tabela: optional string (Nome data Tabela Processada) | 1: nome_tabela: optional string |
| 2: nome_base_dados: optional string (Nome do Banco de dados que pertence a tabela) | 2: nome_base_dados: optional string |
| 3: particao: optional string (Nome da particao) | 3: particao: optional string |
| 4: numero_registro: optional long (Quantidade de registros) | 4: numero_registro: optional long |
| 5: process_date: optional string (parametro quando é enviado e passo para a funcao de escrita para particao) | 5: process_date: optional string |
| 6: particao_customizada: optional string (Indica que a partição é diferente do padrão) | 6: particao_customizada: optional string |
| 7: data_criacao: optional date (Data em que foi inserido o registro) | 7: data_criacao: optional date |
@Fokko, can you help clarify the expected behavior? I believe we should compare the representations (`repr`) of the objects. Currently, the `doc` attribute is not included in `__repr__`, so changing the comparison to use the objects' `repr` might solve this problem. What do you think?
Sorry, I double-checked the Java implementation, and it's correct on the Python side.
@apersilva, for your case, I believe you need to do something like this:
```python
from pyiceberg.io.pyarrow import schema_to_pyarrow

schema = schema_to_pyarrow(tabela.schema())
df = pa.Table.from_pylist(
    [
        {
            "nome_tabela": table_name,
            "nome_base_dados": database_name,
            "particao": partition_by,
            "numero_registro": size,
            "process_date": process_date,
            "particao_customizada": custom_partition,
            "data_criacao": datetime.now().date(),
        }
    ],
    schema=schema,
)
tabela.append(df)
```
In a future release, there will be a function on the `Schema` object to return the Arrow schema, so it would look like this: `schema = tabela.schema().as_arrow()`
It works, thanks a lot.
@apersilva looks like your issue is resolved, can we close this issue?