apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.

Home Page: https://hudi.apache.org/

[SUPPORT]

zaminhassnain06 opened this issue

Hi,
Our organization is migrating from Hudi 0.6.0 to Hudi 0.12.1 and is also updating the required Spark and EMR versions. Our existing datasets (hundreds of TBs of data on S3) were written using Hudi 0.6.0.

Hudi has come a long way since 0.6.0, and we are not sure how to move to 0.12.1 directly.

Could someone provide the steps for upgrading from 0.6.0 to 0.12.1?

Do we have to rebuild our tables? This is our main concern, as the tables hold billions of records.

Should we expect the following improvements after the upgrade?
– faster upserts
– adding/modifying columns (schema evolution)
– clustering
– a possible solution for storing the history of updates performed on records

Thanks,
Zamin Hassnain

I would suggest using 0.12.3 or 0.14.1 instead; 0.12.1 still has some stability issues.