apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Home Page: https://amoro.apache.org/

[Improvement]: Integrate multiple formats in a pluggable way

baiyangtx opened this issue · comments

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, Amoro bundles multiple table formats (Iceberg, Mixed-Iceberg, and Paimon) at compile time.

As more formats such as Hudi are integrated, implementing each integration directly in the amoro-core and amoro-ams-server modules will make the final distribution package increasingly bloated.

This issue proposes making table format integration pluggable, which will help reduce the size of the binary package and avoid the risk of introducing unnecessary code into production environments.

How should we improve?

The final module layout will look like this:

  • amoro-ams :
  • amoro-api : Amoro core API shared across different table formats
  • amoro-iceberg : Iceberg-related logic
  • amoro-formats-integrations
    • amoro-paimon-integration : Integrates Paimon into Amoro
    • amoro-hudi-integration : Integrates Hudi into Amoro
    • amoro-iceberg-integration : Integrates Iceberg and Mixed-Iceberg into Amoro
  • amoro-mixed-format
    • amoro-mixed-format-core : Core API of mixed-format, shared across different compute engines
    • amoro-mixed-format-spark : Mixed-format connector for Spark
    • amoro-mixed-format-flink : Mixed-format connector for Flink
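
To illustrate what "pluggable" could mean for the layout above, here is a minimal sketch that uses the JDK's standard `java.util.ServiceLoader`. The names `FormatIntegration`, `FormatRegistry`, and `PluggableFormatsDemo` are hypothetical and only for illustration; they are not existing Amoro classes, and the actual design may differ. The assumption is that amoro-api would own the interface, and each integration module would ship an implementation plus a `META-INF/services/` provider file named after the interface's fully qualified name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.Set;

/** Hypothetical SPI that each format integration module would implement. */
interface FormatIntegration {
    /** Unique format name, e.g. "paimon" or "hudi". */
    String formatName();
}

/** Hypothetical registry that collects whichever integrations are present at runtime. */
final class FormatRegistry {
    private final Map<String, FormatIntegration> byName = new LinkedHashMap<>();

    /** Discovers integrations declared in META-INF/services files visible to the given class loader. */
    static FormatRegistry discover(ClassLoader loader) {
        FormatRegistry registry = new FormatRegistry();
        for (FormatIntegration integration : ServiceLoader.load(FormatIntegration.class, loader)) {
            registry.register(integration);
        }
        return registry;
    }

    /** Registers one integration and rejects a second integration with the same format name. */
    void register(FormatIntegration integration) {
        FormatIntegration previous = byName.putIfAbsent(integration.formatName(), integration);
        if (previous != null) {
            throw new IllegalStateException("Duplicate format: " + integration.formatName());
        }
    }

    /** Looks up an integration by format name; empty if that format is not installed. */
    Optional<FormatIntegration> find(String formatName) {
        return Optional.ofNullable(byName.get(formatName));
    }

    /** Read-only view of the names of all installed formats. */
    Set<String> formats() {
        return Collections.unmodifiableSet(byName.keySet());
    }
}

class PluggableFormatsDemo {
    public static void main(String[] args) {
        // Picks up any integrations that happen to be on the classpath.
        FormatRegistry registry =
            FormatRegistry.discover(PluggableFormatsDemo.class.getClassLoader());
        // Manual registration stands in for a real integration jar in this sketch.
        registry.register(() -> "paimon");
        System.out.println("Installed formats: " + registry.formats());
    }
}
```

With this kind of design, a distribution that leaves out amoro-hudi-integration would simply not list "hudi", and neither the server code nor the API module would need to change.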

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
