datamindedbe / lighthouse

Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

Home Page: https://datamindedbe.github.io/lighthouse/



Links vs Sinks

wimvanleuven-kbc opened this issue

In our Python port of Lighthouse, we have been discussing the design choice of links vs sinks. More specifically: why does a link support writes when there is also a need for sinks?

Would it make sense to keep links read-only and sinks write-only?
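A minimal sketch of what the read-only/write-only split could look like. Everything here is hypothetical and is not Lighthouse's actual API: `ReadLink`, `WriteSink`, `FileReadLink`, `FileSink`, the in-memory `storage` map, and the `Data` alias (a `Seq[String]` standing in for a Spark `DataFrame` so the sketch runs without Spark) are all assumptions made for illustration.

```scala
import scala.collection.mutable

// Assumption: Seq[String] stands in for a Spark DataFrame.
type Data = Seq[String]

// Hypothetical in-memory "data lake", keyed by path.
val storage = mutable.Map.empty[String, Data]

// Read-only link: it can only load data.
trait ReadLink {
  def read(): Data
}

// Write-only sink: it can only persist data.
trait WriteSink {
  def write(data: Data): Unit
}

// Hypothetical file-based implementations, one per role.
final case class FileReadLink(path: String) extends ReadLink {
  def read(): Data = storage.getOrElse(path, Seq.empty)
}

final case class FileSink(path: String) extends WriteSink {
  def write(data: Data): Unit = storage.update(path, data)
}

// Job A writes a dataset that job B later reads: with strictly separated
// roles, the same location has to be declared twice, once per role.
val customersOut = FileSink("/lake/customers")
val customersIn  = FileReadLink("/lake/customers")

customersOut.write(Seq("alice", "bob"))
val loaded = customersIn.read()
```

The split makes each object's capability explicit, at the cost of declaring the same dataset location once as a sink and once as a link.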

Thanks for any clarification!
-wim

Hey Wim,

We had similar discussions during the design. Our argument for the current design is that it maximises re-use of code and configuration while keeping the design minimal: what is a 'sink' in one job is often a 'source' in another. One option would be to introduce separate Source and Sink interfaces, with some data lake links implementing both. That would give a better separation of concerns. What is your reasoning?
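A minimal sketch of the Source/Sink option, assuming hypothetical `Source` and `Sink` traits; this is not Lighthouse's actual API. `FileLink`, `AuditSink`, `extract`, `load`, the in-memory `storage` map and the `Data` alias (a `Seq[String]` standing in for a Spark `DataFrame`) are all illustrative assumptions. The point is that a data lake link can mix in both traits, so one definition serves as the output of one job and the input of the next, while a write-only target implements only `Sink`.

```scala
import scala.collection.mutable

// Assumption: Seq[String] stands in for a Spark DataFrame.
type Data = Seq[String]

// Hypothetical in-memory "data lake", keyed by path.
val storage = mutable.Map.empty[String, Data]

// A Source can be read from.
trait Source {
  def read(): Data
}

// A Sink can be written to.
trait Sink {
  def write(data: Data): Unit
}

// A data lake link implementing both interfaces: one shared definition.
final case class FileLink(path: String) extends Source with Sink {
  def read(): Data = storage.getOrElse(path, Seq.empty)
  def write(data: Data): Unit = storage.update(path, data)
}

// A hypothetical write-only target: it implements only Sink, so passing it
// where a Source is expected would not compile.
final class AuditSink extends Sink {
  val received = mutable.ListBuffer.empty[String]
  def write(data: Data): Unit = received ++= data
}

// Jobs declare only the capability they need.
def extract(source: Source): Data = source.read()
def load(data: Data, sink: Sink): Unit = sink.write(data)

val customers = FileLink("/lake/customers")

load(Seq("alice", "bob"), customers)   // job A: the link used as a Sink
val customerNames = extract(customers) // job B: the same link used as a Source

val audit = new AuditSink
load(customerNames, audit)
```

Compared with strictly read-only links and write-only sinks, the configuration for `/lake/customers` lives in one place, while function signatures still document whether a job reads or writes.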

Kind regards
Pascal