datamindedbe / lighthouse

Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

Home Page: https://datamindedbe.github.io/lighthouse/

Lighthouse

Principles

  • Configuration as code
  • Idempotent execution
  • Utilities that make it easier to build and test Apache Spark-based applications
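The first two principles can be illustrated with a small sketch in plain Scala. This is NOT the Lighthouse API: the names `Partitioned`, `lake` and `run` are hypothetical, and an in-memory map stands in for the data lake so the example runs without Spark. It shows a pipeline step whose output location is a typed value defined in code ("configuration as code"), and which overwrites a single date partition on each run, so rerunning it for the same date leaves storage unchanged ("idempotent execution").

```scala
import scala.collection.mutable

// Hypothetical sketch, NOT the Lighthouse API: it only illustrates the
// principles in plain Scala, with an in-memory map standing in for a data lake.
object IdempotentPipelineSketch {

  // "Configuration as code": the output location is a typed Scala value,
  // checked by the compiler instead of being parsed from a YAML or JSON file.
  final case class Partitioned(basePath: String) {
    def pathFor(date: String): String = s"$basePath/date=$date"
  }

  // Stand-in for storage: partition path -> rows.
  val lake: mutable.Map[String, Seq[String]] = mutable.Map.empty

  // "Idempotent execution": each run replaces exactly one partition,
  // so rerunning for the same date leaves the lake in the same state.
  def run(target: Partitioned, date: String, input: Seq[String]): Unit = {
    val cleaned = input.map(_.trim.toLowerCase).distinct.sorted
    lake(target.pathFor(date)) = cleaned // overwrite, never append
  }

  def main(args: Array[String]): Unit = {
    val customers = Partitioned("/lake/clean/customers")
    val raw = Seq(" Alice", "bob", "alice ")

    run(customers, "2018-01-01", raw)
    val afterFirstRun = lake.toMap

    // Rerun the same step for the same date.
    run(customers, "2018-01-01", raw)

    println(lake(customers.pathFor("2018-01-01")))
    println(s"unchanged after rerun: ${lake.toMap == afterFirstRun}")
  }
}
```

The overwrite-per-partition design is one common way to get idempotence; an append-based write would duplicate rows on every rerun.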

Start using Lighthouse

In your build.sbt, add this:

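val lighthouseVersion = "<version>" // replace with the latest release published on Maven Central
libraryDependencies += "be.dataminded" %% "lighthouse" % lighthouseVersion
libraryDependencies += "be.dataminded" %% "lighthouse-testing" % lighthouseVersion % Test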

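If you are using Maven, add this to your pom.xml (the _2.11 suffix is the Scala binary version, which sbt's %% appends automatically; it must match the Scala version of your project):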

<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse_2.11</artifactId>
    <version>[version]</version>
</dependency>
<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse-testing_2.11</artifactId>
    <version>[version]</version>
    <scope>test</scope>
</dependency>

Online Documentation

This README file only contains basic instructions. Here is a more complete tutorial: https://datamindedbe.github.io/lighthouse/tutorial/

About

License: Apache License 2.0


Languages

Language: Scala 100.0%