rymurr / nessie

Nessie provides Git-like capabilities for your Data Lake

Home Page: https://projectnessie.org

Project Nessie

Project Nessie is a Transactional Catalog for Data Lakes with Git-like semantics.

More information can be found at projectnessie.org.

Nessie supports Iceberg Tables/Views and Delta Lake Tables. Additionally, Nessie is focused on working with the widest range of tools possible, which can be seen in the feature matrix.

Using Nessie

You can quickly get started with Nessie by using our small, fast docker image.

docker pull projectnessie/nessie
docker run -p 19120:19120 projectnessie/nessie

To try the Nessie image with different configuration options, refer to the templates under the docker module.

Once the container is running, a local Web UI is available at http://localhost:19120.
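To confirm that the container is up before moving on, a minimal Python sketch can check whether anything is accepting TCP connections on the published port. It only tests the port, not the Nessie API itself; the function name is_port_open is a hypothetical helper, and 19120 is the port from the docker run command above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 19120 is the port published by the docker run command above.
    if is_port_open("localhost", 19120):
        print("Port 19120 is accepting connections")
    else:
        print("Nothing is listening on port 19120 yet")
```

An open port does not guarantee that the server has finished starting, so treat this as a quick sanity check rather than a health probe.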

Then install the Nessie CLI tool (to learn more about the CLI tool and how to use it, check the Nessie CLI Documentation).

pip install pynessie

From there, you can use one of our technology integrations.

To learn more about all supported integrations and tools, check here

Have fun! We have a Google Group and a Slack channel we use for both developers and users. Check them out here.

Authentication

By default, Nessie servers run with authentication disabled and all requests are processed under the "anonymous" user identity.

Nessie supports bearer tokens and uses OpenID Connect for validating them.

Authentication can be enabled by setting the following Quarkus properties:

  • nessie.server.authentication.enabled=true
  • quarkus.oidc.auth-server-url=<OpenID Server URL>
  • quarkus.oidc.client-id=<Client ID>
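Once authentication is enabled, clients need to send a bearer token with each request. The following is a minimal sketch of how such a request could be built with the Python standard library. The helper name build_authenticated_request, the target URL and the token value are all placeholders chosen for illustration; obtaining a real token from your OpenID Connect provider is outside the scope of this sketch.

```python
import urllib.request


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries a bearer token."""
    request = urllib.request.Request(url, method="GET")
    # Bearer tokens are sent in the standard Authorization header.
    request.add_header("Authorization", f"Bearer {token}")
    return request


if __name__ == "__main__":
    # Placeholder values: substitute your server URL and a token issued
    # by the OpenID Connect server configured in quarkus.oidc.auth-server-url.
    req = build_authenticated_request("http://localhost:19120/", "<access token>")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

The request is only constructed here, not sent; passing it to urllib.request.urlopen would perform the call.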

Experimenting with Nessie Authentication in Docker

One can start the projectnessie/nessie docker image in authenticated mode by setting the properties mentioned above via docker environment variables. For example:

docker run -p 19120:19120 -e QUARKUS_OIDC_CLIENT_ID=<Client ID> -e QUARKUS_OIDC_AUTH_SERVER_URL=<OpenID Server URL> -e NESSIE_SERVER_AUTHENTICATION_ENABLED=true --network host projectnessie/nessie
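The environment variable names in the command above appear to follow the MicroProfile Config naming rule: every character that is not alphanumeric or an underscore becomes an underscore, and the result is upper-cased. A small sketch of that mapping, assuming this rule applies to every property you want to set (the helper names property_to_env_var and docker_env_args are hypothetical):

```python
import re


def property_to_env_var(name: str) -> str:
    """Map a property name to its environment-variable form.

    Assumed rule: replace each character that is not a letter, digit or
    underscore with "_", then convert to upper case.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", name).upper()


def docker_env_args(properties: dict) -> list:
    """Turn {property: value} pairs into a flat list of docker -e arguments."""
    args = []
    for name, value in properties.items():
        args += ["-e", f"{property_to_env_var(name)}={value}"]
    return args


if __name__ == "__main__":
    props = {
        "nessie.server.authentication.enabled": "true",
        "quarkus.oidc.auth-server-url": "<OpenID Server URL>",
        "quarkus.oidc.client-id": "<Client ID>",
    }
    print(" ".join(docker_env_args(props)))
```

Generating the arguments from the property names keeps the Docker invocation and the Quarkus configuration in sync when more properties are added.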

Building and Developing Nessie

Requirements

  • JDK 11 or higher is needed to build Nessie (artifacts are built for Java 8)
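If you are unsure which JDK is on your PATH, the requirement can be checked with a short script. This is a sketch that assumes the usual java -version output format, where Java 8 and older report versions such as "1.8.0_292" and newer releases report "11.0.12" or "17". The function name java_major_version is a hypothetical helper.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_292" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


if __name__ == "__main__":
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    major = java_major_version(result.stderr)
    print(f"Detected Java {major}:", "OK" if major >= 11 else "JDK 11 or higher is required")
```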

Installation

Clone this repository and run Maven:

git clone https://github.com/projectnessie/nessie
cd nessie
./mvnw clean install

Compatibility

Nessie's Iceberg integration is compatible with the Iceberg, Spark, Hive and Flink versions in the following table:

Nessie version | Iceberg version | Spark version | Hive version | Flink version
0.20.0 | 0.13.1 | 3.0.X, 3.1.X | 2.3.9 | 1.12.1
0.9.2 | 0.12.1, 0.12.0 | 3.0.X, 3.1.X | 2.3.9 | 1.12.1
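The Iceberg table can also be expressed as a small lookup, which is convenient when a build script needs to check a Spark version against it. This is only an illustrative sketch: the data is copied from the table, the names ICEBERG_COMPAT, matches and spark_supported are hypothetical, and a trailing "X" is assumed to mean "any patch release".

```python
# Rows copied from the Iceberg compatibility table.
ICEBERG_COMPAT = {
    "0.20.0": {"iceberg": ["0.13.1"], "spark": ["3.0.X", "3.1.X"],
               "hive": ["2.3.9"], "flink": ["1.12.1"]},
    "0.9.2": {"iceberg": ["0.12.1", "0.12.0"], "spark": ["3.0.X", "3.1.X"],
              "hive": ["2.3.9"], "flink": ["1.12.1"]},
}


def matches(pattern: str, version: str) -> bool:
    """Compare dotted versions component by component; "X" matches anything."""
    pattern_parts = pattern.split(".")
    version_parts = version.split(".")
    if len(pattern_parts) != len(version_parts):
        return False
    return all(p.upper() == "X" or p == v
               for p, v in zip(pattern_parts, version_parts))


def spark_supported(nessie_version: str, spark_version: str) -> bool:
    """Return True if the table lists this Spark version for this Nessie version."""
    row = ICEBERG_COMPAT.get(nessie_version)
    if row is None:
        return False  # Nessie version not covered by the table
    return any(matches(p, spark_version) for p in row["spark"])


if __name__ == "__main__":
    print(spark_supported("0.20.0", "3.1.2"))
    print(spark_supported("0.20.0", "3.2.0"))
```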

Nessie's Delta Lake integration is compatible with the Delta Lake and Spark versions in the following table:

Nessie version | Delta Lake version | Spark version
0.20.0 | Custom | 3.2.X
0.9.2 | Custom | 3.1.X

Delta Lake artifacts

Nessie required some minor changes to Delta Lake for full support of branching and history. These changes are currently being integrated into the mainline repo. Until they have been merged, we provide custom builds in our fork, which can be downloaded from a separate Maven repository.

Distribution

To run the server locally:

  1. Adjust the configuration in servers/quarkus-server/src/main/resources/application.properties
  2. Execute ./mvnw quarkus:dev
  3. Go to http://localhost:19120

UI

To run the UI (from the ui directory):

  1. If you are running in test, ensure that setupProxy.js points to the correct API instance. This avoids CORS issues during testing.
  2. Run npm install to install dependencies.
  3. Run npm run start to start the UI in development mode via Node.

To deploy the UI (from the ui directory):

  1. Run npm install to install dependencies.
  2. Run npm run build to minify and collect the package for deployment in the build directory.
  3. Deploy the build directory to any static hosting environment, or run it locally with serve -s build.

Docker image

Running mvn clean install -Pdocker creates a Docker image named projectnessie/nessie, which can be started with docker run -p 19120:19120 projectnessie/nessie and the relevant environment variables. Environment variables are specified as per the MicroProfile Config default ConfigSources: https://github.com/eclipse/microprofile-config/blob/master/spec/src/main/asciidoc/configsources.asciidoc#default-configsources

AWS Lambda

You can also deploy Nessie as an AWS Lambda function by following the steps in servers/lambda/README.md.

Contributing

Code Style

The Nessie project uses the Google Java Code Style, scalafmt and pep8. See CONTRIBUTING.md for more information.

License: Apache License 2.0


Languages

Java 80.9%, Python 7.8%, Scala 3.9%, TypeScript 3.3%, JavaScript 2.9%, HTML 0.5%, CSS 0.3%, ANTLR 0.2%, Makefile 0.1%, Smarty 0.1%, Batchfile 0.0%