nroldanf / delta-examples

Delta Lake examples

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

delta-examples

This repo provides notebooks with Delta Lake examples using PySpark, Scala Spark, and Python.

Running these commands on your local machine is a great way to learn about how Delta Lake works.

PySpark setup

You can install PySpark and Delta Lake by creating the pyspark-330-delta-220 conda environment.

Create the environment with this command: conda env create -f envs/pyspark-330-delta-220.

Activate the environment with this command: conda activate pyspark-330-delta-220.

Then you can run jupyter lab and execute all the PySpark notebooks.

Release Candidate setup

The Delta Lake Spark connector release process typically includes an early release candidate release a few weeks prior to the official release. (example)

The release candidate is typically hosted in non-standard repositories that require additional setup.

To configure a notebook to use a release candidate:

  1. Make sure to use a conda environment that includes the correct --extra-index-url, see here for an example.

  2. Provide the appropriate Ivy settings file that has the correct non-standard repository, see this one for an example.

  3. Initialize Spark with the correct Ivy configuration

    .config(
        "spark.jars.ivySettings",
        "../../ivy/2.4.0rc1.xml"
    )
    

delta-rs setup (Python bindings)

You can run the delta-rs notebooks that use the Python bindings by creating the mr-delta-rs conda environment.

Create the environment with this command: conda env create -f envs/mr-delta-rs.yml.

Activate the environment with this command: conda activate mr-delta-rs.

Scala setup

You can install almond to run the Scala Spark notebooks in this repo.

Contributing

We welcome contributions, especially notebooks that illustrate important functions that will benefit the Delta Lake community.

Check out the open issues for ideas on good notebooks to create!

About

Delta Lake examples

License:Apache License 2.0


Languages

Language:Jupyter Notebook 100.0%