antonum / Databricks-Redis

Databricks notebook, using Spark-Redis integration and RediSearch

Databricks-Redis

The example notebook loads data from the built-in Databricks sample datasets into a dataframe and writes it to a Redis database. It then creates RediSearch indices in Redis and runs queries against them from Python.

You can use this example to materialize Databricks/Spark dataframes as Redis Hashes.
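As a rough illustration of the RediSearch step, the sketch below builds raw FT.CREATE and FT.SEARCH commands for Hashes stored under a key prefix. The index name people_idx, the people: prefix, and the name and age fields are assumptions chosen for illustration, not names taken from the notebook. The helper functions are hypothetical, and the notebook itself may use a higher-level client API instead.

```python
def ft_create_args(index, prefix, schema):
    """Build the argument list for an FT.CREATE command over Hash keys.

    schema is a list of (field_name, field_type) pairs, e.g. ("age", "NUMERIC").
    """
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", 1, prefix, "SCHEMA"]
    for field, field_type in schema:
        args += [field, field_type]
    return args


def ft_search_args(index, query, limit=10):
    """Build the argument list for an FT.SEARCH command returning at most `limit` documents."""
    return ["FT.SEARCH", index, query, "LIMIT", 0, limit]


def run(client, args):
    """Send a raw command through any client that exposes execute_command (redis-py does)."""
    return client.execute_command(*args)


# Hypothetical index over Hashes whose keys start with "people:".
create_cmd = ft_create_args("people_idx", "people:", [("name", "TEXT"), ("age", "NUMERIC")])
# Hypothetical numeric range query on the assumed "age" field.
search_cmd = ft_search_args("people_idx", "@age:[30 40]")
```

With redis-py installed, a client created as redis.Redis(host=REDIS_HOST, port=REDIS_PORT, password=REDIS_PASSWORD) could be passed to run(client, create_cmd) and then run(client, search_cmd).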

Add Spark-Redis JAR file to Cluster

To add the required Spark-Redis library to your runtime, add the com.redislabs:spark-redis_2.12:2.4.2 Maven library in your cluster's Libraries section. You might need to restart the cluster after the library is added.

Add repo to the Workspace

In your Databricks Workspace, go to Repos -> Add Repo and enter https://github.com/antonum/Databricks-Redis.git as the Git repository URL.

Get a free Redis Cloud account

https://redis.com/try-free/ - sign up with a Google account. Capture the endpoint URL, port, and default user password for the database.

Change the following lines in your notebook to use your own Redis Cloud endpoint and password:

# Replace the values below with your own if using a Redis Cloud instance
REDIS_HOST="redis-17231.c228.us-central1-1.gce.cloud.redislabs.com"
REDIS_PORT=17231
REDIS_PASSWORD="0XKOePIFBCtuNvV6PhsXl3ysQYXXXXXX"
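If you would rather not keep the endpoint and password in the notebook source, one option is to read them from environment variables. This is a minimal sketch, not what the notebook does: the variable names REDIS_HOST, REDIS_PORT and REDIS_PASSWORD, the localhost defaults, and the load_redis_config helper are all assumptions.

```python
import os


def load_redis_config(env=os.environ):
    """Read connection settings from a mapping of environment variables.

    The variable names and defaults here are assumptions for illustration.
    """
    host = env.get("REDIS_HOST", "localhost")
    port = int(env.get("REDIS_PORT", "6379"))  # environment values are strings
    password = env.get("REDIS_PASSWORD", "")
    return host, port, password


# Same three names the notebook uses, filled from the environment instead of literals.
REDIS_HOST, REDIS_PORT, REDIS_PASSWORD = load_redis_config()
```

Passing a plain dict instead of os.environ, for example load_redis_config({"REDIS_PORT": "17231"}), makes the helper easy to check without touching the real environment.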

Using Azure Cache for Redis Enterprise

You can also use Azure Cache for Redis Enterprise with RediSearch. See detailed instructions.

References

Saving Dataframe to Redis

The following code fragment loads the contents of a Spark dataframe into Redis as Hashes. Key names take the form "people:1234", where people is the value of the table option and 1234 is the value of the id column named in the key.column option.

df.write.format("org.apache.spark.sql.redis") \
      .mode("overwrite") \
      .option("table", "people") \
      .option("key.column", "id") \
      .option("host", REDIS_HOST) \
      .option("port", REDIS_PORT) \
      .option("auth", REDIS_PASSWORD) \
      .save()
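To make the key naming concrete, the sketch below reproduces the "table:key" pattern in plain Python and collects the connector options into a dictionary, so the same settings can be reused to read the Hashes back into a dataframe. The helper names (redis_key, redis_options, read_table), the sample rows, and the example.invalid host are hypothetical. read_table assumes a Spark session with the Spark-Redis library attached and is only defined here, not called.

```python
def redis_key(table, key_value):
    """Key name for one row: '<table>:<value of the key.column column>'."""
    return f"{table}:{key_value}"


def redis_options(table, key_column, host, port, password):
    """Collect the connector options used in the write above into one dict."""
    return {
        "table": table,
        "key.column": key_column,
        "host": host,
        "port": str(port),
        "auth": password,
    }


def read_table(spark, options):
    """Load the Hashes back into a dataframe using the same options (needs Spark-Redis)."""
    return spark.read.format("org.apache.spark.sql.redis").options(**options).load()


# Hypothetical rows standing in for dataframe records.
rows = [{"id": 1234, "name": "Ada"}, {"id": 5678, "name": "Grace"}]
keys = [redis_key("people", row["id"]) for row in rows]

# Placeholder connection values; substitute your own endpoint and password.
options = redis_options("people", "id", "example.invalid", 17231, "secret")
```

In a Databricks notebook, read_table(spark, options) would then return a dataframe built from the people:* Hashes, assuming the write above has already run against the same database.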

For more information, see the Spark-Redis GitHub repository and documentation: https://github.com/RedisLabs/spark-redis

License:MIT License