RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from a Redis cluster

Read getting stuck at stage 0

arturzangiev opened this issue · comments

Trying to read a DataFrame from a Redis instance on AWS, but the read gets stuck at stage 0.

[Stage 0:> (0 + 1) / 1]

from pyspark.sql import DataFrame, SparkSession

        # Spark session configured with the spark-redis connector package
        self.__spark = SparkSession.builder\
            .config('spark.jars.packages', 'com.redislabs:spark-redis_2.12:3.1.0')\
            .config("spark.redis.host", "AWS-HOST")\
            .config("spark.redis.port", "6379")\
            .getOrCreate()

    def __read_redis_keys(self) -> DataFrame:
        # Read all keys matching the pattern into a DataFrame, inferring the schema
        df = self.__spark.read.format("org.apache.spark.sql.redis")\
            .option("keys.pattern", "SOME_PATH*")\
            .option("infer.schema", True)\
            .load()
        return df
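For reference, a minimal standalone sketch of the same read (the host value is a placeholder; spark.redis.timeout is the connector's connection timeout in milliseconds, which should make an unreachable host fail with an error rather than appear to hang at stage 0):

    # Minimal standalone sketch of the same read. Host and key pattern are
    # placeholders, not values from the issue. spark.redis.timeout bounds the
    # connection attempt so an unreachable Redis fails fast instead of hanging.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.jars.packages", "com.redislabs:spark-redis_2.12:3.1.0")
        .config("spark.redis.host", "localhost")   # placeholder host
        .config("spark.redis.port", "6379")
        .config("spark.redis.timeout", "5000")     # connection timeout in ms
        .getOrCreate()
    )

    df = (
        spark.read.format("org.apache.spark.sql.redis")
        .option("keys.pattern", "SOME_PATH*")
        .option("infer.schema", True)
        .load()
    )
    df.printSchema()
    print(df.count())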

Spark 3.3.1
Scala 2.12.15
Java 17.0.1
Python 3.8.14
pyspark 3.3.1
Macbook M1

I managed to figure it out. It is clearly a networking issue related to AWS ElastiCache: once I deployed to EMR, the job executed successfully. What I can't figure out now is why I can't run it locally. I am on a VPN, and if I just use redis-cli I can access ElastiCache fine. It looks like Spark running locally can't resolve the IP correctly.
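One way to narrow down whether the Python environment that launches Spark can actually reach the ElastiCache endpoint (as opposed to redis-cli, which may resolve or route differently over the VPN) is a quick check with the redis-py client. This is a sketch under the assumption that redis-py is installed; "AWS-HOST" is a placeholder for the ElastiCache endpoint:

    # Connectivity check from the same environment that launches Spark,
    # assuming the redis-py package is available (pip install redis).
    import redis

    r = redis.Redis(host="AWS-HOST", port=6379, socket_connect_timeout=5)
    try:
        print(r.ping())          # True if the endpoint is reachable
    except redis.ConnectionError as exc:
        print(f"Cannot reach Redis: {exc}")

If this succeeds while the Spark read still hangs, the problem is more likely in how the Spark driver/executors resolve or route to the host than in the endpoint itself.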