Issue: Spark Streaming does not consume from Kafka

Question

arthurdysart opened this issue 6 years ago

Running the PySpark Streaming script does not yield or print any intermediate output for the micro-RDD evaluations.

Resulting errors from PySpark: "TypeError: can't pickle generator objects" and "ERROR PythonDStream$$anon$1:91 - Cannot connect to Python process. It's probably dead. Stopping StreamingContext."

The error points to the function "save_to_database()" in "cycle_step_analysis.py". But what generator object is being pickled?

Art · Answer 1 · Fri Oct 05 2018 01:35:56 GMT+0800 (China Standard Time)

Solved: in the PySpark Streaming script, the lambda passed as the argument for "parsed_rdd" must wrap its result in an explicit tuple(); otherwise the result is lazily evaluated as a generator object, which cannot be pickled.

For example, see the "tuple comprehension" (in Python, a parenthesized comprehension is actually a generator expression) combined with the function "sum()".
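To illustrate the fix without a Spark cluster, here is a minimal sketch using only the standard library. The function names and the comma-separated record format are hypothetical and are not taken from "cycle_step_analysis.py"; the point is only that a parenthesized comprehension is a generator, which pickle (and therefore PySpark's serialization of records) rejects, while an explicit tuple() is fine.

```python
import pickle


def parse_lazy(record):
    # Hypothetical parser: parentheses around a comprehension create a
    # generator expression, not a tuple.
    return (float(field) for field in record.split(","))


def parse_eager(record):
    # Same parser, but tuple() forces evaluation into a concrete tuple.
    return tuple(float(field) for field in record.split(","))


def can_pickle(obj):
    # Pickling a generator raises TypeError; report that as False.
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    record = "1.0,2.5"  # made-up sample record
    print(can_pickle(parse_lazy(record)))   # generator: not picklable
    print(can_pickle(parse_eager(record)))  # tuple: picklable
    # sum() consumes either form, so the bug only shows up on serialization.
    print(sum(parse_eager(record)))
```

In a streaming job the same idea would presumably apply to the lambda given to map(), for example returning tuple(...) instead of a bare (... for ...) expression, though the exact shape depends on the original script.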