Issue: Spark Streaming does not consume from Kafka
arthurdysart opened this issue
Running the PySpark Streaming script does not yield or print any intermediate output for the micro-batch RDD evaluations.
PySpark reports the following errors: "TypeError: can't pickle generator objects" and "ERROR PythonDStream$$anon$1:91 - Cannot connect to Python process. It's probably dead. Stopping StreamingContext."
The error points to the function "save_to_database()" in "cycle_step_analysis.py". But what generator object is being pickled?
Solved: in the PySpark Streaming script, the lambda mapped over "parsed_rdd" must wrap its result in an explicit tuple(); otherwise the result is lazily evaluated as a generator object, which cannot be pickled.
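The root cause can be reproduced in plain Python without Spark: the pickle module, which Spark uses to ship data between the JVM and Python worker processes, cannot serialize generator objects, while an equivalent tuple serializes fine. The record string below is a made-up illustration, not data from the actual pipeline.

```python
import pickle

record = "cell_01,3.7,295"  # hypothetical CSV-style record

# A generator expression is lazy: nothing is evaluated until it is
# consumed, and the resulting generator object cannot be pickled.
as_generator = (field.strip() for field in record.split(","))
try:
    pickle.dumps(as_generator)
except TypeError as err:
    print("pickling failed:", err)

# Wrapping the same expression in tuple() forces eager evaluation,
# producing a plain tuple that pickles and round-trips cleanly.
as_tuple = tuple(field.strip() for field in record.split(","))
restored = pickle.loads(pickle.dumps(as_tuple))
print(restored)  # ('cell_01', '3.7', '295')
```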
For example, see the generator expression wrapped in tuple() and combined with the function "sum()".
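A minimal sketch of the fix pattern (the names and values below are hypothetical; this is not the actual code from "cycle_step_analysis.py"): the broken lambda returns a lazy generator object, which fails when Spark later tries to pickle it, while the fixed lambda forces eager evaluation with tuple(), whose values can then be aggregated with sum().

```python
import types

values = ["3.7", "2.1", "0.4"]  # hypothetical parsed string fields

# Broken: the lambda returns a generator object, not the parsed values.
broken = lambda fields: (float(v) for v in fields)
assert isinstance(broken(values), types.GeneratorType)

# Fixed: tuple() forces eager evaluation into a picklable tuple,
# so downstream code (and Spark's pickler) sees concrete values.
fixed = lambda fields: tuple(float(v) for v in fields)

parsed = fixed(values)
print(parsed)       # (3.7, 2.1, 0.4)
print(sum(parsed))  # aggregate the eagerly evaluated values
```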