yahoo / TensorFlowOnSpark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

Python 3 support on Windows? Running into a pickling problem

tjrileywisc opened this issue · comments

I'm running the MNIST example on a standalone cluster in Windows. I had to make a few changes to enable Python 3 support (I'm using 3.5):

In TFCluster.py (due to changes in relative imports):

from . import TFSparkNode

In TFSparkNode.py (Queue > queue and relative imports again, also a problem with Python 3 not handling UUID objects the same as 2):

from . import TFManager
...
authkey = uuid.uuid4()
authkey = authkey.bytes
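
For readers following along, the two Python 3 adjustments described above (the renamed queue module and passing the UUID as bytes) might look roughly like this in isolation. This is only a sketch, not the actual TFSparkNode.py code: make_authkey, the loopback address, and work_queue are hypothetical names for illustration, and the relative imports are left out because they only work inside a package.

```python
import uuid
from multiprocessing.managers import BaseManager

try:
    import queue  # Python 3 module name
except ImportError:
    import Queue as queue  # Python 2 module name


def make_authkey():
    """Return a random 16-byte key (hypothetical helper, not from the project)."""
    # uuid4() returns a UUID object; its .bytes attribute is the raw 16-byte
    # form, which is what the manager's authkey expects in Python 3.
    return uuid.uuid4().bytes


authkey = make_authkey()

# Constructing the manager does not start a server process yet, so nothing
# is pickled at this point. The address here is an assumption for the sketch.
manager = BaseManager(address=("127.0.0.1", 0), authkey=authkey)

work_queue = queue.Queue()
print(type(authkey).__name__, len(authkey))
```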

... and then I get stuck with a pickling issue:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\worker.py", line 111, in main
  File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\worker.py", line 106, in process
  File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\rdd.py", line 2346, in pipeline_func
  File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\rdd.py", line 317, in func
  File "F:\TensorFlowOnSpark\tfspark.zip\com\yahoo\ml\tf\TFSparkNode.py", line 97, in _reserve
  File ".\tfspark.zip\com\yahoo\ml\tf\TFManager.py", line 36, in start
    mgr.start()
  File "C:\Python35\lib\multiprocessing\managers.py", line 479, in start
    self._process.start()
  File "C:\Python35\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Python35\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Python35\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Python35\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'start.<locals>.<lambda>'

Not really sure how to proceed from here. I tried to use another library (dill) to do the pickling, but that isn't working. Has anybody gotten this to work in Python 3?
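
The last line of the traceback can be reproduced without Spark or TensorFlow. The sketch below assumes only the standard pickle module; make_local_callable, module_level_callable and can_pickle are hypothetical names. It shows that a lambda created inside a function cannot be pickled by reference, while a function defined at module level normally can when the file is run as a script.

```python
import pickle


def make_local_callable():
    """Return a lambda defined inside a function (a 'local object')."""
    return lambda: "hello"


def module_level_callable():
    """A plain function defined at module level."""
    return "hello"


def can_pickle(obj):
    """Return True if the standard pickler can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Which of the two exception types is raised depends on the
        # Python version, so both are caught here.
        return False


if __name__ == "__main__":
    print("local lambda picklable:", can_pickle(make_local_callable()))
    print("module-level function picklable:", can_pickle(module_level_callable))
```

If the lambda in TFManager.start is what gets pickled, replacing it with a module-level function is one possible direction, though this thread does not confirm that doing so is enough on Windows.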

@tjrileywisc I have a branch for Python 3 support that I had been working on locally. I just pushed it up, but it's not merged in yet because I've only "lightly tested" it on my Mac w/ Spark Standalone... However, if you're already experimenting, feel free to try it out and see if it works for you...

@leewyang Thanks. I checked out the branch. I'm still stuck with the same error unfortunately but I'll keep trying.

FYI, we've merged in the Python 3 compatible code. Unfortunately, I doubt that it will fix this issue (which seems to be a Windows-specific pickling issue), so I'll update the title accordingly.

I'm also trying a Python 3.5 + TensorFlow + Windows setup and I also get the pickling issue. I tried replacing multiprocessing with pathos multiprocess; the pickling issue seems to be gone, but the workers crashed for an unknown reason. If possible, could you try pathos and see whether it works on Linux?

@goodwanghan May I ask how you used pathos to resolve the pickling issue? I'm also facing the same issue on Windows right now.

This is probably clear by now, but it seems that pickling fails under Windows because the multiprocessing package cannot fork() there (a spawn() occurs instead), so the process object has to be pickled and sent to the child. It was resolved in another context (ouspg/trytls#197) by moving the call, but I'm not sure if that's relevant here.
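
To make the fork/spawn difference concrete, here is a small sketch using only the standard multiprocessing module. The names worker, available_start_methods and run_with_spawn are hypothetical. Requesting the "spawn" context explicitly mimics the Windows behaviour on any platform: the Process object, including its target, is pickled and sent to a fresh interpreter, so the target has to be importable by name.

```python
import multiprocessing


def worker(n):
    """Module-level target: picklable by reference, so spawn can send it."""
    print("square of", n, "is", n * n)


def available_start_methods():
    """List the start methods this platform supports."""
    return multiprocessing.get_all_start_methods()


def run_with_spawn(n):
    """Run worker(n) in a spawned child and return the child's exit code."""
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=worker, args=(n,))
    proc.start()  # the Process object is pickled for the child at this point
    proc.join(timeout=30)
    return proc.exitcode


if __name__ == "__main__":
    # The guard matters under spawn: the child re-imports this file and
    # must not start processes of its own while doing so.
    print("start methods:", available_start_methods())
    print("child exit code:", run_with_spawn(7))
```

Run as a script file, the child should exit cleanly. Swapping the target for a lambda defined inside a function would be expected to fail at proc.start() with the same kind of error as in the traceback above.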

I could not install this on CDH, hence I took this route of installing on Windows.
I stood up a standalone Spark cluster using this: https://sachingpta.gitlab.io/_posts/installing-spark-cluster-in-windows.html. I could run the first step, but training of the model fails. I'm stuck on this pickling error too. Let me know if you find a resolution to this.

Is there any way to run this on Windows and avoid the pickling error?

I'm not sure
C:\Python35\lib\multiprocessing\reduction.py
import dill as pickle

Is this still an issue?

Added a note that Windows isn't currently supported due to this issue.

I found a solution to this: I removed all class definitions from my main execution file (as instructed in the DataLoader source here) and the pickling error no longer appears.
I'm running Windows.