Python 3 support on Windows? Running into a pickling problem
tjrileywisc opened this issue · comments
I'm running the MNIST example on a standalone cluster in Windows. I had to make a few changes to enable Python 3 support (I'm using 3.5):
In TFCluster.py (due to changes in relative imports):
from . import TFSparkNode
In TFSparkNode.py (Queue is renamed to queue in Python 3, relative imports again, and also a problem with Python 3 not handling UUID objects the same way as 2):
from . import TFManager
...
authkey = uuid.uuid4()
authkey = authkey.bytes
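The changes above can be checked in isolation. This is a minimal sketch (not the actual TFSparkNode.py code) showing why the authkey change is needed: Python 3's multiprocessing expects `authkey` to be bytes rather than a `uuid.UUID` object.

```python
import uuid

# Python 3 requires explicit relative imports inside a package, e.g.:
#   from . import TFSparkNode   (in TFCluster.py)
#   from . import TFManager     (in TFSparkNode.py)

# multiprocessing on Python 3 expects authkey to be bytes, not a
# uuid.UUID object, so take the raw 16 bytes of the UUID:
authkey = uuid.uuid4().bytes
assert isinstance(authkey, bytes) and len(authkey) == 16
```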
... and then I get stuck with a pickling issue:
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\worker.py", line 111, in main
File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\worker.py", line 106, in process
File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\rdd.py", line 2346, in pipeline_func
File "F:\TensorFlowOnSpark\spark-1.6.0-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\rdd.py", line 317, in func
File "F:\TensorFlowOnSpark\tfspark.zip\com\yahoo\ml\tf\TFSparkNode.py", line 97, in _reserve
File ".\tfspark.zip\com\yahoo\ml\tf\TFManager.py", line 36, in start
mgr.start()
File "C:\Python35\lib\multiprocessing\managers.py", line 479, in start
self._process.start()
File "C:\Python35\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Python35\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\Python35\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
reduction.dump(process_obj, to_child)
File "C:\Python35\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'start.<locals>.<lambda>'
Not really sure how to proceed from here. I tried using another library (dill) to do the pickling, but that isn't working. Has anybody gotten this to work in Python 3?
@tjrileywisc I have a branch for Python 3 support that I had been working on locally. I just pushed it up, but it's not merged in yet because I've only "lightly tested" it on my Mac w/ Spark Standalone... However, if you're already experimenting, feel free to try it out and see if it works for you...
@leewyang Thanks. I checked out the branch. I'm still stuck with the same error unfortunately but I'll keep trying.
FYI, we've merged in the Python 3 compatible code. Unfortunately, I doubt that it will fix this issue (which seems to be a Windows-specific pickling issue), so I'll update the title accordingly.
I'm also trying a Python 3.5 + TensorFlow + Windows setup, and I hit the same pickling issue. I tried using pathos multiprocess to replace multiprocessing; the pickling issue seems to go away, but the workers crashed for an unknown reason. If possible, could you try pathos and see if it works on Linux?
@goodwanghan May I ask how you used pathos to resolve the pickling issue? I'm also facing the same issue on Windows right now.
This is probably clear by now, but it seems that the ability to pickle on Windows is affected by the lack of fork() in the multiprocessing package (a spawn() occurs instead, which requires pickling the process object). It was resolved in another context (ouspg/trytls#197) by moving the call, but I'm not sure if that's relevant here.
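The fork/spawn distinction can be reproduced without Spark at all: under the spawn start method, the child process object must be pickled, and the standard pickler refuses callables defined inside a function. The `start()` below is a hypothetical stand-in that just mirrors the shape of the failing code, not the actual TFManager implementation:

```python
import pickle

def start():
    # A local lambda, like the 'start.<locals>.<lambda>' in the
    # traceback: pickle serializes functions by importable reference,
    # and a local object has no importable reference.
    handler = lambda key: key.upper()
    return handler

try:
    pickle.dumps(start())
except (AttributeError, pickle.PicklingError) as e:
    # On Python 3.5 this reports:
    #   Can't pickle local object 'start.<locals>.<lambda>'
    print(e)
```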
I could not install this on CDH, hence I took this route of installing on Windows.
I stood up a standalone Spark cluster using this: https://sachingpta.gitlab.io/_posts/installing-spark-cluster-in-windows.html. I could run the first step, but training the model fails. I'm stuck on this pickling error too. Let me know if you find a resolution.
Is there any way to run this on Windows and avoid the pickling error?
I'm not sure, but you could try editing C:\Python35\lib\multiprocessing\reduction.py to use:
import dill as pickle
Is this still an issue?
I found a solution to this: I removed all class definitions from my main execution file (as instructed in the DataLoader source here), and the pickling error no longer appears.
I'm running Windows.