sync_imports not working as intended
WernerFS opened this issue · comments
Hello,
I am working on TLJH. Previously, the code worked fine, but now I get errors when performing imports on remote hosts:
ipyparallel==8.6.1
import ipyparallel as ipp

engines = 1
cluster = ipp.Cluster(profile="ssh", n=engines)  # profile is shorthand for profile_dir/profile_""
rc = cluster.start_and_connect_sync()
dview = rc[:]

with dview.sync_imports():
    import numpy
Output:
[Engine Exception]:
Traceback (most recent call last):
File "/opt/tljh/user/lib/python3.10/site-packages/ipyparallel/client/client.py", line 885, in _handle_stranded_msgs
raise error.EngineError(
ipyparallel.error.EngineError: Engine 0 died while running task 'c965c310-bca08b3ee2a0a3c1be6caa3f_59412_1'
fetching /tmp/tmpgjrkj09r/ipengine-1700667075.2026.out from user@192.168.0.6:.ipython/profile_ssh/log/ipengine-1700667075.2026.out
Removing user@192.168.0.6:.ipython/profile_ssh/log/ipengine-1700667075.2026.out
engine set stopped 1700667068: {'engines': {'user@192.168.0.6/0': {'exit_code': -1, 'pid': 2754, 'identifier': 'user@192.168.0.6/0'}}, 'exit_code': -1}
However, the error changes when the import statement changes; this time it fails inside the remote_import() call:
import ipyparallel as ipp

engines = 1
cluster = ipp.Cluster(profile="ssh", n=engines)  # profile is shorthand for profile_dir/profile_""
rc = cluster.start_and_connect_sync()
dview = rc[:]

with dview.sync_imports():
    from numpy import random
Output:
importing random from numpy on engine(s)
[0:apply]:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File <string>:1
File /opt/tljh/user/lib/python3.10/site-packages/ipyparallel/client/view.py:437, in remote_import(name, fromlist, level)
TypeError: 'str' object is not callable
As you can probably guess, this is an issue with my setup; it works fine on local clusters. I checked the ipyparallel version on both the remote and the host: both are 8.6.1.
The host runs Python 3.10.12 and the remote is on 3.9.2. My first guess was that the globals() call inside the remote_import() function had changed behavior in the newer Python version. Sadly, this is not the case.
I will keep posting my findings in case this is relevant to anyone.
More on my implementation:
TLJH:
Linux 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
Remote nodes:
Linux hyperfpga-3be11-3-2 5.15.36-xilinx-v2022.2 #1 SMP aarch64 GNU/Linux
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
To debug the issue, I went back to the toy experiments to confirm that the failure only occurs when sync_imports() is executed.
The [Load balanced map and parallel function decorator](https://github.com/ipython/ipyparallel/blob/main/docs/source/examples/Parallel%20Decorator%20and%20map.ipynb) example works as expected.
It succeeds when submitting tasks and retrieving the results:
Submitted tasks, got ids: ['4aae4be3-0e1d348191cd935771cece3f_76913_11', '4aae4be3-0e1d348191cd935771cece3f_76913_12', '4aae4be3-0e1d348191cd935771cece3f_76913_13', '4aae4be3-0e1d348191cd935771cece3f_76913_14', '4aae4be3-0e1d348191cd935771cece3f_76913_15', '4aae4be3-0e1d348191cd935771cece3f_76913_16', '4aae4be3-0e1d348191cd935771cece3f_76913_17', '4aae4be3-0e1d348191cd935771cece3f_76913_18', '4aae4be3-0e1d348191cd935771cece3f_76913_19', '4aae4be3-0e1d348191cd935771cece3f_76913_20']
Using a mapper: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
This also works fine:
@d.parallel(block=True)
def df(x):
    import numpy
    return x * numpy.random.randint(100)

result = df.map(range(10))
print("Submitted tasks, got ids: ")
print("Using a parallel function in direct view: ", result)
This may work as an alternative to sync_imports().
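Another option, if I understand the API correctly, is to push the import statements through DirectView.execute(). A minimal sketch of that idea (the FakeView stub is mine, standing in for rc[:] so the snippet runs without a live cluster):

```python
def remote_imports(view, modules):
    """Run 'import <module>' on every engine behind the view."""
    for mod in modules:
        # DirectView.execute() runs a code string on each engine;
        # block=True waits for completion so failures surface here.
        view.execute(f"import {mod}", block=True)

# Stub standing in for rc[:], so the sketch runs without a cluster.
class FakeView:
    def __init__(self):
        self.ran = []

    def execute(self, code, block=False):
        self.ran.append(code)

view = FakeView()
remote_imports(view, ["numpy", "scipy"])
print(view.ran)  # ['import numpy', 'import scipy']
```

On a real cluster you would pass rc[:] instead of the stub.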
I believe the problem is that my remote has two Python installations behind the python3 name. This was silly of me, but it is easily solved by specifying which Python binary the engines should use in the ipcluster_config.py file:
c.SSHEngineSetLauncher.remote_python = "/usr/bin/python3.9"
Don't be dumb like me: point remote_python at a virtual environment, or use conda.
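In hindsight, comparing interpreter versions on the client and the engines up front would have flagged the mismatch immediately. A rough sketch (the helpers are mine, not part of ipyparallel; on a live cluster the engine versions would come from rc[:].apply_sync(get_version)):

```python
import sys

def get_version():
    """Runs on each engine (or locally) and reports (major, minor)."""
    import sys
    return sys.version_info[:2]

def versions_match(client, engine_versions):
    """True only if every engine reports the client's major.minor."""
    return all(v == client for v in engine_versions)

# On a live cluster:
#   engine_versions = rc[:].apply_sync(get_version)
# Here we reproduce the mismatch from this issue by hand:
client = (3, 10)            # host: Python 3.10.12
engine_versions = [(3, 9)]  # remote engine: Python 3.9.2

print(versions_match(client, engine_versions))  # False
```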
Glad you figured it out! IPython Parallel's code serialization isn't stable across different Python versions. It may work sometimes, but it won't in general. If you use cloudpickle (cluster[:].use_cloudpickle()), it might be more reliable. But I think that approach also means things like sync_imports won't work, because it changes how globals are resolved.
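To illustrate why serialization matters here: the standard pickle module serializes functions by reference (module plus qualified name), so interactively defined functions can't be shipped with it at all. That is roughly why ipyparallel needs its own serializer, and why cloudpickle, which serializes the function body by value, can help:

```python
import pickle

# Plain pickle stores functions by reference, so a lambda defined
# interactively (or in __main__) cannot be pickled:
f = lambda x: x + 1
try:
    pickle.dumps(f)
    picklable = True
except Exception:
    picklable = False

print(picklable)  # False
# cloudpickle (what use_cloudpickle() switches to) serializes the
# code object itself, so the same lambda round-trips -- but the
# bytecode it ships is version-specific, hence the cross-version
# fragility mentioned above.
```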
I actually wrote that comment while waiting for the test to finish; I was overly optimistic. Sadly, my implementation is far more broken than I imagined.
I got around the imports by using the %px import magic. The issue now is that traceback formatting is breaking on the engines:
Traceback (most recent call last):
File "/tmp/ipykernel_99099/659900893.py", line 11, in calculate_solutions_fpga
TypeError: 'enumerate' object is not callable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mlabadm/.local/lib/python3.9/site-packages/ipyparallel/engine/kernel.py", line 199, in do_apply
exec(code, shell.user_global_ns, shell.user_ns)
File "<string>", line 1, in <module>
File "/opt/tljh/user/lib/python3.10/site-packages/ipyparallel/client/remotefunction.py", line 148, in <lambda>
_map = lambda f, *sequences: list(map(f, *sequences))
File "/tmp/ipykernel_99099/659900893.py", line 24, in calculate_solutions_fpga
TypeError: 'traceback' object is not callable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 2057, in showtraceback
stb = self.InteractiveTB.structured_traceback(
File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1288, in structured_traceback
return FormattedTB.structured_traceback(
File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1177, in structured_traceback
return VerboseTB.structured_traceback(
File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1049, in structured_traceback
formatted_exceptions += self.format_exception_as_a_whole(etype, evalue, etb, lines_of_context,
File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 935, in format_exception_as_a_whole
self.get_records(etb, number_of_lines_of_context, tb_offset) if etb else []
File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1003, in get_records
lines, first = inspect.getsourcelines(etb.tb_frame)
File "/usr/lib/python3.9/inspect.py", line 1006, in getsourcelines
lines, lnum = findsource(object)
File "/usr/lib/python3.9/inspect.py", line 827, in findsource
raise OSError('source code not available')
OSError: source code not available
At this point, I suspect it may be better to start from scratch. Nonetheless, any suggestion is welcome.
Please, please, please: do yourself a favor and just use the same Python version on your remote engines and your host.
Yeah, in general cross-version setups aren't supported, though they might work. We could probably add some more visible warnings when that situation is detected.