ipython / ipyparallel

IPython Parallel: Interactive Parallel Computing in Python

Home Page: https://ipyparallel.readthedocs.io/


sync_imports not working as intended

WernerFS opened this issue · comments

Hello,

I am working with TLJH (The Littlest JupyterHub). Previously, the code worked fine, but now I get errors when performing imports on the remote hosts:

ipyparallel==8.6.1

import ipyparallel as ipp

engines = 1

cluster = ipp.Cluster(profile="ssh", n=engines)  # profile is shorthand for the profile directory (profile_<name>)
rc = cluster.start_and_connect_sync()

dview = rc[:]

with dview.sync_imports():
    import numpy

Return:

[Engine Exception]:
Traceback (most recent call last):

  File "/opt/tljh/user/lib/python3.10/site-packages/ipyparallel/client/client.py", line 885, in _handle_stranded_msgs
    raise error.EngineError(

ipyparallel.error.EngineError: Engine 0 died while running task 'c965c310-bca08b3ee2a0a3c1be6caa3f_59412_1'
fetching /tmp/tmpgjrkj09r/ipengine-1700667075.2026.out from user@192.168.0.6:.ipython/profile_ssh/log/ipengine-1700667075.2026.out
Removing user@192.168.0.6:.ipython/profile_ssh/log/ipengine-1700667075.2026.out
engine set stopped 1700667068: {'engines': {'user@192.168.0.6/0': {'exit_code': -1, 'pid': 2754, 'identifier': 'user@192.168.0.6/0'}}, 'exit_code': -1}
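For context, what sync_imports does on each engine is roughly to replay the import statement via __import__ and bind the result into the engine's globals. A minimal local sketch of that mechanic (illustrative only; replay_import is a made-up name, not ipyparallel's actual remote_import implementation):

```python
def replay_import(name, fromlist=()):
    """Mimic how an import statement can be replayed on a remote engine:
    'import numpy'             -> __import__('numpy')
    'from numpy import random' -> __import__('numpy', fromlist=['random'])
    Returns the name bindings the statement would create."""
    mod = __import__(name, fromlist=list(fromlist))
    if fromlist:
        # 'from X import a, b' binds the listed attributes
        return {attr: getattr(mod, attr) for attr in fromlist}
    # plain 'import a.b.c' binds only the top-level package name
    top = name.split(".")[0]
    return {top: __import__(top)}

bindings = replay_import("os", fromlist=["path"])
print(sorted(bindings))  # ['path']
```

Using the stdlib os module here keeps the sketch dependency-free; the same mechanic applies to numpy.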

However, the error changes when the import statement changes; this time it fails inside the remote_import() call:

import ipyparallel as ipp

engines = 1

cluster = ipp.Cluster(profile="ssh", n=engines)  # profile is shorthand for the profile directory (profile_<name>)
rc = cluster.start_and_connect_sync()

dview = rc[:]

with dview.sync_imports():
    from numpy import random

Return:

importing random from numpy on engine(s)
[0:apply]:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File <string>:1

File /opt/tljh/user/lib/python3.10/site-packages/ipyparallel/client/view.py:437, in remote_import(name, fromlist, level)

TypeError: 'str' object is not callable

As you can probably guess, this is an issue with my setup: it works fine on local clusters. I checked the ipyparallel version on both the remote and the host; both are 8.6.1.

The host runs Python 3.10.12 and the remote is on 3.9.2. My first guess was that the globals() call inside the remote_import() function behaved differently in newer Python versions. Sadly, that is not the case.
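A quick way to rule out an interpreter mismatch is to compare the client's (major, minor) version with what each engine reports. The helper below is a local sketch (mismatched_engines is a made-up name); the commented lines show how one might gather engine versions from a live client rc, untested against a real cluster:

```python
import sys

def mismatched_engines(local, remote_versions):
    """Return the engine ids whose (major, minor) Python version
    differs from the client's.
    remote_versions: mapping of engine id -> (major, minor)."""
    return sorted(
        eid for eid, ver in remote_versions.items() if tuple(ver) != tuple(local)
    )

# against a live cluster, versions could be gathered roughly like this:
# remote = dict(enumerate(rc[:].apply_sync(lambda: __import__("sys").version_info[:2])))
# print(mismatched_engines(sys.version_info[:2], remote))

# illustrative data: engine 0 on 3.9 while the client runs 3.10
print(mismatched_engines((3, 10), {0: (3, 9), 1: (3, 10)}))  # [0]
```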

I will keep posting my findings in case this is relevant to anyone.

More on my implementation:

TLJH:

Linux 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

Remote nodes:

Linux hyperfpga-3be11-3-2 5.15.36-xilinx-v2022.2 #1 SMP aarch64 GNU/Linux
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

To debug the issue, I went back to the toy experiments to confirm that the failure only happens when sync_imports() is executed.
The [Load balanced map and parallel function decorator](https://github.com/ipython/ipyparallel/blob/main/docs/source/examples/Parallel%20Decorator%20and%20map.ipynb) example works as expected.

It succeeds when submitting tasks and retrieving the results:

Submitted tasks, got ids:  ['4aae4be3-0e1d348191cd935771cece3f_76913_11', '4aae4be3-0e1d348191cd935771cece3f_76913_12', '4aae4be3-0e1d348191cd935771cece3f_76913_13', '4aae4be3-0e1d348191cd935771cece3f_76913_14', '4aae4be3-0e1d348191cd935771cece3f_76913_15', '4aae4be3-0e1d348191cd935771cece3f_76913_16', '4aae4be3-0e1d348191cd935771cece3f_76913_17', '4aae4be3-0e1d348191cd935771cece3f_76913_18', '4aae4be3-0e1d348191cd935771cece3f_76913_19', '4aae4be3-0e1d348191cd935771cece3f_76913_20']
Using a mapper:  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

This also works fine:

@dview.parallel(block=True)
def df(x):
    import numpy
    return x * numpy.random.randint(100)

result = df.map(range(10))
print("Using a parallel function in direct view: ", result)

This may work as an alternative to sync_imports().

I believe the problem is that on my remote there are two Python installations behind the python3 name. This is silly but easily solved by specifying which Python binary the engines should use in the ipcluster_config.py file.

c.SSHEngineSetLauncher.remote_python = "/usr/bin/python3.9"

Don't be dumb like me: point your remote_python at a virtual environment's interpreter, or use conda.
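For reference, a minimal ipcluster_config.py sketch for the SSH profile (the path and profile location here are placeholders for your own setup):

```python
# ~/.ipython/profile_ssh/ipcluster_config.py
c = get_config()  # noqa -- provided by IPython's config loader

# Pin the exact interpreter the SSH launcher runs on the engines,
# so the engine-side Python matches the one ipyparallel was installed for.
c.SSHEngineSetLauncher.remote_python = "/usr/bin/python3.9"
```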

Glad you figured it out! IPython Parallel's code serialization isn't stable across different Python versions. It may work sometimes, but won't in general. If you use cloudpickle (cluster[:].use_cloudpickle()), it might be more reliable. But I think that approach also means things like sync_imports won't work, because it changes how globals are resolved.
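To illustrate why serialization is fragile across interpreters: standard pickle ships functions by reference (module plus qualified name), so the receiving side must be able to import identical code, whereas cloudpickle serializes the code object itself. A small local demonstration of the by-reference behavior (this does not use ipyparallel):

```python
import pickle

# Plain pickle serializes functions *by reference*: the payload is just
# "module + qualified name", so the receiving interpreter must be able to
# import the same object -- fragile when client and engines differ.
payload = pickle.dumps(len)  # builtins.len, stored by name only
print(b"len" in payload)     # True: only the reference is shipped

# An interactively defined lambda has no importable qualified name,
# so plain pickle refuses it; cloudpickle would ship its bytecode instead.
try:
    pickle.dumps(lambda x: x + 1)
except (pickle.PicklingError, AttributeError, TypeError) as err:
    print("not picklable:", type(err).__name__)
```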

I actually wrote that comment while waiting for the test to finish; I was overly optimistic. Sadly, my implementation is way more broken than I imagined.

I got around the imports by using the %px import magic. The issue now is that traceback formatting is breaking on the engines.

Traceback (most recent call last):
  File "/tmp/ipykernel_99099/659900893.py", line 11, in calculate_solutions_fpga
TypeError: 'enumerate' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mlabadm/.local/lib/python3.9/site-packages/ipyparallel/engine/kernel.py", line 199, in do_apply
    exec(code, shell.user_global_ns, shell.user_ns)
  File "<string>", line 1, in <module>
  File "/opt/tljh/user/lib/python3.10/site-packages/ipyparallel/client/remotefunction.py", line 148, in <lambda>
    _map = lambda f, *sequences: list(map(f, *sequences))
  File "/tmp/ipykernel_99099/659900893.py", line 24, in calculate_solutions_fpga
TypeError: 'traceback' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 2057, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1288, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1177, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1049, in structured_traceback
    formatted_exceptions += self.format_exception_as_a_whole(etype, evalue, etb, lines_of_context,
  File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 935, in format_exception_as_a_whole
    self.get_records(etb, number_of_lines_of_context, tb_offset) if etb else []
  File "/home/mlabadm/.local/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1003, in get_records
    lines, first = inspect.getsourcelines(etb.tb_frame)
  File "/usr/lib/python3.9/inspect.py", line 1006, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python3.9/inspect.py", line 827, in findsource
    raise OSError('source code not available')
OSError: source code not available
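As an aside on those first two errors: a message of the form 'X' object is not callable means the name has been rebound to an instance of X, either directly in the user code or as a side effect of globals getting mangled in transit. A minimal local reproduction of the message (unrelated to ipyparallel itself):

```python
# "'enumerate' object is not callable" means the *name* enumerate no
# longer refers to the builtin: something rebound it to an instance.
enumerate = enumerate([1, 2, 3])  # shadows the builtin with an instance
try:
    enumerate([4, 5, 6])          # now calls the instance -> TypeError
except TypeError as err:
    print(err)  # 'enumerate' object is not callable
```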

At this point, I suspect it may be better to start from scratch. Nonetheless, any suggestion is welcome.

Please, please, please: do yourself a favor and just use the same Python version on your remote engines and your host.

Yeah, in general cross-version setups aren't supported, though they might work. We could probably add more visible warnings when that situation is detected.