How to take down Ray and bring it up again in local mode
SiRumCz opened this issue · comments
My program is at risk of running out of memory, and part of the problem seems to be a memory leak (idle Ray workers holding on to a big chunk of memory). I have a for loop that independently runs chunks of a CSV file through a series of tasks. I would like to kill Ray after each iteration to release the memory and let Modin bring it up again with fresh Ray workers. My code is the following:
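import pandas
import ray

# read_csv takes chunksize (not chunk) to iterate over the file in pieces
for df_ in pandas.read_csv('xxx.csv', chunksize=5000):
    df_.to_csv(xxx)
    run_my_tasks(xxx)  # Modin will initialize ray in first iteration
    ray.shutdown()  # take Ray down so its workers release their memory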
However, I got the error below:
File "/home/.../lib/python3.9/site-packages/modin/core/execution/ray/common/deferred_execution.py", line 309, in _deconstruct_chain
output[out_pos] = out_pos
IndexError: list assignment index out of range
Hi @SiRumCz, thanks for posting this issue. I guess there might be an issue with multiple Ray initializations in the Modin codebase. We will have to look into this more deeply. Meanwhile, can you explicitly put ray.init() before run_my_tasks(xxx) to see if it works?
@YarShev Thanks for your response. Yes, I have tried that method, and unfortunately I got:
ValueError: An application is trying to access a Ray object whose owner is unknown(00ffffffffffffffffffffffffffffffffffffff0100000002e1f505). Please make sure that all Ray objects you are trying to access are part of the current Ray session. Note that object IDs generated randomly (ObjectID.from_random()) or out-of-band (ObjectID.from_binary(...)) cannot be passed as a task argument because Ray does not know which task created them. If this was not how your object ID was generated, please file an issue at https://github.com/ray-project/ray/issues/
@SiRumCz, could you try to execute ray.init() and importlib.reload(pd) before run_my_tasks(xxx), where pd is import modin.pandas as pd?
@SiRumCz, I opened #7280, which adds a reload_modin function. I tested it on the following example and it passed for me.
import modin.pandas as pd
from modin.utils import reload_modin
import ray
ray.init(num_cpus=16) # can be commented out, works
df = pd.read_csv("example.csv")
df = df.abs()
print(df)
ray.shutdown()
reload_modin()
ray.init(num_cpus=16) # can be commented out, works
df = pd.read_csv("example.csv")
df = df.abs()
print(df)
Thanks, I ended up using a Process to wrap my task in a new process; Ray is taken down when the process ends. But I am happy that there will be a feature for this, cheers :-)
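The thread does not show the code for the Process workaround, so the following is only a minimal sketch of how it might look. The names run_in_fresh_process, _child and demo_task are hypothetical, and demo_task merely stands in for run_my_tasks(xxx). It assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp
import os


def _child(queue, task, args):
    # Runs inside the child process. Anything the task starts (for example
    # Ray, once Modin initializes it) belongs to this short-lived process.
    try:
        queue.put((True, task(*args)))
    except Exception as exc:
        # Report failures to the parent instead of leaving it waiting forever.
        queue.put((False, repr(exc)))


def run_in_fresh_process(task, *args):
    """Run task(*args) in a new child process and return its result."""
    # Assumption: POSIX. With "spawn", the task must be importable and picklable.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue, task, args))
    proc.start()
    ok, payload = queue.get()  # read before join() so the child can flush its result
    proc.join()
    if not ok:
        raise RuntimeError(f"task failed in child process: {payload}")
    return payload


def demo_task(chunk_path):
    # Hypothetical stand-in for run_my_tasks(xxx). A real task would import
    # modin.pandas inside this function so that Ray starts in the child only.
    return chunk_path, os.getpid()
```

In the loop from the original post, each iteration would then call something like run_in_fresh_process(run_my_tasks, xxx) instead of calling run_my_tasks(xxx) followed by ray.shutdown(). Calling ray.shutdown() at the end of the task itself is probably still a reasonable precaution, since the sketch does not rely on any particular cleanup behaviour from Ray when the child exits.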