Offsite-tuning code with multi-GPU setting throws error
KKNakkav2 opened this issue · comments
Dear @rayrayraykk,

I'm trying to run the federated offsite-tuning code in a multi-GPU setting via the parameter `federate.process_num`. When I set `federate.process_num` to >= 2 on our server with 4 GPUs, I encountered an issue in Client.py within the `OffsiteTuningClient` class. The error is as follows:
```
File "/home/krishna/2024/FederatedScope/federatedscope/core/parallel/parallel_runner.py", line 100, in run
    runner.setup()
File "/home/krishna/2024/FederatedScope/federatedscope/core/parallel/parallel_runner.py", line 372, in setup
    client.model.to(self.device)
File "/home/krishna/2024/FederatedScope/federatedscope/core/workers/base_worker.py", line 51, in model
    return self._model
```
It looks like `self._model` is deleted in the `OffsiteTuningClient` class.
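The suspected failure mode can be reproduced with a minimal, self-contained sketch (illustrative code only, not FederatedScope's actual implementation; the class names and the `del` placement below are hypothetical, based on the traceback showing a `model` property returning `self._model`):

```python
# Sketch: a property that returns an attribute which a subclass has
# deleted raises AttributeError when the parallel runner later calls
# client.model.to(device).

class Worker:
    """Base worker exposing the model via a read-only property."""
    def __init__(self, model):
        self._model = model

    @property
    def model(self):
        return self._model  # fails if _model was deleted


class OffsiteClient(Worker):
    """Hypothetical client that discards the full model after init,
    as offsite-tuning keeps only an emulator/adapter."""
    def __init__(self, model):
        super().__init__(model)
        del self._model  # full model removed to save memory


client = OffsiteClient(model="full_model")
try:
    client.model  # runner would do client.model.to(self.device)
except AttributeError as err:
    print("AttributeError:", err)
```

If this matches the actual code path, the multi-process runner would need to skip (or guard) the `client.model.to(self.device)` call for clients that no longer hold a model.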
Could you please advise how to overcome this issue? Any pointers would also help me fix it myself. Thanks a lot.
Could the authors please confirm whether this issue is expected, or whether it stems from a wrong setting in the configuration file? Thank you.
In the current version, multi-GPU training is not supported with offsite-tuning. Thank you!
Thank you for letting me know.