Offsite-tuning code with multi-GPU setting throws error
KKNakkav2 opened this issue · comments
Dear @rayrayraykk,

I'm trying to run the federated offsite-tuning code in a multi-GPU setting via the parameter `federate.process_num`. When I set `federate.process_num` to >= 2 on our server with 4 GPUs, I encountered an issue in Client.py within the `OffsiteTuningClient` class. The error is as follows:
```
File "/home/krishna/2024/FederatedScope/federatedscope/core/parallel/parallel_runner.py", line 100, in run
    runner.setup()
File "/home/krishna/2024/FederatedScope/federatedscope/core/parallel/parallel_runner.py", line 372, in setup
    client.model.to(self.device)
File "/home/krishna/2024/FederatedScope/federatedscope/core/workers/base_worker.py", line 51, in model
    return self._model
```
It looks like `self._model` is deleted in the `OffsiteTuningClient` class.
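The suspected failure mode can be reproduced with a minimal, self-contained sketch (illustrative code only, not FederatedScope's actual implementation; the class names and the `del` placement below are hypothetical, based on the traceback showing a `model` property returning `self._model`):

```python
# Sketch: a property that returns an attribute which a subclass has
# deleted raises AttributeError when the parallel runner later calls
# client.model.to(device).

class Worker:
    """Base worker exposing the model via a read-only property."""
    def __init__(self, model):
        self._model = model

    @property
    def model(self):
        return self._model  # fails if _model was deleted


class OffsiteClient(Worker):
    """Hypothetical client that discards the full model after init,
    as offsite-tuning keeps only an emulator/adapter."""
    def __init__(self, model):
        super().__init__(model)
        del self._model  # full model removed to save memory


client = OffsiteClient(model="full_model")
try:
    client.model  # runner would do client.model.to(self.device)
except AttributeError as err:
    print("AttributeError:", err)
```

If this matches the actual code path, the multi-process runner would need to skip (or guard) the `client.model.to(self.device)` call for clients that no longer hold a model.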
Could you please advise how to overcome this issue? Any pointers would also help me fix it myself. Thanks a lot.
Could the authors please confirm whether this issue is expected, or whether it stems from a wrong setting in the configuration file? Thank you.
In the current version, multi-GPU training is not supported with offsite-tuning. Thank you!
Thank you for letting me know.