zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)


Distributed mode for single GPU

TheodorPatrickZ opened this issue

Is it possible to run itr_flickr in non-distributed mode on a single GPU?

When running:
python run.py --task "itr_flickr" --dist "gpu0" --output_dir "output/itr_flickr" --checkpoint "4m_base_finetune/itr_flickr/checkpoint_best.pth"

I get:

Training Retrieval Flickr

| distributed init (rank 0): env://
Traceback (most recent call last):
  File "Retrieval.py", line 381, in <module>
    main(args, config)
  File "Retrieval.py", line 215, in main
    utils.init_distributed_mode(args)
  File "C:\Users..\X-VLM-master\utils\__init__.py", line 357, in init_distributed_mode
    world_size=args.world_size, rank=args.rank)
  File "C:\Users..\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "C:\Users..\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
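For context: the failure happens in torch.distributed's rendezvous step. With init_method="env://", PyTorch reads MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE from the environment, and older Windows builds of PyTorch appear not to register a handler for the env:// scheme at all, which matches this traceback. A minimal sketch of what a single-process env:// init needs (the addresses and values below are illustrative assumptions, not X-VLM's code):

```python
import os
import torch
import torch.distributed as dist

# Minimal single-process init via the env:// rendezvous. These four variables
# are what env:// reads; the values here are illustrative assumptions.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# NCCL is not available on Windows; gloo is the usual fallback there.
backend = "nccl" if torch.cuda.is_available() and os.name != "nt" else "gloo"
dist.init_process_group(backend=backend, init_method="env://")
```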

Hi,

Our code can run on a single GPU by specifying --dist "gpu0".
I haven't encountered this error myself and I'm not sure what causes it. Sorry.
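If the env:// handler is indeed missing in that PyTorch build on Windows, one common workaround is the file:// init method, which Windows builds of torch.distributed supported earlier. A hypothetical sketch, not part of X-VLM itself; the temp-file store and world size of 1 are assumptions for a single-GPU run:

```python
import tempfile
import torch.distributed as dist

# Hypothetical workaround sketch: a file:// store avoids the env:// rendezvous
# entirely. Create a shared file for the store and close it so Windows allows
# PyTorch to reopen it.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
# file:// URLs need forward slashes, even on Windows.
store_url = "file:///" + tmp.name.replace("\\", "/").lstrip("/")

dist.init_process_group(
    backend="gloo",          # gloo works on Windows; NCCL does not
    init_method=store_url,
    rank=0,
    world_size=1,
)
```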

Got it running after looking at it again the next day. Thanks for the fast response!