dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

python run.py with data_root=content/datasets num_gpus=2 num_nodes=1 task_mlm_itm whole_word_masking=True step100k per_gpu_batchsize=64

F-Yuan303 opened this issue · comments

I encountered this when pre-training with COCO:
WARNING - ViLT - No observers have been added to this run
INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - lightning - Global seed set to 0
INFO - timm.models.helpers - Loading pretrained weights from url (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p32_384-830016f5.pth)
GPU available: True, used: True
INFO - lightning - GPU available: True, used: True
TPU available: None, using: 0 TPU cores
INFO - lightning - TPU available: None, using: 0 TPU cores
Using environment variable NODE_RANK for node rank ().
INFO - lightning - Using environment variable NODE_RANK for node rank ().
ERROR - ViLT - Failed after 0:00:06!
Traceback (most recent calls WITHOUT Sacred internals):
File "run.py", line 67, in main
val_check_interval=_config["val_check_interval"],
File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
return fn(self, **kwargs)
File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 359, in __init__
deterministic,
File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 127, in on_trainer_init
self.trainer.node_rank = self.determine_ddp_node_rank()
File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 415, in determine_ddp_node_rank
return int(rank)
ValueError: invalid literal for int() with base 10: ''
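The last frame points at the root cause: Lightning reads the `NODE_RANK` environment variable, which is set but empty (note the empty parentheses in "Using environment variable NODE_RANK for node rank ()" above), and `int('')` raises. A minimal sketch of the failure mode (my reconstruction, not the library's exact code):

```python
import os

# NODE_RANK is set but holds an empty string, matching "for node rank ()" in the log.
os.environ["NODE_RANK"] = ""

rank = os.environ["NODE_RANK"]
try:
    node_rank = int(rank)  # int("") cannot be parsed as a base-10 integer
except ValueError as exc:
    print(f"ValueError: {exc}")  # prints: ValueError: invalid literal for int() with base 10: ''
```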

Have you solved it? I have the same bug.

Not yet bro.

I solved it bro!
Don't forget to set these variables!
export MASTER_ADDR=$DIST_0_IP
export MASTER_PORT=$DIST_0_PORT
export NODE_RANK=$DIST_RANK
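For context, `$DIST_0_IP`, `$DIST_0_PORT`, and `$DIST_RANK` are placeholders from the poster's own cluster environment. A concrete two-node sketch with illustrative values (the IP and port here are assumptions, not from this thread):

```shell
# Run on node 0 (the master). IP and port values are illustrative assumptions.
export MASTER_ADDR=192.168.1.10   # reachable IP of node 0
export MASTER_PORT=29500          # any free port, same value on every node
export NODE_RANK=0                # this node's index among num_nodes

# On node 1, keep MASTER_ADDR and MASTER_PORT identical and set:
# export NODE_RANK=1
```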

It works, thanks!

Nice job! :)

If I use one machine with 8 GPUs, how should these variables be set?
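Not answered by the maintainers in this thread, but with a single machine there is only one node, so the machine is its own master. A sketch under that assumption (the port is an arbitrary free port, also an assumption):

```shell
# Single machine, 8 GPUs: one node, rank 0, master is localhost.
# The port choice is arbitrary; any free port works.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500
export NODE_RANK=0

# Then launch with num_gpus=8 and num_nodes=1, e.g.:
# python run.py with data_root=content/datasets num_gpus=8 num_nodes=1 \
#     task_mlm_itm whole_word_masking=True step100k per_gpu_batchsize=64
```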