rui-yan / SSL-FL

Self-supervised Federated Learning for Medical Imaging - IEEE TMI

CUDA error: device-side assert triggered

chillum-codeX opened this issue

commented

Traceback (most recent call last):
  File "run_class_finetune_FedAvg.py", line 500, in <module>
    main(args, model)
  File "run_class_finetune_FedAvg.py", line 404, in main
    train_stats = train_one_epoch(args, model, criterion, data_loader_train, optimizer,
  File "/workspace/nvidia/pradeep/SSL-FL/code/fed_beit/engine_for_finetuning.py", line 89, in train_one_epoch
    loss_value = loss.item()
RuntimeError: CUDA error: device-side assert triggered

Command:
python run_class_finetune_FedAvg.py --finetune /workspace/nvidia/pradeep/SSL-FL/output/covidfl_pretrain_beit_base_central_checkpoint-999.pth --output_dir /workspace/nvidia/pradeep/SSL-FL/OUTPUT_PATH_FT/ --save_ckpt --num_workers 0 --log_dir /home/SSL-FL/log_dir/ --n_clients 5 --num_local_clients -1

This error can be caused by a mismatch between the number of labels and the number of output units, or by invalid input to the loss function. I have not encountered either of these errors when running the provided scripts on our released datasets. Please follow our instructions.
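As a rough illustration of the label/output mismatch described above, the following sketch checks whether any label falls outside the range a classifier head with a given number of output units can accept. The helper name `find_invalid_labels` and the example values are hypothetical and are not part of the SSL-FL code; it assumes labels are plain integer class indices.

```python
from typing import Iterable, List


def find_invalid_labels(labels: Iterable[int], num_classes: int) -> List[int]:
    """Return the sorted distinct label values outside [0, num_classes - 1].

    Hypothetical helper for illustration; not part of the SSL-FL repository.
    """
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    distinct = {int(label) for label in labels}
    return sorted(v for v in distinct if v < 0 or v >= num_classes)


# Assumed example: the model has 3 output units, but the data contains label 3.
labels = [0, 2, 1, 3, 3, 1]
bad = find_invalid_labels(labels, num_classes=3)
if bad:
    print(f"Labels outside [0, 2]: {bad}")
```

If a check like this reports out-of-range values, the number of classes passed to the model probably does not match the dataset. Re-running the failing step on the CPU, or with the environment variable `CUDA_LAUNCH_BLOCKING=1`, usually gives a more precise error location than the asynchronous device-side assert.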

commented

Thanks for the reply.
Solved.