Training is interrupted without error with 3 GPUs
DatNgoBK opened this issue · comments
datngo288 commented
The training is interrupted when it reaches 33% of the first epoch. I tried many times and it always stops at 33%. The graphics cards are still at 100% utilization by the Python processes.
My config:
trainer.accelerator=ddp
trainer.plugins=null
trainer.gradient_clip_val=400
trainer.gpus=3
trainer.amp_level=O1
I trained on 3 V100 GPUs
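For reference, a minimal sketch of how these overrides would map onto a `pytorch_lightning.Trainer` instantiation. This is an assumption based on the option names in the issue; the actual training script and Lightning version are not shown, and `accelerator="ddp"` / `amp_level` correspond to older Lightning releases:

```python
# Hypothetical reconstruction of the reported config -- the issue only lists
# Hydra-style overrides, so the exact Trainer call is an assumption.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="ddp",       # one process per GPU, gradients synced via all-reduce
    plugins=None,
    gradient_clip_val=400,
    gpus=3,
    amp_level="O1",          # NVIDIA Apex mixed-precision optimization level
)
```

Note that with DDP every rank must execute the same number of collective calls; if one of the three processes sees fewer batches or skips a synchronization point, the others block at the next all-reduce with GPUs pinned at 100% utilization, which matches the behavior described above.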
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.