Very high disk read/write rate during training makes the server lag; how can I limit the read/write speed?
frankst-debug opened this issue · comments
❓ Questions and Help
I am training the movie_mcan model on the VQA2 dataset with two GPUs. GPU memory is limited, so I set batch_size to 16. Every time I run the code, the server becomes abnormally laggy. I checked disk activity with sar -d 3 5 and found that the read rate is very high. How can I fix this? While the server is lagging, I can't do anything else on it.
This is my training command:
CUDA_VISIBLE_DEVICES=2,3 mmf_run config=projects/movie_mcan/configsqa2/defaults.yaml model=movie_mcan dataset=vqa2 run_type=train env.cache_dir=/data/students/zzj/ env.data_dir=/data/students/zzj/ training.batch_size=16
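
One possible way to lower the disk pressure from a command like the one above is to run it at idle I/O priority and with fewer data-loading workers. The sketch below is only an illustration under some assumptions: run_low_priority is a hypothetical helper name, ionice's idle class only has an effect when the kernel's I/O scheduler honors priorities (for example BFQ or CFQ), and training.num_workers is assumed to be the MMF option that controls the number of data loader processes. Whether this removes the lag on this particular server has not been verified.

```shell
# Hypothetical helper: run a command at low CPU priority and, where possible,
# at idle I/O priority so other users' disk requests are served first.
run_low_priority() {
    # Use ionice only if it exists and the idle class (-c 3) is permitted here.
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true >/dev/null 2>&1; then
        ionice -c 3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Usage sketch (left commented out; the remaining arguments are the same
# as in the original training command):
#   export CUDA_VISIBLE_DEVICES=2,3
#   run_low_priority mmf_run <same arguments as above> training.num_workers=1
```

Fewer workers means fewer processes reading from disk in parallel, at the cost of slower data loading. Running sar -d 3 5 again while training is running should show whether the read rate actually drops.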