Seanlinx / mtcnn

Training PNet is so slow

tzhang2014 opened this issue

When I run python example/train_P_net.py --gpus 0 on my GPU (a 1070), training is very slow:
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-Accuracy=0.697969
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-LogLoss=0.617246
INFO:root:Epoch[0] Batch [200] Speed: 123.25 samples/sec Train-BBOX_MSE=0.103584
Can you help me? Is something wrong? Where is the mistake? Thanks.

You need to put your data on an SSD.

@xiaoxiongli Thank you. How long does training take on your PC, and what is your PC's configuration? Thanks.

@tzhang2014 I also ran into this problem. How did you fix it?

INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-Accuracy=0.697195
INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-LogLoss=0.614800
INFO:root:Epoch[0] Batch [200] Speed: 126.56 samples/sec Train-BBOX_MSE=0.106309

Only the first epoch is slow; the later ones are very fast.

You can change MXNet's environment variables to speed up training, for example with the commands: export MXNET_GPU_WORKER_NTHREADS=4 (default = 2) and export MXNET_GPU_COPY_NTHREADS=4 (default = 1). After I did that, everything got better.

E.g., on an i7-7700 with a GTX 1060:
INFO:root:Epoch[0] Batch [3780] Speed: 8343.78 samples/sec Accuracy=0.898810 LogLoss=0.270442 BBOX_MSE=0.015827
INFO:root:Epoch[0] Batch [3800] Speed: 9112.26 samples/sec Accuracy=0.891901 LogLoss=0.282063 BBOX_MSE=0.015802
INFO:root:Epoch[0] Batch [3820] Speed: 10172.07 samples/sec Accuracy=0.883745 LogLoss=0.303172 BBOX_MSE=0.015691
INFO:root:Epoch[0] Batch [3840] Speed: 10388.03 samples/sec Accuracy=0.878459 LogLoss=0.288958 BBOX_MSE=0.015310
INFO:root:Epoch[0] Batch [3860] Speed: 9720.13 samples/sec Accuracy=0.885983 LogLoss=0.310603 BBOX_MSE=0.015680
INFO:root:Epoch[0] Batch [3880] Speed: 9980.33 samples/sec Accuracy=0.879565 LogLoss=0.300225 BBOX_MSE=0.016198
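
The same two variables from the export commands above can also be set from Python. This is only a minimal sketch: the helper name apply_mxnet_thread_settings is hypothetical, the values are the ones suggested in the comment above, and it assumes the variables are set before mxnet is imported so that the library sees them when it starts up.

```python
import os

# Values suggested in the comment above; the best settings depend on your hardware.
SUGGESTED_THREADS = {
    "MXNET_GPU_WORKER_NTHREADS": "4",
    "MXNET_GPU_COPY_NTHREADS": "4",
}


def apply_mxnet_thread_settings(env=os.environ, settings=SUGGESTED_THREADS):
    """Write the settings into env and return the names that were changed.

    Hypothetical helper: call it before `import mxnet`, on the assumption
    that MXNet reads these variables when the library is first loaded.
    """
    changed = []
    for key, value in settings.items():
        if env.get(key) != value:
            env[key] = value
            changed.append(key)
    return changed
```

Calling apply_mxnet_thread_settings() at the very top of the training script, before any mxnet import, should have the same effect as the two export commands. Whether it helps as much as in the log above depends on the machine.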

@linsoncvw After one epoch, the speed is very fast. I don't understand why.

Did you run into "Cannot find argument 'out_grad'" when using train_P_net.py?

@geoffzhang I ran into the same problem. Did you fix it?

@geoffzhang @EmiPark Delete every 'out_grad=True' in core\symbol.py.
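
If the file has many occurrences, the deletion can be scripted instead of done by hand. The sketch below is only an illustration: strip_out_grad is a hypothetical helper, and the sample call in the comment is made up for the example rather than copied from core\symbol.py.

```python
import re

# Matches an "out_grad=True" keyword argument together with one adjacent comma,
# so that the remaining argument list stays syntactically valid.
_OUT_GRAD = re.compile(
    r",\s*out_grad\s*=\s*True\b"      # argument in the middle or at the end
    r"|out_grad\s*=\s*True\b\s*,?\s*"  # argument in first position
)


def strip_out_grad(source):
    """Return source text with every out_grad=True keyword argument removed.

    Hypothetical helper, e.g.
    'f(data=x, out_grad=True, name="a")' becomes 'f(data=x, name="a")'.
    """
    return _OUT_GRAD.sub("", source)
```

To apply it, read the contents of symbol.py into a string, pass the string through strip_out_grad, and write the result back. Keep a backup of the original file first.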

@geoffzhang @EmiPark delete all 'out_grad=True' in core\symbol.py
Does deleting "out_grad = True" have any impact on training?