gpleiss / efficient_densenet_pytorch

A memory-efficient implementation of DenseNets

Forward pass becomes slower after upgrading to 0.4

DesertsP opened this issue

Hi,
Thanks for your work!
Recently I upgraded my network to PyTorch 0.4 with your implementation of DenseNet, and I found that the new version is slower than before. I thought the shared memory would noticeably speed up the forward pass. In my application, predicting one subject took 9s with the 0.3.x version, but now it takes 11s.

The Dice metric also got worse than before. I found that the new code uses Kaiming normal initialization, whereas the old code used the default initialization (uniform?). I tried making all the parameters the same as before, but it had no effect. Do you have any advice for me?

Thanks.

Were you using the efficient implementation before? I think efficient is true by default.
The speed of the efficient version now depends on how much memory you have available. The more memory, the faster it runs.
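
For reference, here's a minimal sketch of toggling the flag and timing a forward pass (the constructor arguments are illustrative, and I'm assuming this repo's models.DenseNet):

```python
import time
import torch
from models import DenseNet  # this repo's implementation

# Build the same network with and without checkpointing and time one
# forward pass each way. growth_rate/block_config are placeholder values.
for efficient in (False, True):
    model = DenseNet(growth_rate=12, block_config=(16, 16, 16),
                     efficient=efficient).eval()
    x = torch.randn(1, 3, 32, 32)
    with torch.no_grad():
        start = time.time()
        model(x)
    print('efficient=%s: %.3fs' % (efficient, time.time() - start))
```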

What is the "dice" metric?

Thanks for your reply.
Yes, I used your efficient implementation before.
I found that the forward pass takes more time now; I had expected the shared memory to speed it up.

Dice == F1-score.
I use it to evaluate my network. I'm confused about why the performance got worse after I updated to 0.4.x.

How are you computing the F1 score? On a per-class basis? What dataset are you using?

The change to initialization was meant to reflect the scheme used in the original paper. If it's not working for the particular task you care about, you can totally change it; this implementation is just supposed to reflect what's in the paper.
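
For example, you could swap schemes with helpers along these lines (a sketch; the helper names are made up, and nn.init.kaiming_normal_ is the 0.4.x spelling):

```python
import torch.nn as nn

def use_default_init(model):
    # Revert conv/linear/BN layers to PyTorch's default (uniform) initialization.
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear, nn.BatchNorm2d)):
            m.reset_parameters()

def use_kaiming_init(model):
    # MSRA / Kaiming-normal initialization for conv weights, as in the paper.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight)
```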

Additionally, if the model is too slow for your purposes, you can turn the efficient flag off. I don't maintain the checkpointing feature -- this is built-in PyTorch functionality. We switched to using checkpointing because the low-level calls that we used to make were hack-y (and not particularly memory efficient). The checkpointing version is far more memory efficient.
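
Roughly, the idea is to wrap the concat-BN-ReLU-conv bottleneck of each dense layer in torch.utils.checkpoint.checkpoint, along these lines (a simplified sketch of the pattern, not the exact repo code):

```python
import torch
import torch.utils.checkpoint as cp

def _bottleneck(norm, relu, conv):
    def fn(*features):
        # The concatenated input and bottleneck output are freed after the
        # forward pass and re-computed during backward instead of stored.
        return conv(relu(norm(torch.cat(features, 1))))
    return fn

# Inside a dense layer's forward (sketch):
# out = cp.checkpoint(_bottleneck(self.norm1, self.relu1, self.conv1), *prev_features)
```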

If speed is an issue, you can try playing around with how much checkpointing is used (e.g. only use it on one of the pooling blocks). This code is designed to be boilerplate/starting code that you should customize for your project's specific needs. So feel free to change the initialization and the amount of checkpointing for your particular project needs.
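
For instance, torch.utils.checkpoint.checkpoint_sequential lets you checkpoint only part of the network (a sketch; it assumes the model keeps its dense blocks in an nn.Sequential called features followed by a linear classifier, as this repo does):

```python
import torch.nn.functional as F
import torch.utils.checkpoint as cp

def forward_with_partial_checkpointing(model, x, segments=2):
    # Split the feature extractor into `segments` chunks; activations are
    # stored only at chunk boundaries and re-computed in between.
    features = cp.checkpoint_sequential(model.features, segments, x)
    out = F.relu(features)
    out = F.adaptive_avg_pool2d(out, (1, 1)).view(out.size(0), -1)
    return model.classifier(out)
```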

I apply DenseNet to 3D medical image segmentation.
The F1-score is computed in NumPy on a per-class basis; I labeled the dataset myself.
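
For reference, I compute it roughly like this (a sketch; pred and target are integer label volumes of the same shape):

```python
import numpy as np

def dice_per_class(pred, target, num_classes, eps=1e-7):
    # Dice == F1 for each class of two integer label arrays.
    scores = []
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        intersection = np.logical_and(p, t).sum()
        scores.append(2.0 * intersection / (p.sum() + t.sum() + eps))
    return scores
```
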
Everything worked well with PyTorch 0.3.1 and your efficient implementation before. I re-read the paper and found that you use MSRA initialization, so I can now confirm that initialization is not the problem.
I checked the PyTorch checkpointing code. It simply drops the intermediate results and re-computes them during back-propagation. That's a really simple method, and there is no shared memory. I care about test speed rather than training speed, and I think the shared memory would help speed up the forward pass.
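
To rule out checkpointing overhead, I time inference like this (a sketch; model and volume stand in for my network and a preprocessed input, running on the GPU):

```python
import time
import torch

model.eval()
with torch.no_grad():  # no backward pass, so checkpoint never re-computes
    torch.cuda.synchronize()  # finish pending GPU work before timing
    start = time.time()
    output = model(volume)
    torch.cuda.synchronize()
print('forward: %.2fs' % (time.time() - start))
```
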
I transferred the parameters from the 0.3.1 version to the latest one, but the network performs badly. All the parameters are the same as before, yet the results are different. I will try to locate the problem.
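
I load the old checkpoint roughly like this (a sketch; the file name is a placeholder):

```python
import torch

state_dict = torch.load('densenet_0.3.1.pth', map_location='cpu')  # placeholder path
# strict=False tolerates buffers that exist in only one PyTorch version
# (e.g. BatchNorm bookkeeping added after 0.3.x).
model.load_state_dict(state_dict, strict=False)
model.eval()  # keep BatchNorm/Dropout in eval mode for a fair comparison
```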

Thanks.