modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

How to compute the ERes2Net model param?

JiJiJiang opened this issue · comments

Hello, I use the same model parameters as your config at https://github.com/alibaba-damo-academy/3D-Speaker/blob/6f6ed3189a4d1db040586a518c8e5d80f4fc0665/egs/3dspeaker/sv-eres2net/conf/eres2net.yaml, but I get 9.88M (yours is 4.6M).

Here is the way I compute the model params:
[screenshot: parameter-counting code appended to the end of ResNet.py]
I'm wondering where the difference is?
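
Roughly, the check looks like this (a minimal sketch, not the exact screenshot code; the constructor keyword arguments and feat_dim=80 are assumptions based on the eres2net.yaml config):

```python
# Minimal sketch of the parameter count. Assumes ERes2Net from ResNet.py
# accepts feat_dim / embedding_size keyword arguments as in eres2net.yaml;
# feat_dim=80 assumes the usual 80-dim Fbank setup. The screenshot code may differ.
from ResNet import ERes2Net

model = ERes2Net(feat_dim=80, embedding_size=512)
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total / 1e6:.2f}M")  # comes out around 9.88M this way
```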

Even when I set embedding_size=192, I still got 6.61M.

The key difference arises from how we compute the model parameters. Since the classifier isn't used during inference, it is not counted in the reported model parameters.

Thank you for your answer!
But which part of the ERes2Net model do you mean by the classifier? Is it the output linear layer that maps the embedding to the speaker labels? If so, it is not defined in the model.

Yes, the classifier refers to the output linear layer that maps the embedding to the speaker labels. Since these parameters are discarded during inference, they are not included in the model parameter count.
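
To make the distinction concrete, here is a hedged illustration (the num_speakers value is a placeholder, not taken from the 3D-Speaker recipe): during training, an extra linear head of size embedding_size × num_speakers sits on top of the embedding, and it is dropped at inference, so it contributes nothing to the reported parameter count.

```python
# Illustration only: num_speakers is a placeholder, not the recipe's value.
# The point is that the classification head's parameters scale with the number
# of training speakers and are discarded at inference time.
import torch.nn as nn

embedding_size, num_speakers = 512, 7000  # placeholder speaker count
classifier = nn.Linear(embedding_size, num_speakers)
head_params = sum(p.numel() for p in classifier.parameters())
print(f"Classifier head: {head_params / 1e6:.2f}M parameters (not counted)")
```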

Thank you for your answer. I directly initialize the ERes2Net model as defined in ResNet.py, which does not contain the classifier you mention above. The code lines in the screenshot are appended directly to the end of your ResNet.py, and I run python ResNet.py. So I think my calculation result should be consistent with yours. What is wrong with my code?

It would be nice if you could share the code you use to calculate the model parameters. Thanks so much!

Apologies for my oversight; I overlooked the parameters following the statistics pooling layer. With an embedding size of 192, the model parameters total 6.61M; with an embedding size of 512, they amount to 9.88M. I'll update this in the arXiv paper and on GitHub soon. Thank you very much for the reminder.
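
A quick back-of-envelope check using only the numbers quoted in this thread (the implied dimension below is inferred from those totals, not read from ResNet.py): the linear layer after statistics pooling contributes roughly D × embedding_size parameters, which is why the total moves with the embedding size.

```python
# Back-of-envelope check using only the figures quoted in this thread.
# D (the pooled-statistics dimension feeding the embedding layer) is inferred
# from the difference between the two totals, not read from the code.
params_512 = 9.88e6   # total with embedding_size=512
params_192 = 6.61e6   # total with embedding_size=192
D = (params_512 - params_192) / (512 - 192)
print(f"Implied pooled-statistics dimension: ~{D:,.0f}")  # roughly 10,000
```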