Densenets supermask

Question


bhack opened this issue 4 years ago

bhack commented 4 years ago

Have you ever tried to find supermasks over DenseNets?
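For concreteness, here is a minimal sketch of what a supermask over a DenseNet-style block could look like. It assumes the edge-popup formulation from hidden-networks: frozen random weights, one score per weight, and a mask that keeps the top-scoring fraction of weights in each layer. It is plain Python with hypothetical names (`top_k_mask`, `make_dense_block`, `dense_block_forward`). Fully connected layers stand in for convolutions, and the scores are just random numbers rather than being trained with a straight-through estimator as they would be in practice. The only DenseNet-specific part is that each layer reads the concatenation of the block input and every earlier layer's output.

```python
import random

def top_k_mask(scores, keep_frac):
    # Keep the weights with the highest scores (edge-popup style). Ties are
    # ignored because random float scores are almost surely distinct.
    flat = sorted((s for row in scores for s in row), reverse=True)
    k = max(1, int(keep_frac * len(flat)))
    threshold = flat[k - 1]
    return [[1 if s >= threshold else 0 for s in row] for row in scores]

def masked_linear_relu(x, weights, mask):
    # y_i = relu(sum_j W_ij * M_ij * x_j); the weights are never updated.
    return [max(0.0, sum(w * m * xj for w, m, xj in zip(w_row, m_row, x)))
            for w_row, m_row in zip(weights, mask)]

def make_dense_block(in_features, growth, num_layers, keep_frac, rng):
    # Hypothetical helper: frozen random weights plus a fixed top-k mask per layer.
    layers = []
    width = in_features
    for _ in range(num_layers):
        weights = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(growth)]
        scores = [[rng.random() for _ in range(width)] for _ in range(growth)]
        layers.append((weights, top_k_mask(scores, keep_frac)))
        width += growth  # the next layer also sees this layer's outputs
    return layers

def dense_block_forward(x, layers):
    features = list(x)
    for weights, mask in layers:
        # Dense connection: concatenate new features instead of adding them.
        features = features + masked_linear_relu(features, weights, mask)
    return features

rng = random.Random(0)
block = make_dense_block(in_features=4, growth=3, num_layers=3, keep_frac=0.5, rng=rng)
out = dense_block_forward([1.0, -0.5, 0.25, 2.0], block)
print(len(out))  # 13 = 4 + 3 * 3
```

With `keep_frac=0.5` each layer keeps half of the random weights it could use. Because the input width grows with depth, later layers choose their mask from a larger pool.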

Mitchell Wortsman · Answer 1 · Sat Jul 04 2020 04:12:50 GMT+0800 (China Standard Time)

This seems like more of a question for

https://github.com/uber-research/deconstructing-lottery-tickets
https://github.com/allenai/hidden-networks
though I don't believe anyone has tried this! Very cool idea.

bhack · Answer 2 · Sat Jul 04 2020 04:19:00 GMT+0800 (China Standard Time)

I was interested in your specific context 😉, and the comments and FAQ section in https://mitchellnw.github.io/blog/2020/supsup/ were pointing to this repo 😸

bhack · Answer 3 · Sat Jul 04 2020 04:23:36 GMT+0800 (China Standard Time)

P.S. I got this vague idea from reading the conclusions of https://arxiv.org/abs/2006.12156.

If he is wondering about skip connections, why not about dense connections?

Mitchell Wortsman · Answer 4 · Sat Jul 04 2020 04:38:31 GMT+0800 (China Standard Time)

Oops! Sorry about that :)

We tried skip connections with ResNets here, which worked well.

I believe dense connections have not been explored with supermasks, and it seems like a really interesting direction!

bhack · Answer 5 · Sat Jul 04 2020 04:47:05 GMT+0800 (China Standard Time)

Yes, I know, but I meant that in the mentioned work the conclusion was more about their strong claim that subnetworks "only needs a logarithmic factor (in all variables but depth) number of neurons per weight of the target subnetwork".

So the open question was more about the impact of convolutional and batch norm layers, skip connections (DenseNet-like connections?), and LSTMs on the number of sampled neurons required to maintain good accuracy.
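One rough way to see how dense connections differ from skip connections here is to count the candidate weights a per-layer supermask would be selected from. The sketch below uses hypothetical helper names and simplifying assumptions: it ignores the 1x1 bottleneck and transition layers that DenseNet-BC actually uses and treats every layer as a full 3x3 convolution with the same number of output channels. It compares a residual stack, whose input width stays constant, with a dense block, whose input width grows by the growth rate at every layer. It says nothing about how many neurons are actually required; it only shows that for dense blocks the search space grows linearly per layer, and quadratically in depth overall.

```python
def residual_input_widths(width, num_layers):
    # Skip connection x_{l+1} = x_l + f_l(x_l): the input width never changes.
    return [width] * num_layers

def dense_input_widths(k0, growth, num_layers):
    # Dense connection: layer l sees the block input (k0 channels) plus the
    # `growth` channels produced by each of the l earlier layers.
    return [k0 + growth * l for l in range(num_layers)]

def candidate_weights(in_widths, out_width, kernel_area=1):
    # Total weights the per-layer supermasks choose from (kernel_area=9 for 3x3).
    return sum(w * out_width * kernel_area for w in in_widths)

res = residual_input_widths(width=12, num_layers=4)
dense = dense_input_widths(k0=12, growth=12, num_layers=4)
print(res)    # [12, 12, 12, 12]
print(dense)  # [12, 24, 36, 48]
print(candidate_weights(res, 12, 9), candidate_weights(dense, 12, 9))  # 5184 12960
```

The numbers (width 12, growth 12, four layers) are illustrative only. In closed form the dense total is `growth * kernel_area * (L * k0 + growth * L * (L - 1) / 2)` for `L` layers, which is where the quadratic dependence on depth comes from.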

bhack · Answer 6 · Sat Jul 04 2020 04:51:15 GMT+0800 (China Standard Time)

I also meant that this claim could have an interesting impact on your specific continual learning setup.
If you can free up "more resources", that is useful when you need to expand to a new task.
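As a reminder of the setup this would plug into, here is a minimal sketch of the SupSup idea: one frozen, randomly initialized network shared by all tasks, with a separate binary supermask stored per task. The class and method names are hypothetical, the +1/-1 weights are an illustrative assumption rather than the repository's initialization, and the layer is fully connected for brevity. The point for this thread is the accounting. Each new task costs one bit per weight, so anything that reduces the number of random weights needed would shrink both the shared network and the per-task mask storage.

```python
import random

class TaskMaskedLinear:
    """Hypothetical SupSup-style layer: frozen random weights shared by every
    task, plus one binary supermask per task."""

    def __init__(self, in_features, out_features, seed=0):
        rng = random.Random(seed)
        # Frozen random weights; +1/-1 values are an illustrative assumption.
        self.weights = [[rng.choice((-1.0, 1.0)) for _ in range(in_features)]
                        for _ in range(out_features)]
        self.masks = {}  # task_id -> 0/1 matrix with the same shape as weights

    def add_task(self, task_id, mask):
        self.masks[task_id] = mask

    def forward(self, x, task_id):
        # Only the mask changes between tasks; the weights are never updated.
        mask = self.masks[task_id]
        return [sum(w * m * xj for w, m, xj in zip(w_row, m_row, x))
                for w_row, m_row in zip(self.weights, mask)]

    def mask_bits_per_task(self):
        # A new task costs one bit per weight (vs. 32 bits for a float32 copy).
        return sum(len(row) for row in self.weights)

layer = TaskMaskedLinear(in_features=3, out_features=2)
layer.add_task("task_a", [[1, 0, 1], [0, 1, 1]])
layer.add_task("task_b", [[0, 1, 0], [1, 0, 0]])
x = [1.0, 2.0, 3.0]
print(layer.forward(x, "task_a"), layer.forward(x, "task_b"))
print(layer.mask_bits_per_task())  # 6
```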

Mitchell Wortsman · Answer 7 · Sat Jul 04 2020 04:59:19 GMT+0800 (China Standard Time)

Thanks, that could definitely help!

bhack · Answer 8 · Sat Jul 04 2020 05:09:09 GMT+0800 (China Standard Time)

If you are interested in this, see also "Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient".
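The core step behind that paper's logarithmic bound is a random subset-sum argument: given n values drawn uniformly from [-1, 1], with high probability some subset sums to within ε of any target in [-1, 1] once n is on the order of log(1/ε). The brute-force sketch below only illustrates that scalar step, not the paper's full network construction. The helper name, the 14 samples, the seed, and the target 0.37 are arbitrary choices, and exhaustive search is exponential in n, so this is for illustration only.

```python
import itertools
import random

def best_subset_sum(values, target):
    """Brute-force the subset of `values` whose sum is closest to `target`."""
    best_subset, best_err = (), abs(target)  # the empty subset sums to 0
    for r in range(1, len(values) + 1):
        for subset in itertools.combinations(values, r):
            err = abs(sum(subset) - target)
            if err < best_err:
                best_subset, best_err = subset, err
    return best_subset, best_err

rng = random.Random(0)
samples = [rng.uniform(-1.0, 1.0) for _ in range(14)]
target = 0.37
for n in (2, 6, 10, 14):
    # Approximation error using only the first n random samples.
    _, err = best_subset_sum(samples[:n], target)
    print(n, round(err, 6))
```

Because every subset of the first n samples is also a subset of the first n + 1, the best error can only shrink as n grows. In runs like this it typically drops quickly, which is the behavior the logarithmic bound describes.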

Mitchell Wortsman · Answer 9 · Sat Jul 04 2020 05:23:17 GMT+0800 (China Standard Time)

Thank you, we have seen this but haven't taken a close look yet! Hopefully we can soon; it seems awesome.

bhack · Answer 10 · Sat Jul 04 2020 05:38:43 GMT+0800 (China Standard Time)

Other than DenseNets, another interesting direction is Transformers. Some early exploratory efforts were made in:

https://arxiv.org/abs/2005.00561
https://arxiv.org/abs/2005.03454