Alpha value less than one?

kayuksel opened this issue 4 years ago

Kamer Ali Yuksel commented 4 years ago

Can the alpha value be less than one?

In that case, I basically need it to behave like sum-normalized sigmoids (i.e., instead of softmax, which is the case alpha = 1.0).

Ben Peters · Answer 1 · Wed Jan 20 2021 19:20:00 GMT+0800 (China Standard Time)

It might be possible mathematically, but as far as I know our bisection algorithm only works for alpha>1.
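For context, alpha-entmax has the closed form p_i = [(alpha - 1) z_i - tau]_+^{1/(alpha - 1)}, where the threshold tau is chosen so that the probabilities sum to one. A bisection solver searches for that tau. The pure-Python sketch below shows one way such a search can work for alpha > 1. The function name `entmax_bisect_gt1` is hypothetical and is not the entmax library's API. The bracket is chosen so that the largest entry equals 1 at one end (total mass at least 1) and is at most 1/n at the other (total mass at most 1).

```python
def entmax_bisect_gt1(z, alpha=1.5, n_iter=60):
    """Hypothetical sketch: alpha-entmax for alpha > 1 by bisection on tau.

    p_i = max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1)),
    with tau chosen so that sum(p) == 1.
    """
    if alpha <= 1:
        raise ValueError("this sketch only handles alpha > 1")
    n = len(z)
    scaled = [(alpha - 1) * zi for zi in z]
    m = max(scaled)
    exponent = 1.0 / (alpha - 1)

    def probs(tau):
        # Entries whose base drops below zero are clipped to exactly 0 (sparsity).
        return [max(s - tau, 0.0) ** exponent for s in scaled]

    tau_lo = m - 1.0                       # largest entry is 1, so sum(p) >= 1
    tau_hi = m - (1.0 / n) ** (alpha - 1)  # every entry <= 1/n, so sum(p) <= 1
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        if sum(probs(tau)) >= 1.0:
            tau_lo = tau  # too much mass: the threshold must be higher
        else:
            tau_hi = tau
    p = probs(tau_lo)
    total = sum(p)
    return [pi / total for pi in p]  # remove the small residual bisection error
```

With alpha = 2 this reduces to sparsemax. For example, the logits [3.0, 1.0, 0.1] map to [1.0, 0.0, 0.0]. Because the exponent 1/(alpha - 1) is positive here, clipping a negative base to 0 is harmless. That is the property that makes the bracket and the clipping specific to alpha > 1.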

Ben Peters · Answer 2 · Wed Jan 20 2021 19:24:50 GMT+0800 (China Standard Time)

Could you explain what you mean about sum-normalized sigmoids? I'm not sure what the connection is to entmax.

Kamer Ali Yuksel · Answer 3 · Sat Jan 23 2021 08:01:25 GMT+0800 (China Standard Time)

By sum-normalized sigmoids, I just mean taking the sigmoids of the logits and then dividing them by their sum. I believe this is referred to as sum normalization in On Controllable Sparse Alternatives to Softmax:
https://papers.nips.cc/paper/2018/file/6a4d5952d4c018a1c1af9fa590a10dda-Paper.pdf
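Below is a minimal plain-Python sketch of the two normalizations being compared. The function names are illustrative only. Softmax exponentiates the logits. Sum normalization first squashes each logit into (0, 1) with a sigmoid and then divides by the total. Both produce dense outputs.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]


def sigmoid(x):
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def sum_normalized_sigmoid(z):
    # sigma(z_i) / sum_j sigma(z_j)
    sig = [sigmoid(zi) for zi in z]
    total = sum(sig)
    return [si / total for si in sig]


logits = [3.0, 1.0, 0.1]
print(sum_normalized_sigmoid(logits))
print(softmax(logits))
```

For the logits [3.0, 1.0, 0.1], the sum-normalized sigmoids come out noticeably flatter than softmax (roughly 0.43, 0.33, 0.24 versus 0.84, 0.11, 0.05), because the sigmoid saturates for large logits.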

I basically want to learn the sparsity and the temperature parameter of sparsemax at the same time. I thought that entmax, if it could be used with alpha < 1.0, would be equivalent to that. I have tried it, but I got NaNs.

Andre Martins · Answer 4 · Sat Jan 23 2021 20:08:11 GMT+0800 (China Standard Time)

Hi @kayuksel, it is possible to use alpha instead of a temperature parameter to control the propensity for sparsity of entmax, and gradients with respect to alpha are supported, hence alpha can be learned (this was done here: https://arxiv.org/pdf/1909.00015.pdf). However, I believe the current code only supports alpha >= 1 (I believe it should not be very hard to extend bisection to alpha < 1, though this won't be a sparse transformation). Is alpha < 1 crucial for your problem? I didn't quite get the connection with sum-normalized sigmoids; is the idea to consider sum-normalized "entmoids" with alpha < 1?
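The following illustrates why alpha < 1 is possible but not sparse. The same closed form p_i = ((alpha - 1) z_i - tau)^{1/(alpha - 1)} still applies, but the exponent is now negative. Each base must therefore stay strictly positive, and no entry can be clipped to zero. The hypothetical `entmax_bisect_lt1` below is a pure-Python sketch of a bisection for 0 < alpha < 1 under that assumption. It is not necessarily how the entmax code or PR #22 implements it. The bracket mirrors the alpha > 1 case: at one end the top entry equals 1 (total mass at least 1), and at the other it equals 1/n (total mass at most 1).

```python
def entmax_bisect_lt1(z, alpha=0.5, n_iter=80):
    """Hypothetical sketch: alpha-entmax for 0 < alpha < 1 by bisection on tau.

    p_i = ((alpha - 1) * z_i - tau) ** (1 / (alpha - 1)). The exponent is
    negative, so every base must stay > 0 and no entry is ever exactly 0.
    """
    if not 0 < alpha < 1:
        raise ValueError("this sketch only handles 0 < alpha < 1")
    n = len(z)
    scaled = [(alpha - 1) * zi for zi in z]
    m = min(scaled)  # smallest base belongs to the largest logit, since alpha - 1 < 0
    exponent = 1.0 / (alpha - 1)

    def probs(tau):
        # Valid only for tau < m, where every base is strictly positive.
        return [(s - tau) ** exponent for s in scaled]

    tau_lo = m - n ** (1 - alpha)  # top entry is 1/n, all others smaller: sum(p) <= 1
    tau_hi = m - 1.0               # top entry is exactly 1: sum(p) >= 1
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        if sum(probs(tau)) >= 1.0:
            tau_hi = tau  # sum(p) grows with tau here, so too much mass: lower tau
        else:
            tau_lo = tau
    p = probs(tau_lo)
    total = sum(p)
    return [pi / total for pi in p]  # remove the small residual bisection error
```

For the logits [3.0, 1.0, 0.1] and alpha = 0.5, the result is dense (about 0.66, 0.20, 0.14), and as alpha approaches 1 from below it approaches softmax. One plausible source of NaNs when alpha < 1 is passed to code written only for alpha > 1 is the clipping step. In array libraries that follow IEEE semantics, such as PyTorch, a base clipped to 0 and raised to the negative power 1/(alpha - 1) becomes infinite, and normalizing infinities yields NaN.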

Kamer Ali Yuksel · Answer 5 · Mon Jan 25 2021 04:00:09 GMT+0800 (China Standard Time)

@andre-martins Since I am able to use entmax to learn the alpha parameter, I would like to use it to learn both the optimal sparsity and the temperature at the same time with a single alpha parameter. Yes, it would make a great addition to what I am working on (financial portfolio optimization). I would be more than glad if you could extend it to alpha < 1.0 in the future, at your convenience.

Andre Martins · Answer 6 · Mon Jan 25 2021 04:27:06 GMT+0800 (China Standard Time)

Hi @kayuksel, I made a pull request (#22) that I think solves this problem: it should work with alpha < 1.0, and it's passing the tests. It would be great if you could try it and let us know if it worked.

Ben Peters · Answer 7 · Wed Jan 27 2021 20:06:23 GMT+0800 (China Standard Time)

It's merged on master now.

Kamer Ali Yuksel · Answer 8 · Sat Feb 13 2021 21:12:25 GMT+0800 (China Standard Time)

Hello @andre-martins & @bpopeters, sorry for the late response; I have been extremely busy. I will surely let you know how it works.