deep-spin / entmax

The entmax mapping and its loss, a family of sparse softmax alternatives.

Alpha value less than one?

kayuksel opened this issue

Can the alpha value be less than one? In that case I basically need it to behave like sum-normalized sigmoids (rather than softmax, which is the case where alpha = 1.0).

It might be possible mathematically, but as far as I know our bisection algorithm only works for alpha>1.
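For readers unfamiliar with the bisection approach mentioned here, the following is a minimal pure-Python sketch for alpha > 1. It is not the library's code: the function name `entmax_bisect_sketch` and its parameters are hypothetical, and it assumes the usual entmax form p_i = max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1)), with the threshold tau chosen so the outputs sum to one.

```python
def entmax_bisect_sketch(z, alpha=1.5, n_iter=60):
    """Illustrative entmax via bisection on the threshold tau (alpha > 1 only)."""
    if alpha <= 1.0:
        raise ValueError("this sketch only brackets tau for alpha > 1")
    d = len(z)
    x = [(alpha - 1.0) * zi for zi in z]
    power = 1.0 / (alpha - 1.0)

    def probs(tau):
        # Entries at or below the threshold are clipped to exactly zero.
        return [max(xi - tau, 0.0) ** power for xi in x]

    # At tau_lo the largest entry alone has mass 1, so the total is >= 1.
    # At tau_hi every entry is at most 1/d, so the total is <= 1.
    tau_lo = max(x) - 1.0
    tau_hi = max(x) - (1.0 / d) ** (alpha - 1.0)
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        if sum(probs(tau)) >= 1.0:
            tau_lo = tau  # too much mass: raise the threshold
        else:
            tau_hi = tau  # too little mass: lower the threshold
    p = probs(tau_lo)
    total = sum(p)  # >= 1 by construction, so the division is safe
    return [pi / total for pi in p]


print(entmax_bisect_sketch([2.0, 0.0, -1.0], alpha=2.0))  # alpha = 2 is sparsemax
print(entmax_bisect_sketch([0.5, 0.0], alpha=1.5))
```

The bracket for tau relies on the exponent 1 / (alpha - 1) being positive, which is one way to see why a routine written like this does not carry over to alpha < 1 unchanged.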

Could you explain what you mean about sum-normalized sigmoids? I'm not sure what the connection is to entmax.

By sum-normalized sigmoids, I mean just taking the sigmoids of the logits and then dividing them by their sum. Here (On Controllable Sparse Alternatives to Softmax) I believe it is referred to as sum normalization.
https://papers.nips.cc/paper/2018/file/6a4d5952d4c018a1c1af9fa590a10dda-Paper.pdf
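The operation described above can be sketched in a few lines of plain Python. The function names are illustrative only and do not come from entmax or from the linked paper; softmax is included just for comparison.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sum_normalized_sigmoids(logits):
    """Apply a sigmoid to each logit, then divide by the total so outputs sum to 1."""
    sig = [sigmoid(z) for z in logits]
    total = sum(sig)
    return [s / total for s in sig]


def softmax(logits):
    """Standard softmax, shifted by the max logit for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.0, -1.0]
print(sum_normalized_sigmoids(logits))
print(softmax(logits))
```

Because each sigmoid is bounded between 0 and 1, the normalized result is typically flatter than softmax on the same logits.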

I basically want to learn the sparsity and the temperature parameter of sparsemax at the same time. I thought that entmax, if it were possible to use alpha < 1.0, would be equivalent to that. I have tried it but got NaNs.

Hi @kayuksel, it is possible to use alpha instead of a temperature parameter to control entmax's propensity for sparsity. Gradients with respect to alpha are supported, so alpha can be learned (this was done here: https://arxiv.org/pdf/1909.00015.pdf). However, I believe the current code only supports alpha >= 1. It should not be very hard to extend bisection to alpha < 1, but the result won't be a sparse transformation. Is alpha < 1 crucial in your problem? I didn't quite get the connection with sum-normalized sigmoids: is the idea to consider sum-normalized "entmoids" with alpha < 1?
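To illustrate why alpha < 1 gives a dense output, here is a hedged pure-Python sketch of one possible bisection for 0 < alpha < 1. It is an assumption-laden illustration, not the library's implementation: the name `entmax_dense_sketch` is hypothetical, and it assumes the stationarity condition p_i = ((alpha - 1) * z_i - tau) ** (1 / (alpha - 1)) with every base strictly positive, so no clipping to zero occurs.

```python
def entmax_dense_sketch(z, alpha=0.5, n_iter=60):
    """Illustrative entmax-style mapping for 0 < alpha < 1 (always dense)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("this sketch assumes 0 < alpha < 1")
    d = len(z)
    x = [(alpha - 1.0) * zi for zi in z]  # sign flips: largest logit -> smallest x
    power = 1.0 / (alpha - 1.0)           # negative exponent
    x_min = min(x)

    def probs(tau):
        # tau stays below x_min - 1, so every base is >= 1 and well defined.
        return [(xi - tau) ** power for xi in x]

    # With a negative exponent the total mass increases with tau.
    # At tau_hi the top entry alone has mass 1 (total >= 1);
    # at tau_lo every entry is at most 1/d (total <= 1).
    tau_hi = x_min - 1.0
    tau_lo = x_min - float(d) ** (1.0 - alpha)
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        if sum(probs(tau)) >= 1.0:
            tau_hi = tau  # too much mass: lower tau
        else:
            tau_lo = tau  # too little mass: raise tau
    p = probs(tau_hi)
    total = sum(p)
    return [pi / total for pi in p]


p = entmax_dense_sketch([2.0, 0.0, -1.0], alpha=0.5)
print(p)  # every entry is strictly positive
```

Note that the search direction is reversed relative to the alpha > 1 case, and that no entry is ever exactly zero, which matches the remark that the transformation is no longer sparse.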

@andre-martins Since I am able to use entmax to learn the alpha parameter, I would like to use it to learn both the optimal sparsity and the temperature at the same time with a single alpha parameter. Yes, it would make a great addition to what I am working on (financial portfolio optimization). I would be more than glad if you could extend it to alpha < 1.0 in the future, at your convenience.

Hi @kayuksel I made a pull request (#22) that I think solves this problem - it should work with alpha < 1.0, and it's passing the tests. It would be great if you could try it and let us know if it worked.

It's merged on master now.

Hello @andre-martins & @bpopeters, sorry for the late response, which was due to extreme congestion on my side. I will certainly try it and let you know.