lasso-net / lassonet

Feature selection in neural networks


Unclear connection of LassoNet and SPINN

andreimargeloiu opened this issue

The LassoNet paper mentions that LassoNet generalizes an existing method, though it's unclear how/when this is the case.

In Section 1.2 Related work, the paper says "Recently, Feng and Simon (2017) proposed an input-sparse neural network, where the input weights are penalized using the group Lasso penalty. As will become evident in Section 3, our proposed method extends and generalizes this approach in a natural way."

Feng and Simon (2017) add a sparse group Lasso penalty on the first layer (see figure below), which is a convex combination of a Lasso and a group Lasso.

[screenshot of the sparse group Lasso penalty from Feng and Simon (2017)]
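Written out explicitly (my notation; the exact weighting used in their paper may differ), a sparse group Lasso on the first-layer weights $W^{(0)}$, with one group $W_j^{(0)}$ per input feature $j$, is a convex combination of a Lasso term and a group Lasso term:

$$\lambda \left( \alpha \,\|W^{(0)}\|_1 + (1-\alpha) \sum_{j=1}^{d} \|W_j^{(0)}\|_2 \right), \qquad \alpha \in [0, 1].$$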

How/When does LassoNet generalize the method of Feng and Simon (2017)? Looking at Section 3, I see that LassoNet is equivalent to a standard Lasso (when M = 0) and to an unregularized feed-forward neural network (when M → +∞), but the connection to the method of Feng and Simon (2017) isn't mentioned.
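For reference, the Section 3 formulation under discussion (in the thread's notation, with $\theta$ the skip-connection weights and $W^{(1)}$ the first hidden layer; I may be glossing over details) is:

$$\min_{\theta, W} \; L(\theta, W) + \lambda \,\|\theta\|_1 \quad \text{s.t.} \quad \|W_j^{(1)}\|_{\infty} \leq M \,|\theta_j|, \quad j = 1, \ldots, d,$$

so M = 0 forces $W_j^{(1)} = 0$ for every feature and only the penalized linear skip connection remains, while M → +∞ makes the constraint vacuous.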

For large M our method is pretty close, with just an additional skip connection!

How is LassoNet with large M close to Feng and Simon (2017)? Can you please explain?

Feng and Simon (2017) put Lasso and group Lasso penalties on the first layer, but LassoNet with large M doesn't put any Lasso constraint on the MLP. As I understand it, for large M LassoNet:

  1. puts an L1 penalty on the skip connection;
  2. doesn't effectively constrain the first layer of the MLP, because the constraint $\left\|W_j^{(1)}\right\|_{\infty} \leq M\left|\theta_j\right|$ becomes loose as M grows (see the sketch below).
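To make point 2 concrete, here is a tiny numpy sketch (toy numbers of my own, and plain clipping as a crude stand-in for LassoNet's actual hierarchical proximal step) showing how often the box $[-M|\theta_j|,\ M|\theta_j|]$ actually binds for different M:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 10                        # d input features, K first-layer neurons
W = rng.normal(size=(d, K))         # first-layer weights, one row per feature
theta = rng.normal(size=d)          # skip-connection weights

def clip_to_box(W, theta, M):
    """Clamp each row W_j to the interval [-M|theta_j|, M|theta_j|].

    This only illustrates when the constraint is active; it is not
    LassoNet's real hierarchical proximal operator.
    """
    bound = M * np.abs(theta)[:, None]
    return np.clip(W, -bound, bound)

for M in [0.0, 1.0, 100.0]:
    clipped = np.mean(clip_to_box(W, theta, M) != W)
    print(f"M = {M:>5}: fraction of first-layer weights clipped = {clipped:.2f}")
# M = 0 zeroes out W entirely (only the Lasso-penalized skip connection is left);
# for very large M essentially nothing is clipped, i.e. the MLP is unconstrained.
```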

Let's say intermediate M then.
The point is that the first layer does have an L1 penalty and is not too constrained by linear signals.
LassoNet is not more general than the other paper, though.

Are you referring to the L1 penalty on the skip connection? I don't see how the first layer of the MLP has an L1 penalty.

[screenshot of the LassoNet constraint $\|W_j^{(0)}\|_{\infty} \leq M\,|\theta_j|$ from the paper]

Thus $W^{(0)}$ (the first layer) has a constraint with an L1 penalty.

Is the corresponding L1 penalty $\|W_j^{(0)}\|_1 \leq K \cdot M\,\left|\theta_j\right|,\ j=1, \ldots, d$, where $K$ is the number of neurons in the first layer?

Explanation: $\|W_j^{(0)}\|_1 \leq K\,\|W_j^{(0)}\|_{\infty} \leq K \cdot M\,\left|\theta_j\right|$ ?

The absolute value of any coordinate of $W_j^{(0)}$ is at most the infinity norm, yes.
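For what it's worth, a quick numerical sanity check of that chain of inequalities (toy numbers; $K$, $M$ and $\theta_j$ here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 10                                  # number of neurons in the first layer
M, theta_j = 2.0, 0.5                   # made-up constraint parameters
w_j = rng.normal(size=K)                # first-layer weights for one feature j

# Enforce the hierarchy constraint ||w_j||_inf <= M * |theta_j| by clipping.
w_j = np.clip(w_j, -M * abs(theta_j), M * abs(theta_j))

l1, linf = np.abs(w_j).sum(), np.abs(w_j).max()
assert l1 <= K * linf <= K * M * abs(theta_j)
print(f"||w_j||_1 = {l1:.3f}  <=  K*||w_j||_inf = {K * linf:.3f}"
      f"  <=  K*M*|theta_j| = {K * M * abs(theta_j):.3f}")
```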

Thank you very much for all the back-and-forth explanation!