lasso-net / lassonet

Feature selection in neural networks


Unclear connection of LassoNet and SPINN

andreimargeloiu opened this issue

The LassoNet paper mentions that LassoNet generalizes an existing method, though it's unclear how/when this is the case.

In Section 1.2 Related work, the paper says "Recently, Feng and Simon (2017) proposed an input-sparse neural network, where the input weights are penalized using the group Lasso penalty. As will become evident in Section 3, our proposed method extends and generalizes this approach in a natural way."

Feng and Simon (2017) add a sparse group Lasso penalty on the first layer (see figure below), which is a convex combination of a Lasso and a group Lasso.

[screenshot of the sparse group Lasso penalty from Feng and Simon (2017)]
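Written out explicitly (my notation; the exact weighting used in their paper may differ), a sparse group Lasso on the first-layer weights $W^{(0)}$, with one group $W_j^{(0)}$ per input feature $j$, is a convex combination of a Lasso term and a group Lasso term:

$$\lambda \left( \alpha \,\|W^{(0)}\|_1 + (1-\alpha) \sum_{j=1}^{d} \|W_j^{(0)}\|_2 \right), \qquad \alpha \in [0, 1].$$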

How/When does LassoNet generalize the method of Feng and Simon (2017)? Looking at Section 3, I see that LassoNet is equivalent to a standard Lasso (when M = 0) and to an unregularized feed-forward neural network (when M → +∞), but the connection to the method of Feng and Simon (2017) isn't mentioned.
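For reference, the Section 3 formulation under discussion (in the thread's notation, with $\theta$ the skip-connection weights and $W^{(1)}$ the first hidden layer; I may be glossing over details) is:

$$\min_{\theta, W} \; L(\theta, W) + \lambda \,\|\theta\|_1 \quad \text{s.t.} \quad \|W_j^{(1)}\|_{\infty} \leq M \,|\theta_j|, \quad j = 1, \ldots, d,$$

so M = 0 forces $W_j^{(1)} = 0$ for every feature and only the penalized linear skip connection remains, while M → +∞ makes the constraint vacuous.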

For large M our method is pretty close, with just an additional skip connection!

How is LassoNet with large M close to Feng and Simon (2017)? Can you please explain?

Feng and Simon (2017) put Lasso and group Lasso penalties on the first layer, but LassoNet with large M doesn't put any Lasso constraint on the MLP. As I understand it, for large M LassoNet:

  1. puts an L1 penalty on the skip connection;
  2. doesn't effectively constrain the first layer of the MLP, because the constraint $\left\|W_j^{(1)}\right\|_{\infty} \leq M\left|\theta_j\right|$ becomes loose as M grows (see the sketch below).
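To make point 2 concrete, here is a tiny numpy sketch (toy numbers of my own, and plain clipping as a crude stand-in for LassoNet's actual hierarchical proximal step) showing how often the box $[-M|\theta_j|,\ M|\theta_j|]$ actually binds for different M:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 10                        # d input features, K first-layer neurons
W = rng.normal(size=(d, K))         # first-layer weights, one row per feature
theta = rng.normal(size=d)          # skip-connection weights

def clip_to_box(W, theta, M):
    """Clamp each row W_j to the interval [-M|theta_j|, M|theta_j|].

    This only illustrates when the constraint is active; it is not
    LassoNet's real hierarchical proximal operator.
    """
    bound = M * np.abs(theta)[:, None]
    return np.clip(W, -bound, bound)

for M in [0.0, 1.0, 100.0]:
    clipped = np.mean(clip_to_box(W, theta, M) != W)
    print(f"M = {M:>5}: fraction of first-layer weights clipped = {clipped:.2f}")
# M = 0 zeroes out W entirely (only the Lasso-penalized skip connection is left);
# for very large M essentially nothing is clipped, i.e. the MLP is unconstrained.
```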

Let's say intermediate M then.
The point is that the first layer does have an L1 penalty and is not too constrained by linear signals.
LassoNet is not more general than the other paper, though.

Are you referring to the L1 penalty on the skip connection? I don't see how the first layer of the MLP has an L1 penalty.

[screenshot of the LassoNet constraint $\|W_j^{(0)}\|_{\infty} \leq M\,|\theta_j|$ from the paper]

Thus $W^{(0)}$ (the first layer) has a constraint with an L1 penalty.

Is the corresponding L1 penalty $\|W_j^{(0)}\|_1 \leq K \cdot M\,\left|\theta_j\right|,\ j=1, \ldots, d$, where $K$ is the number of neurons in the first layer?

Explanation: $\|W_j^{(0)}\|_1 \leq K\,\|W_j^{(0)}\|_{\infty} \leq K \cdot M\,\left|\theta_j\right|$ ?

The absolute value of any coordinate of $W_j^{(0)}$ is at most the infinity norm, yes.
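For what it's worth, a quick numerical sanity check of that chain of inequalities (toy numbers; $K$, $M$ and $\theta_j$ here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 10                                  # number of neurons in the first layer
M, theta_j = 2.0, 0.5                   # made-up constraint parameters
w_j = rng.normal(size=K)                # first-layer weights for one feature j

# Enforce the hierarchy constraint ||w_j||_inf <= M * |theta_j| by clipping.
w_j = np.clip(w_j, -M * abs(theta_j), M * abs(theta_j))

l1, linf = np.abs(w_j).sum(), np.abs(w_j).max()
assert l1 <= K * linf <= K * M * abs(theta_j)
print(f"||w_j||_1 = {l1:.3f}  <=  K*||w_j||_inf = {K * linf:.3f}"
      f"  <=  K*M*|theta_j| = {K * M * abs(theta_j):.3f}")
```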

Thank you very much for all the back-and-forth explanation!