lasso-net / lassonet

Feature selection in neural networks

About the implementation of group lasso

nuaajeff opened this issue · comments

Hi, thanks for sharing the code. I have some questions about the implementation of group lasso; here they are.

  1. About Algorithm 4 of the paper: in line 14, the notation indicates the vector theta_j \in R^K, i.e., with the same dimension as W_j^{(1)}. That confuses me a little. K is the size of the first hidden layer, but for a multi-class classification problem theta should be \in R^{d \times c}, where c is the number of classes (since theta is a linear classifier) and d is the input feature dimension. So theta_j should be \in R^c in my opinion. Do I understand this correctly?

  2. About Section 6 of the paper: in the group lasso problem, how is the group L1 norm regularizer constructed? Assume a multi-class classification problem with theta \in R^{d \times c}, where d is the number of features and c is the number of classes. If we want to choose a sparse subset of features for the linear classifier, the regularization term should be \|\theta\|_{2,1} = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} \theta_{i,j}^2}, am I right? (See the sketch after this list.)

  3. How should I use the API provided in this repo? For example, in the function prox (in lassonet/prox.py), I found that theta (variable v in the code) is processed by:
    norm_v = torch.norm(v, p=2, dim=0)
    This seems to match the pseudocode in line 6 of Algorithm 4 of the paper. Does this mean the function prox can solve the feature subset selection problem I described above? (See the sketch after this list.)

  4. What is the function inplace_group_prox (in lassonet/prox.py) used for? I notice it passes each group of parameters to the prox function, which confuses me, because I think prox is used to give the features sparse weights, while group lasso should make the features within a group share similar weights (for example, features in group 1 all have large weights, features in group 2 all have small weights, etc.). However, if we pass a group of features into prox, I would expect it to return sparse weights within that single group (some weights large, some small), rather than weights that are similar across the group (all large or all small). Do I understand this function correctly?

  5. There is no LassoNetAutoEncoder in the lassonet folder, so [examples](https://github.com/lasso-net/lassonet/tree/master/examples)/mnist_ae.py cannot run correctly.
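For concreteness, here is a minimal PyTorch sketch of the quantities in questions 2 and 3. All names and sizes are illustrative, and it assumes theta is stored in PyTorch's nn.Linear convention, shape (c, d); under that assumption, the dim=0 norm is one l2 norm per input feature, and summing those norms gives the group-lasso penalty:

```python
import torch

d, c = 10, 3                # input features, classes (illustrative sizes)
theta = torch.randn(c, d)   # linear classifier in nn.Linear convention: (out, in)

# Question 3: the column-wise norm from lassonet/prox.py.
# With theta stored as (c, d), dim=0 aggregates over classes,
# giving one l2 norm per input feature:
norm_v = torch.norm(theta, p=2, dim=0)   # shape (d,)

# Question 2: the group-lasso penalty is then the sum of these
# per-feature norms: sum_{i=1}^{d} sqrt(sum_{j=1}^{c} theta_{j,i}^2)
group_penalty = norm_v.sum()
```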

Thanks for your help in advance!

  1. You are probably right; I have forwarded this to my co-authors.
  2. Yes.
  3. The prox function corresponds to the prox operator. Look at model.prox():
    def prox(self, *, lambda_, lambda_bar=0, M=1):
  4. You can just use a LassoNetRegressor and fit(X, X) (see the sketch after this list).
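A minimal sketch of suggestion 4, assuming random data for illustration (the hidden_dims value is also illustrative):

```python
import numpy as np
from lassonet import LassoNetRegressor

X = np.random.rand(100, 20)                    # illustrative data

# Fitting X against itself makes the network act as an autoencoder,
# and the lasso path selects input features as usual.
model = LassoNetRegressor(hidden_dims=(32,))   # illustrative architecture
model.fit(X, X)
```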

Thanks for your quick response, but I think my questions are not fully addressed, especially questions 3 and 4. I may not have explained my goal clearly: I want to understand the implementation well enough to reimplement the algorithm proposed in this paper for my own problem (which is implemented in TensorFlow).

So my question is: if I want to apply the proposed algorithm to the feature subset selection problem I described in question 2, which function (prox or inplace_group_prox) should I refer to? As I stated in question 3:

Q3: I found that theta (variable v in the code) is processed by:
norm_v = torch.norm(v, p=2, dim=0)
This seems to match the pseudocode in line 6 of Algorithm 4 of the paper. Does this mean the function prox can solve the feature subset selection problem?

And because the feature subset selection problem can be formulated as a group lasso problem, I also want to know whether I understand inplace_group_prox correctly, and when I should use this function:

Q4: What is the function inplace_group_prox used for? I notice it passes each group of parameters to the prox function, which confuses me, because I think prox is used to give the features sparse weights, while group lasso should make the features within a group share similar sparse weights (for example, features in group 1 all have large weights and features in group 2 all have small weights). However, if we pass a group of features into prox, I would expect it to return sparse weights within that single group (some weights large, some small), rather than weights that are similar across the group (all large or all small). Do I understand this function correctly?

Thank you for your kind help again.

Refer to prox. inplace_group_prox is NOT the group prox described in the paper; it is used to select groups of features and is not documented in the paper.
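To make that distinction concrete, here is a hedged sketch (not the repo's actual code: the real operator in lassonet/prox.py is the hierarchical prox that jointly thresholds theta and W^{(1)}, while this sketch uses plain soft-thresholding to show only the grouping behaviour; all names are illustrative). Applying the operator per feature zeroes individual columns; applying it per group treats each group's parameters as one unit, so a group is kept or dropped as a whole:

```python
import torch

def soft_threshold(x, lam):
    # Stand-in for the paper's hierarchical operator:
    # plain soft-thresholding applied to (non-negative) norms.
    return torch.clamp(x - lam, min=0.0)

def feature_prox(theta, lam):
    # Per-feature selection (theta in nn.Linear convention, shape (c, d)):
    # shrink each column's l2 norm; columns that hit zero are dropped features.
    norms = torch.norm(theta, p=2, dim=0, keepdim=True)          # shape (1, d)
    scale = soft_threshold(norms, lam) / norms.clamp(min=1e-12)
    return theta * scale

def group_prox(theta, groups, lam):
    # Group-level selection: all parameters of a group share one norm,
    # so the whole group is zeroed (or kept) together -- all-or-nothing.
    theta = theta.clone()
    for g in groups:                 # g is a list of feature indices
        block = theta[:, g]
        norm = block.norm(p=2)
        theta[:, g] = block * (soft_threshold(norm, lam) / norm.clamp(min=1e-12))
    return theta
```

So within a group you do not get sparsity among that group's features; the group shares a single threshold, which is what "selecting groups of features" means here.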