tiandunx / loss_function_search

Loss Function Search for Face Recognition

About the lossAgent settings

mzmzdcr opened this issue · comments

commented

Hello!
Thanks for your work. I ran into some problems while running your code, and I hope you can offer suggestions and answers.

  • Firstly, you initialize

    self.gaussian_param_loc = torch.nn.Parameter(torch.Tensor([0.0, ] * 10))

    to generate 10 values of u, and then randomly sample 10 values of a. But it seems that only a[0] is actually used, so for what purpose did you choose to generate 10 such combinations?

  • Secondly, every time a is sampled and u is updated, you apply a ReLU. But I found that the gradient of self.gaussian_loc_param_cuda[0] (u) is always 0 at this point, so its value stays at 0 and is never updated; when I remove the ReLU, it updates normally (a minimal reproduction appears after this list). So why is there a ReLU?

    m = [torch.distributions.normal.Normal(self.relu(self.gaussian_param_loc[0]), self.gaussian_scale[0])]

  • Lastly, I tried to add an auxiliary loss to this work, but the two losses differ by orders of magnitude, so I multiplied the auxiliary loss by a very small coefficient before adding it. I found that if the coefficient is slightly larger the model collapses, yet if it is relatively small it still hurts the model's classification performance. Can you give me some advice on this?

    Thank you
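
For reference, here is a minimal, self-contained sketch of the first two points (the scale value 0.2 and the reward-free REINFORCE-style surrogate are illustrative assumptions, not the repo's exact code); it reproduces both the 10-way parameterization and the zero-gradient behaviour at u = 0:

    import torch

    # 10 means (u) as in the repo, plus an assumed fixed scale per entry.
    gaussian_param_loc = torch.nn.Parameter(torch.Tensor([0.0, ] * 10))
    gaussian_scale = torch.Tensor([0.2] * 10)
    relu = torch.nn.ReLU()

    # One Normal per entry; sampling yields 10 candidate values of a.
    dists = [torch.distributions.normal.Normal(relu(gaussian_param_loc[i]),
                                               gaussian_scale[i])
             for i in range(10)]
    a = torch.stack([d.sample() for d in dists])
    print(a)  # 10 sampled a values, one per probability interval

    # The zero-gradient observation: gradients reach u only through
    # log_prob, and PyTorch's ReLU has derivative 0 at input 0, so with
    # u initialized to 0.0 the chain rule yields a gradient of exactly 0.
    loc = torch.nn.Parameter(torch.tensor(0.0))
    m = torch.distributions.normal.Normal(torch.relu(loc), torch.tensor(0.2))
    sample = m.sample()
    surrogate = -m.log_prob(sample)  # reward factor omitted for brevity
    surrogate.backward()
    print(loc.grad)  # tensor(0.) -> u[0] never moves away from 0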

@mzmzdcr Thanks for your attention to our work.
For your first question, you mention that "only a[0] is used", but in fact all entries of a are used; you can see where a takes effect in my_loss.py.
For your second question, the ReLU guarantees that the values are non-negative. The reason lies in Eq. 7 and Eq. 8: these two equations indicate that 0 < h(a, p) <= 1, because, as we claimed in our paper, the key to margin-based face recognition is how to reduce the probability of the ground truth.
For your third question, you may take an AM-trained model and observe what magnitude a should have.
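
A quick numerical check of that claim, using the modulating function implied by the local-search branch of my_loss.py quoted later in this thread (the value of sm below is an arbitrary placeholder), confirms that 0 < h(a, p) <= 1 for both signs of a:

    import math
    import torch

    def h(p, a, sm=0.5):
        # h(a, p) = p / (a * exp(sm / 2) * p + b), where b is chosen so
        # that the denominator stays positive and h(a, 1) <= 1.
        scale = a * math.exp(sm / 2)
        b = 1.0 - scale if a <= 0 else 1.0
        return p / (scale * p + b)

    p = torch.linspace(1e-4, 1.0, 1000)
    for a in (-2.0, -0.5, 0.0, 0.5, 2.0):
        y = h(p, a)
        # allow a one-ulp float rounding margin at p == 1
        assert (y > 0).all() and (y <= 1.0 + 1e-6).all()
        print(f"a={a:+.1f}: h in [{y.min().item():.4f}, {y.max().item():.4f}]")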

commented

@tiandunx
Thank you very much for your explanation, and forgive me for not responding sooner; the end of the semester has kept me busy.
Based on your answer, there are still some details that are not clear to me. I hope you can explain!

  • You mentioned that all entries of a are used. Why? I found this use of a (see the sketch after this list):

    elif search_type == 'local':
        for i in range(batch_size):
            for j in range(len(p_bins) - 1):
                if x[i, lb[i]].item() <= p_bins[j + 1]:
                    if a[j] <= 0:
                        b = 1.0 - a[j] * math.exp(sm / 2)
                    else:
                        b = 1.0
                    new_x[i, lb[i]] = x[i, lb[i]] / (a[j] * math.exp(sm / 2) * x[i, lb[i]] + b)
                    break

    According to my understanding, if the ground truth's predicted probability falls into a different interval, a different a is applied. Is this meant to distinguish hard samples from easy ones, or does it serve another purpose? Could you also explain the difference between the 'global' and 'local' search_type in detail? I could not find an introduction to this part in your paper.

  • I understand the role of the ReLU you described, but there still seems to be a problem, because you initialize u like this:

    self.gaussian_param_loc = torch.nn.Parameter(torch.Tensor([0.0, ] * 10))

    I think that because u[0] is 0, it always has zero gradient and can never be updated. So why not initialize it to a small positive value, such as 0.001? And why is the ReLU applied only to gaussian_param_loc[0]?
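
Regarding the first bullet, here is a minimal sketch of the bin lookup in the 'local' branch (the bin edges, the scale of a, and sm are placeholders, not the repo's actual values). Each ground-truth probability falls into one of the 10 intervals and picks up the a sampled for that interval, so across a batch all entries of a are exercised, with easy and hard samples simply landing in different bins:

    import math
    import random

    p_bins = [i / 10 for i in range(11)]              # 10 intervals (placeholder edges)
    a = [random.gauss(0.0, 0.2) for _ in range(10)]   # one sampled a per interval
    sm = 0.5                                          # placeholder for the repo's sm

    def pick_a(p_gt):
        # First interval whose upper edge reaches p_gt, mirroring the loop above.
        for j in range(len(p_bins) - 1):
            if p_gt <= p_bins[j + 1]:
                return a[j]
        return a[-1]

    for p in (0.03, 0.42, 0.97):                      # hard vs. easy ground truths
        aj = pick_a(p)
        scale = aj * math.exp(sm / 2)
        b = 1.0 - scale if aj <= 0 else 1.0
        print(f"p={p:.2f} -> a={aj:+.3f}, h(a, p)={p / (scale * p + b):.3f}")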

Looking forward to your reply, best regards.

@mzmzdcr Sorry for the late response; I have been very busy. The default search type is 'global'. It should be switched to the 'local' setting once training reaches 2 epochs. The intuition behind this is that at the very beginning of training the network is randomly initialized, so we first take 2 epochs to train it. After that, the network is capable of distinguishing faces, and from then on you may change the setting to local search.
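
As a concrete reading of that schedule, a training script might switch the flag like this (a minimal sketch; the epoch counts and the way search_type reaches the loss are assumptions, not the repo's exact training code):

    WARMUP_EPOCHS = 2    # 'global' warm-up, per the answer above
    TOTAL_EPOCHS = 10    # placeholder

    for epoch in range(TOTAL_EPOCHS):
        search_type = 'global' if epoch < WARMUP_EPOCHS else 'local'
        print(f"epoch {epoch}: search_type = {search_type}")
        # ...run the usual training loop here, passing search_type to the loss...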

commented

@tiandunx Thank you for your answer.
Could you leave me your email address? I would like to write to you in Chinese to ask about some details, since I may not express myself well enough in English. I have also been researching the automatic design of loss functions recently, so I am very interested in the details of your project.