Discussion about implementation and model
nlgranger opened this issue · comments
Firstly, thank you for your work — an up-to-date PyTorch implementation of RandLA-Net is really nice to have.
This is not a bug report, but rather a series of questions I had when I started implementing RandLA-Net myself, before I found this repository.
- The loss weights in PyTorch's CrossEntropyLoss are normalized so that they sum to one. I don't think the original code has this normalization, but removing it would generate huge loss values.
- Why use Conv layers? Since they all have 1x1 kernels, Linear layers would work just as well.
- Regarding https://github.com/qiqihaer/RandLA-Net-pytorch/blob/a255652b65d8378682479c23299704783a8fe4d9/RandLANet.py#L58: it matches the TensorFlow code, but not Fig. 7 of the paper, where the shortcut originates from before the first residual block, right?
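On the Conv-vs-Linear question, the equivalence can be checked directly. This is a minimal numpy sketch (not the repo's code; all shapes and names are illustrative): a 1x1 convolution over a `(batch, channels, points)` tensor performs exactly the same per-point channel mixing as a Linear layer applied with the points axis last.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy point-cloud feature tensor: (batch, channels_in, num_points),
# the layout nn.Conv1d expects. Sizes are illustrative only.
x = rng.standard_normal((2, 8, 16))

# Shared weights/bias for both formulations.
w = rng.standard_normal((4, 8))   # (channels_out, channels_in)
b = rng.standard_normal(4)

# A 1x1 convolution mixes channels independently at every point:
# conv_out[b, o, n] = sum_i w[o, i] * x[b, i, n] + b[o]
conv_out = np.einsum('oi,bin->bon', w, x) + b[:, None]

# A Linear layer applied point-wise does the same channel mixing,
# just with the points axis moved last: (batch, num_points, channels_in).
lin_out = x.transpose(0, 2, 1) @ w.T + b          # (batch, num_points, channels_out)
lin_out = lin_out.transpose(0, 2, 1)              # back to (batch, channels_out, num_points)

print(np.allclose(conv_out, lin_out))  # True
```

So the choice between `nn.Conv1d(..., kernel_size=1)` and `nn.Linear` is mostly about which memory layout the surrounding code already uses.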
For the questions above:
- As in the original implementation, I think the author also uses weights that sum to one.
- After thinking about it and browsing the web for a while, I believe a 1x1 convolution layer is equivalent to a fully connected layer. (I hadn't thought about it before — thanks for the great observation XD~) So I think it would also work well if you substituted Linear layers.
- Well ...... I'm not sure whether it contains the feature concatenation you mentioned above (?)
Sorry for the delay and thank you for looking into this.
- I misinterpreted the PyTorch docs about the weight normalization, but I believe there is still a discrepancy. I think what the original implementation does is:
```python
loss = F.cross_entropy(predictions, targets, reduction='none')
weights = class_weights[targets]
loss *= weights
return loss.sum() / loss.shape[0]
```
Whereas yours is equivalent to:
```python
loss = F.cross_entropy(predictions, targets, reduction='none')
weights = class_weights[targets]
loss *= weights
return loss.sum() / weights.sum()
```
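The difference between the two reductions is easy to quantify. A small numpy sketch (hypothetical per-point losses and labels, not the repo's data): the two formulations differ exactly by the batch-mean of the per-point class weights, which varies with the label distribution of each batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-point unweighted cross-entropy losses and integer labels.
num_classes = 3
class_weights = np.array([0.5, 0.3, 0.2])   # sums to one, as discussed above
targets = rng.integers(0, num_classes, size=10)
unweighted = rng.uniform(0.1, 2.0, size=10)

weights = class_weights[targets]
weighted = unweighted * weights

# Original (TF-style) reduction: plain mean over points.
loss_mean = weighted.sum() / weighted.shape[0]

# The other reduction: normalize by the summed per-point weights.
loss_weighted_mean = weighted.sum() / weights.sum()

# The two reductions differ exactly by the batch-mean of the weights.
print(loss_mean / loss_weighted_mean, weights.mean())
```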
Hi,
I'm really sorry for the late reply due to some personal affairs. After reading your latest reply, I understand your concern, so I looked into the original repository to figure out whether I made a mistake. I found the original loss function implementation here:
```python
def get_loss(self, logits, labels, pre_cal_weights):
    # calculate the weighted cross entropy according to the inverse frequency
    class_weights = tf.convert_to_tensor(pre_cal_weights, dtype=tf.float32)
    one_hot_labels = tf.one_hot(labels, depth=self.config.num_classes)
    weights = tf.reduce_sum(class_weights * one_hot_labels, axis=1)
    unweighted_losses = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_labels)
    weighted_losses = unweighted_losses * weights
    output_loss = tf.reduce_mean(weighted_losses)
    return output_loss
```
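For reference, the same computation can be reproduced without TensorFlow. This is a numpy sketch of the function above (illustrative only; `get_loss_np` is a hypothetical name, not part of either repository):

```python
import numpy as np

def get_loss_np(logits, labels, class_weights):
    """Numpy sketch of the TF loss above: weighted softmax cross-entropy, mean-reduced."""
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Unweighted cross-entropy per point (one-hot pick of the true class).
    unweighted = -log_probs[np.arange(len(labels)), labels]
    # Per-point weight looked up from the pre-computed class weights.
    weights = np.asarray(class_weights)[labels]
    # TF-style reduction: plain mean of the weighted losses.
    return (unweighted * weights).mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((6, 3))
labels = rng.integers(0, 3, size=6)
print(get_loss_np(logits, labels, [0.5, 0.3, 0.2]))
```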
I think you are correct. It first calculates the unweighted losses (same as `F.cross_entropy(predictions, targets, reduction='none')`), then weights them using the pre-computed class weights (same as `weights = class_weights[targets]; loss *= weights`). However, in the final step the original implementation applies `tf.reduce_mean(weighted_losses)`, while my implementation uses `loss.sum() / weights.sum()`.
Thanks a lot for the great catch. I will fix and push the code when I am less busy. However, I think this is not a big issue, because `loss.shape[0]` and `weights.sum()` are just normalizing constants: the former is the batch size, and the latter is derived from the pre-computed class weights.
Closing this issue due to long inactivity.