rpautrat / SuperPoint

Efficient neural feature detector and descriptor


About detector_loss

foxkw opened this issue · comments

commented

Hi, thanks for your great work.

I found that the warped_detector_loss is difficult to converge when training SuperPoint, because the warped image content changes with each epoch, while the loss on the original image converges completely. Is there any good way to solve this problem?

image

commented


The green points are the ground truth, and the red points are the predictions.

Hi, ideally both the source and target images should be warped during training, to improve the diversity of the training images. I think that currently, when training on pairs of images, only the target image is warped, but it should not be too difficult to warp the source one as well.
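For illustration only, here is a minimal OpenCV/NumPy sketch of the idea (not the repo's actual data pipeline; `random_homography`, `warp_pair` and the parameter ranges are my own, illustrative names and values): both images of a pair are warped with independently sampled homographies, and the homography used for the loss is the composition of the two.

```python
import cv2
import numpy as np

def random_homography(h, w, max_angle=np.pi / 6, max_scale=0.2, max_trans=0.1):
    """Sample a small rotation + scaling + translation homography (illustrative ranges)."""
    angle = np.random.uniform(-max_angle, max_angle)
    scale = 1.0 + np.random.uniform(-max_scale, max_scale)
    tx = np.random.uniform(-max_trans, max_trans) * w
    ty = np.random.uniform(-max_trans, max_trans) * h
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    cx, cy = w / 2, h / 2
    # Rotate/scale around the image center, then translate.
    H = np.array([[c, -s, cx - c * cx + s * cy + tx],
                  [s,  c, cy - s * cx - c * cy + ty],
                  [0,  0, 1]], dtype=np.float64)
    return H

def warp_pair(image):
    """Warp the same image twice and return the homography relating the two warped views."""
    h, w = image.shape[:2]
    H_src = random_homography(h, w)
    H_tgt = random_homography(h, w)
    source = cv2.warpPerspective(image, H_src, (w, h))
    target = cv2.warpPerspective(image, H_tgt, (w, h))
    # Homography mapping source pixels to target pixels: x_tgt = H_tgt * inv(H_src) * x_src.
    H_src_to_tgt = H_tgt @ np.linalg.inv(H_src)
    return source, target, H_src_to_tgt
```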

commented


Hi, "I warpeded the source image and the corresponding target image, but when my homography only uses rotation, adding some data augmentation, the loss can converge very well. However, when my homography uses rotation, translation, and scaling, the warp loss cannot converge very well. There are pixel errors in gt and prediction. I tried to debug the homography parameters, but found it difficult to solve this problem. Have you ever encountered this situation, or is it normal ?

As shown in the figure below, most points correspond exactly, but some do not. The green points are the ground truth and the red ones are the predictions; when a ground-truth point matches a prediction, the green point covers the red one.

image

Hi, the prediction will never be perfect anyway, and the current one seems pretty good in my opinion. Since the green points can cover the red ones, it is hard to judge whether the predictions are detecting most corners. But they are at least very precise, in the sense that all predicted keypoints are located on corners.

commented


Sorry to bother you again. When I train the model with a large amount of data, the detector_loss on the original image is about 0.05, while the warped detector_loss is about 0.3. It seems the warped image still cannot be learned well. Is this normal?

Hi, rotation only and full warp are two different tasks, and the latter is harder. So it makes sense that your warped detector_loss is higher. I don't think there is any issue there; the gap only reflects the fact that training with full warps is harder than training with rotation only.

You should also not rely only on the train/val loss to judge the quality of your model, but evaluate it on a proper test set. You could compare the repeatability / localization error of both models on two types of data: with rotation only, and with full warps.
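As a rough illustration of that evaluation, here is a simplified, one-directional sketch of repeatability and localization error for a single image pair, given the detected keypoints and the ground-truth homography (my own code and names, not the repo's evaluation scripts):

```python
import numpy as np

def repeatability(kpts_a, kpts_b, H_ab, img_shape, dist_thresh=3):
    """kpts_*: (N, 2) arrays of (x, y) keypoints; H_ab: homography from image a to image b."""
    # Warp keypoints from image a into image b (homogeneous coordinates).
    pts = np.concatenate([kpts_a, np.ones((len(kpts_a), 1))], axis=1)
    warped = (H_ab @ pts.T).T
    warped = warped[:, :2] / warped[:, 2:3]

    # Keep only warped points that land inside image b.
    h, w = img_shape
    valid = (warped[:, 0] >= 0) & (warped[:, 0] < w) & (warped[:, 1] >= 0) & (warped[:, 1] < h)
    warped = warped[valid]
    if len(warped) == 0 or len(kpts_b) == 0:
        return 0.0, np.inf

    # Distance from every warped point to every detection in image b.
    dists = np.linalg.norm(warped[:, None, :] - kpts_b[None, :, :], axis=2)
    min_dists = dists.min(axis=1)
    matched = min_dists < dist_thresh
    rep = matched.mean()                                          # fraction of repeated points
    loc_error = min_dists[matched].mean() if matched.any() else np.inf
    return rep, loc_error
```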

Hi, the idea is the following:

Appending the ones allows us to insert a label for the "no keypoint" dustbin in every 8x8 patch, and multiplying by 2 gives more weight to the ground-truth labels. So after this operation, labels will contain 0s (for pixels where there was no keypoint in the ground truth), always a 1 for the dustbin, and 2s where there was a ground-truth keypoint. When we take the argmax in the next line, there are two cases: either there was a keypoint in the patch in the ground truth, in which case the argmax picks the corresponding cell with a 2 in labels, or there was no keypoint (so labels was full of zeros) and the dustbin cell is selected instead.
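A minimal NumPy sketch of that idea (my own illustration of the logic, not the exact code in this repo; the real implementation may also handle ties between multiple keypoints in one patch differently, whereas here the argmax simply picks the first one):

```python
import numpy as np

def patch_labels(keypoint_map):
    """keypoint_map: (H, W) binary ground-truth map, with H and W divisible by 8.

    Returns an (H/8, W/8) map of class indices in [0, 64]:
    0..63 = position of the keypoint inside the 8x8 patch, 64 = "no keypoint" dustbin.
    """
    H, W = keypoint_map.shape
    # Reshape into (H/8, W/8, 64): every 8x8 patch becomes one 64-dim cell.
    patches = (keypoint_map.reshape(H // 8, 8, W // 8, 8)
                           .transpose(0, 2, 1, 3)
                           .reshape(H // 8, W // 8, 64))
    # Give ground-truth keypoints weight 2 and append a constant 1 for the dustbin.
    dustbin = np.ones((H // 8, W // 8, 1))
    labels = np.concatenate([2 * patches, dustbin], axis=-1)  # (H/8, W/8, 65)
    # Argmax picks a keypoint cell (value 2) if the patch has one, else the dustbin (value 1).
    return np.argmax(labels, axis=-1)
```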

I hope this is clear.

Thank you so much for the swift response! I am investigating MagicPoint performance, as my SuperPoint performance was unsatisfactory. This is the output of MagicPoint trained on the Synthetic Dataset (120k images):
image

And this is from GFTT (Shi Tomasi)
image

As you can see, the keypoints from GFTT are more interpretable than those from MagicPoint. Clearly, the pattern on the giraffe is being detected as the polygons the model has already been trained on. Furthermore, the losses have dropped drastically, which makes me think the model has overfitted to the synthetic shapes. Do you think this is normal? What suggestions do you have, given your extensive experience working on this?

Hi, yes this is a generalization problem that can be expected for the model trained on synthetic data. What you could do is:

  1. Reduce the threshold to accept more points. Currently you are only displaying the points where the model is the most confident, but with a threshold of 0.001 for example, you might see more points (see the extraction sketch after this list).
  2. If you think the model is overfitting on the synthetic shapes, add more data augmentation in the synthetic training, and do early stopping, i.e. stop training as soon as the validation loss stops decreasing, or even earlier, once it has almost converged.
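Regarding point 1, here is one possible way to extract keypoints from the detector probability map at a lower threshold, with a simple greedy non-maximum suppression (a sketch with illustrative names and values, not the repo's extraction function):

```python
import numpy as np

def extract_keypoints(prob_map, threshold=0.001, nms_radius=4):
    """prob_map: (H, W) detector probability map. Returns an (N, 2) array of (x, y) points."""
    ys, xs = np.where(prob_map > threshold)
    scores = prob_map[ys, xs]
    # Greedy non-maximum suppression: keep the strongest point, suppress its neighbourhood.
    order = np.argsort(-scores)
    keep = []
    occupied = np.zeros_like(prob_map, dtype=bool)
    for i in order:
        x, y = xs[i], ys[i]
        if occupied[y, x]:
            continue
        keep.append((x, y))
        y0, y1 = max(0, y - nms_radius), y + nms_radius + 1
        x0, x1 = max(0, x - nms_radius), x + nms_radius + 1
        occupied[y0:y1, x0:x1] = True
    return np.array(keep).reshape(-1, 2)
```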