valeoai / obow

`mu_min_dist` got float(-inf) value

hao-pt opened this issue

First of all, I'd like to thank you for your great work. I have adopted your BoW implementation into my method, and `mu_min_dist` in the `BoWExtractor` class becomes `-inf` after a few iterations early in training, even though I kept most of your BoW code unchanged in my implementation. What could be causing this?
Here I use a batch size of 16 for debugging.
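
To localize where the non-finite value first appears, I added a finiteness check right before the distance computation. This is just a minimal debug sketch; `check_finite` is my own helper, and the `features` / `self._embedding` names stand for whatever tensors `BoWExtractor.forward` actually uses at that point:

```python
import torch

def check_finite(name, tensor):
    """Warn as soon as a tensor contains inf/-inf or NaN entries."""
    if not torch.isfinite(tensor).all():
        n_bad = (~torch.isfinite(tensor)).sum().item()
        print(f"[debug] {name}: {n_bad} non-finite entries "
              f"(min={tensor.min().item()}, max={tensor.max().item()})")

# Inside BoWExtractor.forward, before `dist` is computed:
#   check_finite("features", features)
#   check_finite("embedding", self._embedding)  # hypothetical buffer name
```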

```
Epoch 0:   0%|                     | 2/7972 [00:16<12:29:47,  5.64s/it, loss=41.7, v_num=0_21]
features: torch.Size([16, 1024, 14, 14])
embedding_w: torch.Size([4096, 1024, 1, 1])
embedding_b: torch.Size([4096])
dist: torch.Size([16, 4096, 14, 14])
mu_min_dist: tensor(2409.4023, device='cuda:0')
selected_feaures: torch.Size([16, 1024])
self.mu_min_dist: tensor([89.2840], device='cuda:0')
inv_delta_adaptive: tensor([0.1680], device='cuda:0')
features: torch.Size([16, 2048, 7, 7])
embedding_w: torch.Size([4096, 2048, 1, 1])
embedding_b: torch.Size([4096])
dist: torch.Size([16, 4096, 7, 7])
mu_min_dist: tensor(2662.2830, device='cuda:0')
selected_feaures: torch.Size([16, 2048])
self.mu_min_dist: tensor([95.8128], device='cuda:0')
inv_delta_adaptive: tensor([0.1566], device='cuda:0')
bow_loss: tensor(42.8774, device='cuda:0', grad_fn=<SumBackward0>)
Epoch 0:   0%|                      | 3/7972 [00:17<9:30:46,  4.30s/it, loss=42.1, v_num=0_21]
features: torch.Size([16, 1024, 14, 14])
embedding_w: torch.Size([4096, 1024, 1, 1])
embedding_b: torch.Size([4096])
dist: torch.Size([16, 4096, 14, 14])
mu_min_dist: tensor(-inf, device='cuda:0')
```
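
Incidentally, the printed values suggest that the adaptive inverse delta is computed as `inv_delta / self.mu_min_dist` with the default `inv_delta = 15` (this is only my reading of the numbers, not something I verified in the code), so once the running mean of the min distance becomes `-inf`, the soft-assignment temperature degenerates as well:

```python
# Both logged pairs satisfy inv_delta_adaptive * self.mu_min_dist ≈ 15:
print(0.1680 * 89.2840)   # ~15.00, for the [16, 1024, 14, 14] feature map
print(0.1566 * 95.8128)   # ~15.00, for the [16, 2048, 7, 7] feature map
```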

Separately, I saw that you drop the boundary of the feature maps with a margin of 1:

```python
features = features[:, :, 1:-1, 1:-1].contiguous()  # drop the boundary, which turns the spatial size into [H-2, W-2]
```
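
For concreteness, on the first feature map from my log the crop just removes the outer ring of spatial positions (plain tensor slicing, nothing repo-specific):

```python
import torch

features = torch.randn(16, 1024, 14, 14)           # shaped like the first feature map in the log above
cropped = features[:, :, 1:-1, 1:-1].contiguous()  # drop a 1-pixel border on each side
print(cropped.shape)                               # torch.Size([16, 1024, 12, 12])
```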

What is the intuition behind this? Might it affect model performance if we leave the features uncropped?