wjf5203 / SeqFormer

SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)

An error occurs when training SeqFormer on YouTube-VIS 2019 and COCO 2017 jointly

liangzhiyuanCV opened this issue

Hi Junfeng,
Thanks for your excellent work! I ran into a problem when training SeqFormer on YouTube-VIS 2019 and COCO 2017 jointly. Here is the error message:
Traceback (most recent call last):
  File "main.py", line 331, in <module>
    main(args)
  File "main.py", line 278, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "/data/liangzhiyuan/projects/SeqFormer/engine.py", line 48, in train_one_epoch
    outputs, loss_dict = model(samples, targets, criterion, train=True)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/liangzhiyuan/projects/SeqFormer/models/segmentation.py", line 166, in forward
    indices = criterion.matcher(outputs_layer, gt_targets, self.detr.num_frames, valid_ratios)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/liangzhiyuan/projects/SeqFormer/models/matcher.py", line 113, in forward
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
  File "/data/liangzhiyuan/projects/SeqFormer/models/matcher.py", line 113, in <listcomp>
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
  File "/usr/local/lib/python3.6/dist-packages/scipy/optimize/_lsap.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")
ValueError: matrix contains invalid numeric entries

It seems that some values of C are nan or inf. Have you run into this problem during training? By the way, training on YouTube-VIS 2019 alone works fine in my setup.
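
One way to confirm that guess would be to check the cost matrix for non-finite entries just before it reaches linear_sum_assignment. The snippet below is only a minimal sketch: `report_non_finite` and `C_toy` are hypothetical names that are not part of SeqFormer, and it assumes the cost matrix has already been moved to the CPU and converted to a NumPy array.

```python
import numpy as np


def report_non_finite(cost, name="C"):
    """Return the (row, col) positions of nan/inf entries in a 2-D cost matrix.

    Hypothetical debugging helper, not part of the SeqFormer code base.
    """
    cost = np.asarray(cost, dtype=float)
    # np.isfinite is False for nan, +inf and -inf.
    bad = np.argwhere(~np.isfinite(cost))
    if bad.size:
        print(f"{name}: {len(bad)} non-finite entries, first at {bad[0].tolist()}")
    return [tuple(int(i) for i in idx) for idx in bad]


# Toy matrix standing in for one per-sample slice of the matching cost.
C_toy = np.array([[0.2, np.nan],
                  [np.inf, 0.5]])
bad_entries = report_non_finite(C_toy, name="C_toy")
```

If the returned list is non-empty for a COCO sample but always empty for YouTube-VIS samples, that would point at the converted COCO annotations rather than at the matcher itself.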

Hi~ Thanks for your attention.

I found this problem a few days ago and it has been resolved. I have updated the coco2seq.py file; please try the latest version.