aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.

Home Page: https://git.io/AdelaiDet

Evaluation during training fails - (multi-gpu/distributed) [BoxInst]

ameyparanjape opened this issue

Many thanks to the BoxInst authors for sharing the codebase for training and evaluating BoxInst.
I am facing an issue while fine-tuning BoxInst models on my custom data.
Specs: multi-GPU training on Linux VMs (4 NVIDIA Tesla T4 GPUs)
I am using the code provided in this repo, with some dataloader modifications, to fine-tune the COCO checkpoint on my custom data. During training I use previously annotated instance segmentation data for validation/testing, but COCO evaluation fails.
When I run --eval-only mode on the same data on 1 GPU, I do get the evaluation results. Is there a fix that would let me run evaluation during training? Could this problem be caused by distributed evaluation running into a race condition?

@ameyparanjape can you share whether you managed to solve this?