megvii-research / Iter-E2EDET

Official implementation of the paper "Progressive End-to-End Object Detection in Crowded Scenes"

How to train the model on CityPersons dataset

sungjune-p opened this issue

Hello, authors!
First, thanks for your incredible work!
However, I have some questions about training on the CityPersons dataset with Sparse R-CNN.

As mentioned in the paper, you first pre-train all the models on the CrowdHuman dataset and then fine-tune them on CityPersons.

This raises several questions.
First, why do you pre-train the models on CrowdHuman? Does training on CityPersons alone result in poor performance?
Second, if so, how many iterations are required for pre-training on CrowdHuman and for fine-tuning on CityPersons, respectively?
Third, are the other hyperparameters, such as the learning rate, the same as described in the paper?
Lastly, is the same training schedule also used to build the baseline Sparse R-CNN that achieves the performance reported in Table 3(c) of your paper (10.0 MR)?

Thanks!

commented

Pre-training on CrowdHuman and then fine-tuning on CityPersons achieves better performance. Thus we first pre-train all models on the CrowdHuman dataset, following the original training schedule of each model. The training recipe for CityPersons follows the one used to train a model on CrowdHuman. Training Sparse R-CNN on the CityPersons dataset also follows the same training schedule.
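
For readers who want to script this, the two-stage recipe described above might be laid out roughly as follows. This is only a sketch: the `Stage` class, the `two_stage_plan` helper, the checkpoint file names, and the iteration and learning-rate numbers are all hypothetical placeholders, not values from the paper or this repository. The only thing it tries to capture is that the fine-tuning stage reuses the pre-training schedule unchanged and starts from the pre-trained weights.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One training run. All fields are illustrative, not the repo's config keys."""
    dataset: str
    max_iter: int
    base_lr: float
    init_weights: Optional[str]  # checkpoint to start from; None means default init
    output: str                  # checkpoint written at the end of the run


def two_stage_plan(pretrain: Stage, finetune_dataset: str, finetune_output: str) -> List[Stage]:
    """Build a pre-train + fine-tune plan where both stages share one schedule."""
    # Copy the pre-training stage, changing only the dataset and the checkpoints,
    # so max_iter and base_lr stay identical across the two stages.
    finetune = replace(
        pretrain,
        dataset=finetune_dataset,
        init_weights=pretrain.output,
        output=finetune_output,
    )
    return [pretrain, finetune]


# Placeholder numbers only: substitute the schedule of the model you are training.
plan = two_stage_plan(
    Stage("CrowdHuman", max_iter=1000, base_lr=1e-4, init_weights=None, output="crowdhuman.pth"),
    finetune_dataset="CityPersons",
    finetune_output="citypersons.pth",
)

for stage in plan:
    print(stage)
```

Keeping the schedule in a single object and deriving the second stage from it makes it harder for the two runs to drift apart by accident.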