Model | Original Paper | ChainerCV | TorchCV* |
---|---|---|---|
SSD300@voc07_test | 74.3% | 77.8% | 76.68% |
SSD512@voc07_test | 76.8% | 79.2% | 78.89% |
FPNSSD512@voc07_test | - | - | 81.46% |
The accuracy of the TorchCV SSD models is about 0.3 to 1.1 points lower than ChainerCV's. This is because the VGG base model I use performs slightly worse.
As an experiment, I replaced the pytorch/vision VGG16 model with the one used in ChainerCV, and the SSD512 model reached 79.85% accuracy.
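
To make the comparison concrete, here is a small sketch that recomputes the gaps from the numbers in the table above. The dictionary layout and the names `results`, `gap`, and `swapped_gap` are only illustrative and are not part of TorchCV.

```python
# Accuracy on VOC07 test (%), copied from the table above.
results = {
    "SSD300": {"ChainerCV": 77.8, "TorchCV": 76.68},
    "SSD512": {"ChainerCV": 79.2, "TorchCV": 78.89},
}


def gap(model):
    """Return ChainerCV accuracy minus TorchCV accuracy, in percentage points."""
    row = results[model]
    return round(row["ChainerCV"] - row["TorchCV"], 2)


gaps = {name: gap(name) for name in results}
for name, value in gaps.items():
    print(f"{name}: TorchCV is {value:.2f} points lower than ChainerCV")

# SSD512 trained on top of the ChainerCV VGG16 weights (79.85%),
# compared with the ChainerCV SSD512 result.
swapped_gap = round(79.85 - results["SSD512"]["ChainerCV"], 2)
print(f"SSD512 with the ChainerCV VGG16: {swapped_gap:+.2f} points vs ChainerCV")
```

With the ChainerCV base model the gap disappears for SSD512, which is consistent with the base model being the main source of the difference.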
FPNSSD512 is created by replacing the SSD VGG16 network with FPN50; the rest is the same. It beats all the SSD models.
You can download the trained params here.
[2018-2-6] Our FPNSSD512 model achieved 1st place on the PASCAL VOC 2012 dataset.
Check the leaderboard.
[2018-2-26] As issue #11 pointed out, I shouldn't have used VOC07 data for training. I submitted another result that is trained only on VOC12 data. The older submission has been marked private.
[2018-3-29] After Alibaba Turing Lab submitted a result of 74.8% mAP, which took first place on Comp3, I decided to train a deeper model (FPN50 replaced with FPN152, trained only on VOC12 data).
It reached an mAP of 77%, which is far higher than I expected.
Check the new leaderboard. The older submission has been marked private.
- SSD300
- SSD512
- FPNSSD512
- RetinaNet
- Faster R-CNN
- Mask R-CNN