- This work combines the one-stage detection pipeline of YOLOv2 with the two-branch architecture idea from Mask R-CNN. Due to hardware limitations, I only implemented it on a small CNN backbone (MobileNet) with depthwise separable blocks, though it could also be implemented with a deeper network, e.g. ResNet-50 or ResNet-101 with FPN (Feature Pyramid Networks).
- The overall architecture can be visualized like this:
- Training results on the Shapes dataset:
- Training results on the Rice and Generic Food datasets:
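Returning to the backbone choice in the overview: the depthwise separable blocks used by MobileNet are what keep it small enough for limited hardware. The sketch below is only a back-of-the-envelope weight count comparing a standard convolution with a depthwise separable one. The layer sizes are illustrative assumptions and are not taken from this repository, and the helper names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes (assumption, not read from the model code).
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print("standard:", std)                  # 589824
    print("depthwise separable:", sep)       # 67840
    print("ratio:", round(sep / std, 3))     # 0.115
```

The ratio equals 1/c_out + 1/k², so for 3 x 3 kernels a depthwise separable layer needs roughly an eighth to a ninth of the weights of its standard counterpart.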
myolo
- Contains the main implementation of Mask-YOLO. model.py handles the model instantiation.
example
- Includes three training examples with inference. The Shapes dataset is randomly generated by dataset_shapes.py. Rice and Food are small datasets that I hand-annotated with VGG Image Annotator (VIA); they can be downloaded from https://drive.google.com/file/d/1druK4Kgx5AhfchClU2aq5kf7UVoDtkvu/view.
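For the hand-annotated datasets, a two-branch model needs both a bounding box (for the detection branch) and a mask (for the mask branch) per object. The sketch below shows one way polygon annotations could be turned into boxes. It assumes the annotations were exported as VIA JSON with polygon regions; the actual layout of the Rice and Food annotation files is not confirmed here, and `via_polygons_to_boxes` is a hypothetical helper, not a function from this repository.

```python
import json


def via_polygons_to_boxes(via_json_text):
    """Map each filename to a list of (x_min, y_min, x_max, y_max) boxes."""
    annotations = json.loads(via_json_text)
    boxes = {}
    for entry in annotations.values():
        regions = entry.get("regions", [])
        # VIA 1.x stores regions as a dict, VIA 2.x as a list.
        if isinstance(regions, dict):
            regions = list(regions.values())
        file_boxes = []
        for region in regions:
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":
                continue  # this sketch only handles polygon regions
            xs, ys = shape["all_points_x"], shape["all_points_y"]
            # The tightest axis-aligned box around the polygon vertices.
            file_boxes.append((min(xs), min(ys), max(xs), max(ys)))
        boxes[entry["filename"]] = file_boxes
    return boxes


if __name__ == "__main__":
    # Made-up annotation for illustration; not taken from the real datasets.
    sample = json.dumps({
        "rice_01.jpg12345": {
            "filename": "rice_01.jpg",
            "size": 12345,
            "regions": [{
                "shape_attributes": {
                    "name": "polygon",
                    "all_points_x": [10, 40, 25],
                    "all_points_y": [5, 12, 30],
                },
                "region_attributes": {"label": "rice"},
            }],
        }
    })
    print(via_polygons_to_boxes(sample))
```

The same polygon vertices can be rasterized into a binary mask for the mask branch, so one annotation file can serve both outputs.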
- Mask R-CNN paper: https://arxiv.org/pdf/1703.06870.pdf
- YOLOv2 paper: https://arxiv.org/pdf/1612.08242.pdf
- Keras and TensorFlow implementation of Mask R-CNN: https://github.com/matterport/Mask_RCNN