FabrizioDeSantis / Object-Detection-Manga109

Object detection on the Manga109 dataset using FasterRCNN, RetinaNet and SSD300

FasterRCNN Manga109 Object-Detection

This is my project for the Deep Learning and Generative Models course at @UniPr.
The goal of this project is to perform object detection with FasterRCNN, RetinaNet and SSD300. The models were trained on the Manga109 dataset, compiled by the Aizawa Yamasaki Matsui Laboratory at the University of Tokyo. Manga109 consists of 109 manga volumes drawn by professional mangaka in Japan. The project consists of the following Python modules:

  • object-detection-main.py: This module is the entry point of the program; it launches training, resumed training or inference, depending on the selected mode.
  • model.py: This module is responsible for creating the model. The supported models are FasterRCNN, RetinaNet and SSD300. In particular, the FasterRCNN model can be configured with custom parameters and, optionally, extended with author classification.
  • custom_roi_heads.py: This module implements a custom RoIHeads for author classification, a custom FasterRCNN loss that includes author classification, and a custom FastRCNNPredictor for custom classes.
  • datasetManga109.py: This module implements the CustomDataset used for training and validating the model.
  • solver.py: This module includes the training and validation methods, with or without author classification. It also provides functions for saving and loading the model and for visualizing model weights.
  • manga109api_custom.py: This module is an extension of manga109api from https://github.com/manga109/manga109api/tree/main/manga109api. The parser has been extended to support adding author information to annotations.
  • metrics.py: This module is responsible for the calculation of evaluation metrics, in particular for mAP (mean Average Precision) computation.
  • utils.py: This module contains various helper functions, such as checking images for annotations, the early stopping implementation and loading image information.
  • inference.py: This module is responsible for running inference on the given images.
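metrics.py computes mAP. As a rough illustration of the metric itself (not the module's actual code), the sketch below computes average precision for one class with all-point interpolation of the precision-recall curve, one common variant, and averages it over classes. It assumes detections have already been matched to ground truth (e.g. at IoU ≥ 0.5); the function names and the input format (a list of `(score, is_true_positive)` pairs plus the ground-truth count) are hypothetical.

```python
def average_precision(detections, num_gt):
    """AP for one class using all-point interpolation.

    detections: list of (score, is_tp) pairs; is_tp tells whether the detection
    was matched to a previously unmatched ground-truth box.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each point is the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated curve: sum precision over each recall step.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping a class name to (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Other conventions exist (for instance, COCO averages over 101 recall points and several IoU thresholds), so values from this sketch are not directly comparable with every mAP implementation.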

Author classification

In addition to the two original branches (classification and regression), a new branch has been added to classify authors. The new classifier is similar to the original one; the only difference is that its number of output classes equals the number of authors.

[Figure: Structure]
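The paragraph above does not spell out how the author branch enters the training objective. One plausible reading, sketched here purely as an assumption, is that the author logits of each proposal receive a cross-entropy loss that is added to the standard FasterRCNN losses. The standard key names are the ones torchvision's Faster R-CNN returns in training mode; `loss_author`, `author_weight` and the helper names are hypothetical, and plain Python floats stand in for tensors.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target class, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def author_loss(author_logits, author_targets):
    """Mean cross-entropy over proposals (one logit per author per proposal).

    Presumably only proposals matched to a ground-truth box carry an author
    label, since background has no author; that filtering is left to the caller.
    """
    losses = [cross_entropy(l, t) for l, t in zip(author_logits, author_targets)]
    return sum(losses) / len(losses) if losses else 0.0


def total_loss(loss_dict, author_logits, author_targets, author_weight=1.0):
    # loss_dict holds the usual terms, e.g. loss_classifier, loss_box_reg,
    # loss_objectness and loss_rpn_box_reg (torchvision key names).
    losses = dict(loss_dict)
    # Hypothetical extra term for the author-classification branch.
    losses["loss_author"] = author_weight * author_loss(author_logits, author_targets)
    return sum(losses.values()), losses
```

Keeping the author term as a separate entry in the loss dictionary makes it easy to log it next to the other losses during training and validation.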

The following command-line arguments can be used to customize the execution:

Argument Description
model The name of the model (e.g. FasterRCNN)
bb The name of the backbone for the FasterRCNN model (e.g. resnet50v2, resnet50, mobilenet)
pretrained Determines whether to use a pre-trained model or not
fn The name of the model to be saved or loaded
add_auth Enables author classification
num_epochs The total number of training epochs
min_ep The minimum number of training epochs before enabling early stopping
bs The batch size
lr The learning rate for optimization
print_every The frequency of printing losses during training and validation
seed The random seed used to ensure reproducibility
opt The optimizer used for training (SGD or Adam)
early_stopping The threshold for early stopping during training (0 = disabled)
mode Determines the mode of the execution (0 = training, 1 = resume training, 2 = inference)
split The value used to split the dataset into train and validation subsets
dataset The path to retrieve the dataset
checkpoint_path The path to save and retrieve the trained model
inference_path The path that contains the images for inference
dataset_transform Determines if transformations (HorizontalFlip etc.) are applied to the images
res Resize dimensions of the input images for preprocessing
det_thresh Detection threshold used for inference and metrics computation
body Include "body" class
face Include "face" class
text Include "text" class
frame Include "frame" class
size32 Include size 32 for anchors
size64 Include size 64 for anchors
size128 Include size 128 for anchors
size256 Include size 256 for anchors
size512 Include size 512 for anchors
ar05 Include aspect ratio 1:2 for anchors
ar1 Include aspect ratio 1:1 for anchors
ar2 Include aspect ratio 2:1 for anchors
rpn_nms_th NMS threshold used for postprocessing the RPN proposals
rpn_fg_th Minimum IoU between an anchor and a GT box for the anchor to be considered positive during RPN training
rpn_bg_th Maximum IoU between an anchor and a GT box for the anchor to be considered negative during RPN training
rpn_score_th During inference, only return proposals with an objectness score greater than rpn_score_th
box_nms_th NMS threshold used for postprocessing the final detections of the box predictor
box_fg_th Minimum IoU between a proposal and a GT box for the proposal to be considered positive during classification-head training
box_bg_th Maximum IoU between a proposal and a GT box for the proposal to be considered negative during classification-head training
box_score_th During inference, only return detections with a classification score greater than box_score_th
box_detections Maximum number of detections per image, for all classes
map_authors Calculate mAP for author classification (available only if the author classification is enabled)
save_pred Set to 1 to save the first prediction of the validation loader to TensorBoard, 0 otherwise
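Several of the arguments above are IoU or score thresholds. The plain-Python sketch below illustrates what they control under the usual Faster R-CNN conventions: `label_by_iou` marks an anchor or proposal as positive, negative or ignored from its best IoU with the ground-truth boxes (the role of rpn_fg_th/rpn_bg_th and box_fg_th/box_bg_th), and `filter_and_nms` applies a score threshold, greedy non-maximum suppression and a cap on the number of detections (box_score_th, box_nms_th, box_detections; the RPN applies the same kind of NMS with rpn_nms_th). The function names are hypothetical and boxes are `(x1, y1, x2, y2)` tuples. torchvision's matcher additionally keeps each ground-truth box's best-matching anchor as positive for the RPN, and its postprocessing runs NMS separately for each class; both details are omitted here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_by_iou(boxes, gt_boxes, fg_th, bg_th):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each box."""
    labels = []
    for box in boxes:
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        if best >= fg_th:
            labels.append(1)
        elif best < bg_th:
            labels.append(0)
        else:
            labels.append(-1)  # between the two thresholds: not used for training
    return labels


def filter_and_nms(boxes, scores, score_th, nms_th, max_dets):
    """Indices of the kept boxes, ordered by decreasing score."""
    # Drop low-confidence boxes, then visit the rest from best to worst.
    order = sorted((i for i, s in enumerate(scores) if s > score_th),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_dets:
            break
        # Suppress a box that overlaps an already kept box by more than nms_th.
        if all(iou(boxes[i], boxes[j]) <= nms_th for j in keep):
            keep.append(i)
    return keep
```

Lowering the NMS threshold suppresses more overlapping boxes, which can hurt on manga pages where faces and text balloons often overlap; this is one reason to expose it as an argument.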

Prerequisites

Usage

Example of script launch:

python object-detection-main.py -model=fasterrcnn -bb=resnet50v2 -min_ep=0 -early_stopping=1 -num_epochs=10 -lr=0.0001 -opt=SGD -add_auth=1 -bs=4 -res=512 -size32=0 -size64=0 -ar05=0 -ar2=0 -frame=0
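In the command above, -size32=0 -size64=0 -ar05=0 -ar2=0 switch off two anchor sizes and two aspect ratios. Assuming the model is built on torchvision's `AnchorGenerator` (from `torchvision.models.detection.anchor_utils`; the README does not say how the flags are consumed), the 0/1 switches could be turned into its `sizes` and `aspect_ratios` arguments as sketched below. torchvision expects one tuple per feature map in both arguments; the one-size-per-feature-map layout follows torchvision's FPN-based Faster R-CNN, and how the repository keeps the number of sizes consistent with the backbone's feature maps is not documented here. `anchor_config` is a hypothetical name.

```python
SIZES = (32, 64, 128, 256, 512)
RATIOS = {"ar05": 0.5, "ar1": 1.0, "ar2": 2.0}


def anchor_config(flags):
    """Map 0/1 switches (a missing key counts as enabled) to anchor tuples."""
    sizes = [s for s in SIZES if flags.get(f"size{s}", 1)]
    ratios = tuple(r for name, r in RATIOS.items() if flags.get(name, 1))
    if not sizes or not ratios:
        raise ValueError("at least one anchor size and one aspect ratio must stay enabled")
    # One size per feature map, with the same aspect ratios on every map.
    return tuple((s,) for s in sizes), (ratios,) * len(sizes)
```

With the switches from the example command, only sizes 128, 256 and 512 with a 1:1 aspect ratio remain.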

Example of inference:

python object-detection-main.py -mode=2 --file_name="model.pt" -det_thresh=0.50
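Both example commands pass single-dash options in `-name=value` form, which Python's `argparse` accepts when the options are declared with a single dash. The sketch below declares a subset of the options from the argument table so that both commands parse. It is not the repository's actual parser: the types and default values are guesses, and treating -fn and --file_name as two spellings of the same option is an assumption made to reconcile the table with the inference example.

```python
import argparse


def build_parser():
    # Hypothetical parser: option names come from the argument table,
    # while types and defaults are illustrative guesses.
    p = argparse.ArgumentParser(description="Object detection on Manga109")
    p.add_argument("-model", type=str, default="fasterrcnn")
    p.add_argument("-bb", type=str, default="resnet50v2")
    p.add_argument("-pretrained", type=int, choices=[0, 1], default=1)
    # Assumption: -fn and --file_name name the same option (dest "file_name").
    p.add_argument("-fn", "--file_name", type=str, default="model.pt")
    p.add_argument("-add_auth", type=int, choices=[0, 1], default=0)
    p.add_argument("-num_epochs", type=int, default=10)
    p.add_argument("-min_ep", type=int, default=0)
    p.add_argument("-early_stopping", type=int, default=0)  # 0 = disabled
    p.add_argument("-bs", type=int, default=4)
    p.add_argument("-lr", type=float, default=1e-4)
    p.add_argument("-opt", type=str, choices=["SGD", "Adam"], default="SGD")
    # 0 = training, 1 = resume training, 2 = inference
    p.add_argument("-mode", type=int, choices=[0, 1, 2], default=0)
    p.add_argument("-res", type=int, default=512)
    p.add_argument("-det_thresh", type=float, default=0.5)
    # 0/1 switches for the classes and for the anchor sizes and aspect ratios.
    for name in ("body", "face", "text", "frame", "size32", "size64",
                 "size128", "size256", "size512", "ar05", "ar1", "ar2"):
        p.add_argument(f"-{name}", type=int, choices=[0, 1], default=1)
    return p


train_cmd = ("-model=fasterrcnn -bb=resnet50v2 -min_ep=0 -early_stopping=1 "
             "-num_epochs=10 -lr=0.0001 -opt=SGD -add_auth=1 -bs=4 -res=512 "
             "-size32=0 -size64=0 -ar05=0 -ar2=0 -frame=0")
train_args = build_parser().parse_args(train_cmd.split())
# The shell strips the quotes around "model.pt" before Python sees the argument.
infer_args = build_parser().parse_args(["-mode=2", "--file_name=model.pt", "-det_thresh=0.50"])
```

Declaring `choices` for the 0/1 switches makes argparse reject typos such as `-frame=2` with a usage message instead of silently accepting them.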

License: MIT License