FasterRCNN Manga109 Object-Detection

This is my project for the Deep Learning and Generative Models course at @UniPr.
The purpose of this deep learning project is to conduct an object detection task using models like FasterRCNN. The models were trained on the Manga109 dataset, a dataset compiled by the Aizawa Yamasaki Matsui Laboratory, University of Tokyo. Manga109 is composed of 109 manga volumes drawn by professional mangaka in Japan. The project consists of the following Python modules:

object-detection-main.py: This module is responsible for launching the simulation.
model.py: This module is responsible for creating the model. The supported models are FasterRCNN, RetinaNet and SSD300. In particular, you can modify the FasterRCNN template to set custom parameters and, optionally, add author classification.
custom_roi_heads.py: This module implements a custom RoIHeads for author classification, a custom fasterrcnn loss with author classification and a custom FastRCNNPredictor for custom classes.
datasetManga109.py: This module implements the CustomDataset used for training and validating the model.
solver.py: This module includes methods for training and validation, with or without author classification. It also provides functionality for saving and loading the model and for visualizing model weights.
manga109api_custom.py: This module is an extension of manga109api from https://github.com/manga109/manga109api/tree/main/manga109api. The parser has been extended to support adding author information to annotations.
metrics.py: This module is responsible for the calculation of evaluation metrics, in particular for mAP (mean Average Precision) computation.
utils.py: This module contains various functions for different purposes, such as checking for annotations in images, the early stopping implementation and uploading image information.
inference.py: This module is responsible for making inference on given images.
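
The early stopping helper mentioned for utils.py can be pictured roughly as follows. This is a minimal sketch, not the project's code: the class name `EarlyStopper` and its `step` method are hypothetical, and it assumes that the `early_stopping` argument acts as a patience (number of consecutive epochs without validation-loss improvement) and that `min_ep` is a number of warm-up epochs during which training never stops.

```python
class EarlyStopper:
    """Hypothetical helper: stop when the validation loss stops improving."""

    def __init__(self, patience, min_epochs=0):
        self.patience = patience      # assumed meaning of `early_stopping`; 0 disables it
        self.min_epochs = min_epochs  # assumed meaning of `min_ep`
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Return True if training should stop after this (0-based) epoch."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Never stop when disabled or while still inside the warm-up epochs.
        if self.patience == 0 or epoch + 1 < self.min_epochs:
            return False
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=2, min_epochs=3)
losses = [1.0, 0.8, 0.9, 0.85, 0.7]  # made-up validation losses
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In this sketch the counter of non-improving epochs keeps running during the warm-up; whether the project resets it at that point is not stated in this README.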

Author classification

In addition to the two original branches (classification and regression), a new branch has been added to classify authors. The new classifier is similar to the original classifier, with the only difference that it will have to recognize a number of classes equal to the number of authors.
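
One way to picture the extra branch is a box predictor with three linear layers instead of two. The sketch below is an illustration under assumptions, not the code in custom_roi_heads.py: the class name `PredictorWithAuthors`, the feature size 1024 and the number of authors are hypothetical, and the first two layers simply follow the layout of torchvision's `FastRCNNPredictor`.

```python
import torch
from torch import nn


class PredictorWithAuthors(nn.Module):
    """Hypothetical box predictor with an extra author-classification branch."""

    def __init__(self, in_channels, num_classes, num_authors):
        super().__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)      # object classes, background included
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)  # per-class box regression deltas
        self.author_score = nn.Linear(in_channels, num_authors)   # new branch: one logit per author

    def forward(self, x):
        x = x.flatten(start_dim=1)
        return self.cls_score(x), self.bbox_pred(x), self.author_score(x)


# 5 = body, face, text, frame + background; 10 authors is an arbitrary example value.
head = PredictorWithAuthors(in_channels=1024, num_classes=5, num_authors=10)
cls_logits, box_deltas, author_logits = head(torch.randn(8, 1024))
```

The author logits would then typically be trained with a cross-entropy term added to the usual classification and regression losses; how the project weights that term is not specified here.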

The parameters that can be provided through the command line and allow customization of the execution are:

Argument	Description
model	The name of the model (e.g. FasterRCNN)
bb	The name of the backbone for the FasterRCNN model (e.g. resnet50v2, resnet50, mobilenet)
pretrained	Determines whether to use a pre-trained model or not
fn	The name of the model to be saved or loaded
add_auth	Enables author classification
num_epochs	The total number of training epochs
min_ep	The minimum number of training epochs before enabling early stopping
bs	The batch size
lr	The learning rate for optimization
print_every	The frequency of printing losses during training and validation
seed	The random seed used to ensure reproducibility
opt	The optimizer used for training (SGD or Adam)
early_stopping	The threshold for early stopping during training (0 = disabled)
mode	Determines the mode of the execution (0 = training, 1 = resume training, 2 = inference)
split	The value used to split the dataset into train and validation subsets
dataset	The path to retrieve the dataset
checkpoint_path	The path to save and retrieve the trained model
inference_path	The path that contains the images for inference
dataset_transform	Determines if transformations (HorizontalFlip etc.) are applied to the images
res	Resize dimensions of the input images for preprocessing
det_thresh	Value of the detection threshold for inference and metrics computation
body	Include "body" class
face	Include "face" class
text	Include "text" class
frame	Include "frame" class
size32	Include size 32 for anchors
size64	Include size 64 for anchors
size128	Include size 128 for anchors
size256	Include size 256 for anchors
size512	Include size 512 for anchors
ar05	Include aspect ratio 1:2 for anchors
ar1	Include aspect ratio 1:1 for anchors
ar2	Include aspect ratio 2:1 for anchors
rpn_nms_th	NMS threshold used for postprocessing the RPN proposals
rpn_fg_th	Minimum IoU between the anchor and the GT box so that they can be considered as positive during RPN training
rpn_bg_th	Maximum IoU between the anchor and the GT box so that they can be considered as negative during RPN training
rpn_score_th	During inference, only return proposals with a classification score greater than rpn_score_th
box_nms_th	NMS threshold for the prediction head, used during inference
box_fg_th	Minimum IoU between the proposal and the GT box so that they can be considered as positive during the classification head training
box_bg_th	Maximum IoU between the proposal and the GT box so that they can be considered as negative during the classification head training
box_score_th	During inference, only return proposals with a classification score greater than box_score_th
box_detections	Maximum number of detections per image, for all classes
map_authors	Calculate mAP for author classification (available only if the author classification is enabled)
save_pred	1 if you want to save the first prediction of val loader on tensorboard, 0 otherwise
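
The foreground and background thresholds in the table (rpn_fg_th / rpn_bg_th for anchors, box_fg_th / box_bg_th for proposals) all follow the same IoU rule. The sketch below illustrates that rule in plain Python; the function names are hypothetical, the threshold values are only examples, and it deliberately omits refinements such as keeping the best-matching anchor for each ground-truth box.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidate(candidate, gt_boxes, fg_th, bg_th):
    """Label one anchor or proposal as 'positive', 'negative' or 'ignored'."""
    best = max((iou(candidate, gt) for gt in gt_boxes), default=0.0)
    if best >= fg_th:
        return "positive"   # overlaps a ground-truth box enough
    if best < bg_th:
        return "negative"   # treated as background
    return "ignored"        # between the two thresholds: not used for the loss


gt = [(0, 0, 100, 100)]
labels = [
    label_candidate((0, 0, 100, 100), gt, fg_th=0.7, bg_th=0.3),      # IoU 1.0
    label_candidate((0, 0, 100, 50), gt, fg_th=0.7, bg_th=0.3),       # IoU 0.5
    label_candidate((200, 200, 300, 300), gt, fg_th=0.7, bg_th=0.3),  # IoU 0.0
]
```

When the two thresholds are set to the same value, no candidate falls into the ignored band.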

Prerequisites

Python 3.5 or later installed on your system.
The following modules:
- os
- json
- torch
- numpy
- tqdm
- matplotlib
- torchvision
- cv2
- PIL
- random
- argparse
- torch.utils.tensorboard
- torchmetrics
- pandas

Usage

Example of script launch:

python object-detection-main.py -model=fasterrcnn -bb=resnet50v2 -min_ep=0 -early_stopping=1 -num_epochs=10 -lr=0.0001 -opt=SGD -add_auth=1 -bs=4 -res=512 -size32=0 -size64=0 -ar05=0 -ar2=0 -frame=0
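
For readers unfamiliar with this flag style, the sketch below shows how single-dash `-name=value` arguments like the ones in the command above can be parsed with `argparse`. It is an illustration only: `build_parser` is a hypothetical function, it covers just the flags used in that command, and every type and default shown is an assumption rather than the project's actual configuration.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags used in the training command."""
    p = argparse.ArgumentParser(description="Manga109 object detection (sketch)")
    p.add_argument("-model", type=str, default="fasterrcnn")
    p.add_argument("-bb", type=str, default="resnet50")
    p.add_argument("-min_ep", type=int, default=0)
    p.add_argument("-early_stopping", type=int, default=0)
    p.add_argument("-num_epochs", type=int, default=10)
    p.add_argument("-lr", type=float, default=0.001)
    p.add_argument("-opt", type=str, choices=["SGD", "Adam"], default="SGD")
    p.add_argument("-add_auth", type=int, default=0)
    p.add_argument("-bs", type=int, default=1)
    p.add_argument("-res", type=int, default=512)
    # 0/1 switches for anchor sizes, aspect ratios and classes (assumed default: enabled).
    for flag in ("size32", "size64", "ar05", "ar2", "frame"):
        p.add_argument("-" + flag, type=int, default=1)
    return p


args = build_parser().parse_args([
    "-model=fasterrcnn", "-bb=resnet50v2", "-min_ep=0", "-early_stopping=1",
    "-num_epochs=10", "-lr=0.0001", "-opt=SGD", "-add_auth=1", "-bs=4",
    "-res=512", "-size32=0", "-size64=0", "-ar05=0", "-ar2=0", "-frame=0",
])
```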

Example of inference:

python object-detection-main.py -mode=2 --file_name="model.pt" -det_thresh=0.50
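
The effect of det_thresh can be illustrated with a small filter over one prediction. The sketch uses plain lists in place of tensors and the `boxes` / `labels` / `scores` keys that torchvision detection models return; `filter_detections` is a hypothetical name, the sample values are made up, and treating a score equal to the threshold as kept is an assumption.

```python
def filter_detections(prediction, det_thresh):
    """Keep only the detections whose score is at least det_thresh."""
    keep = [i for i, score in enumerate(prediction["scores"]) if score >= det_thresh]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}


# Made-up prediction for one image; label ids are arbitrary.
prediction = {
    "boxes": [(10, 10, 50, 80), (5, 5, 20, 20), (100, 40, 180, 90)],
    "labels": [1, 2, 3],
    "scores": [0.92, 0.48, 0.50],
}
kept = filter_detections(prediction, det_thresh=0.50)
```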

FabrizioDeSantis / Object-Detection-Manga109
