
2023-AI-Final-Project

Enhanced Image Segmentation with Iterative Image Inpainting

Installation

1. Clone MAT.
2. Install MAT's dependencies: pip install -r requirements.txt
3. Clone this repo.
4. Copy this repo's files into the MAT directory.
5. Run: python main.py

Introduction

Image segmentation involves dividing an image into multiple regions or segments based on certain characteristics such as color, texture, or intensity. The purpose of segmentation is to simplify the representation of an image, making it easier to analyze and understand. It is a fundamental step in various computer vision tasks, including object detection, tracking, and recognition.

Image inpainting, on the other hand, is the task of reconstructing missing regions of an image.

Both segmentation and inpainting are important tasks in computer vision. Segmentation, however, becomes difficult when an object is partially covered by other objects. To address this, after segmenting an image we use the resulting mask to perform inpainting, and then segment the inpainted image again. After inpainting, previously hidden objects are likely to be segmented successfully within a few iterations. We use two inpainting models, MAT and LaMa, and compare their performance.

Related Works

Rethinking Atrous Convolution for Semantic Image Segmentation
- DeepLabV3+ (Github)
- for image segmentation
MAT: Mask-Aware Transformer for Large Hole Image Inpainting
- MAT (Github)
- for image inpainting
LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions
- LaMa (Github)
- another image inpainting model
Auto-Lama
- combines object detection and image inpainting to automate object removal

Dataset

The PASCAL Visual Object Classes
- VOC 2012

We use the VOC 2012 training/validation data (2 GB), which contains more than 10,000 images and their annotations. Images in VOC 2012 are typically smaller than 512×512 pixels. We use only a small portion of this dataset, for testing.

We also use some images that we took ourselves.

Baseline

Our baseline is the pretrained deeplabv3_resnet101 model provided by PyTorch. It was trained on a subset of COCO train2017, restricted to the 20 categories that are present in the Pascal VOC dataset.
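As an illustration, the baseline could be loaded roughly as follows. This is a minimal sketch, assuming torchvision 0.13 or newer (for the weights enum); the helper names load_baseline and segment are hypothetical and are not taken from this repo.

```python
# Class order of the Pascal VOC label set (index 0 is background).
VOC_CLASSES = [
    "__background__", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def load_baseline():
    """Return the pretrained model and its matching preprocessing transform."""
    # Imported lazily so VOC_CLASSES can be used without PyTorch installed.
    from torchvision.models.segmentation import (
        DeepLabV3_ResNet101_Weights,
        deeplabv3_resnet101,
    )

    weights = DeepLabV3_ResNet101_Weights.DEFAULT
    model = deeplabv3_resnet101(weights=weights).eval()
    return model, weights.transforms()


def segment(model, preprocess, image):
    """Return a 2-D tensor of class indices for a PIL image (hypothetical helper)."""
    import torch

    batch = preprocess(image).unsqueeze(0)   # add a batch dimension: (1, 3, H, W)
    with torch.no_grad():
        logits = model(batch)["out"]         # (1, 21, H, W) class scores
    # Per-pixel index into VOC_CLASSES, at the preprocessed resolution.
    return logits.argmax(dim=1)[0]
```

A binary mask for one category can then be obtained by comparing the returned label map against that category's index, for example VOC_CLASSES.index("person").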

A Residual Neural Network (ResNet) is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs.
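
For reference, a residual block is commonly written as below. The symbols are not from this README: x is the block input, F is the residual function learned by the stacked weight layers with parameters W_i, and y is the block output.

$$
y = \mathcal{F}(x, \{W_i\}) + x
$$

The layers therefore only need to learn the difference between the desired output and the input, rather than the full mapping.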

Main Approach

The main goal of our algorithm is to produce a more accurate image segmentation result. Its two key subroutines are ordinary image segmentation and image inpainting, which we call iteratively to obtain the final result.

The algorithm takes an image as input and outputs a mask, which is the segmentation result.

Below is the pseudo-code of our algorithm.

procedure SegWithInpaint (
  img: input image,
  Seg: segmentation subroutine,
  Inp: inpainting subroutine,
  Iter: number of iterations
)
    M <- img
    Initialize the BaseMask.
    for iter in range(Iter) do
        mask <- Seg(M)               # a mask is returned from Seg().
        M' <- Inp(M, mask)           # an inpainted image is returned from Inp().
        M <- M'
        BaseMask <- BaseMask & mask  # combine masks
    done
    return BaseMask
end procedure
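
The procedure above can be sketched in Python as follows. This is a simplified illustration, not the code in main.py. It assumes a mask marks the pixels of detected objects, so masks are merged with a union (`|`); if masks mark the background instead, the `&` in the pseudo-code is the equivalent operation. The toy_seg and toy_inp stand-ins are hypothetical: the "image" is a tuple of object layers (front first), each layer a set of pixel coordinates.

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")
Mask = TypeVar("Mask")


def seg_with_inpaint(
    img: Image,
    seg: Callable[[Image], Mask],
    inp: Callable[[Image, Mask], Image],
    iterations: int,
) -> Optional[Mask]:
    """Alternate segmentation and inpainting, accumulating the masks."""
    current = img
    base_mask = None  # no mask yet; stays None if iterations == 0
    for _ in range(iterations):
        mask = seg(current)           # segment the current image
        current = inp(current, mask)  # inpaint the segmented region away
        # Assumption: masks mark object pixels, so merging is a union.
        base_mask = mask if base_mask is None else base_mask | mask
    return base_mask


def toy_seg(image):
    """Toy stand-in: only the front-most layer is visible to the segmenter."""
    return image[0] if image else frozenset()


def toy_inp(image, mask):
    """Toy stand-in: remove every layer that the mask fully covers."""
    return tuple(layer for layer in image if not layer <= mask)


front = frozenset({(0, 0), (0, 1)})  # occluding object
back = frozenset({(0, 1), (0, 2)})   # partially hidden object
combined = seg_with_inpaint((front, back), toy_seg, toy_inp, iterations=3)
```

Because the loop only relies on `|`, the same function also works when seg and inp operate on NumPy boolean arrays, where `|` is an element-wise OR.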

Evaluation metric

Qualitative:

We directly compare the segmentation results of DeepLabV3+ with those of our algorithm by visual inspection, checking which one detects more objects accurately.

Quantitative:

We intended to use mean Intersection over Union (mIoU), a common metric in the image segmentation literature, as our quantitative metric. However, because our algorithm is not efficient enough, we were unable to complete the evaluation before the project deadline.
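
For reference, mIoU could be computed along the following lines. This is only a sketch under stated assumptions, not an evaluation that was run for this project: predictions and ground truth are flattened sequences of class indices, pixels labeled 255 (the VOC "void" label) are ignored, and classes that appear in neither sequence are left out of the mean. The function name mean_iou is hypothetical.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,
) -> float:
    """Mean Intersection over Union over the classes that occur."""
    inter = [0] * num_classes  # pixels where pred and target agree on class c
    union = [0] * num_classes  # pixels where pred or target is class c
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip "void" ground-truth pixels
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # Average only over classes present in the prediction or the ground truth.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

For 2-D label maps, flatten the predicted and ground-truth arrays before passing them in.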