albertovitto / VCS-Project

Computer Vision project for detection, rectification and identification of paintings in the "Galleria Estense" art gallery in Modena (Italy)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

VCS-Project

This project was developed as part of the Vision and Cognitive Systems course as an exam in the Master's Degree in Computer Engineering in Unimore.

Contributors

Name Email
Luca Denti 211805@studenti.unimore.it
Cristian Mercadante 213808@studenti.unimore.it
Alberto Vitto albertovitto@outlook.com

Tasks

Given a dataset of videos taken in "Gallerie Estensi" in Modena together with pictures of its paintings, it was required to implement a software in Python capable of detecting paintings in videos and retrieve the original image from the dataset.

In particular:

  • Painting detection
    • Given an input video, the code should output a list of bounding boxes (x, y, w, h), being (x, y) the upper left corner, each containing one painting.
    • Create an interface to visualize given an image the ROI of a painting.
    • Select painting and discard other artifacts.
    • Optional: segment precisely paintings with frames and also statues.
  • Painting rectification
    • Given an input video and detections (from the previous point), the code should output a new image for each painting, containing the rectified version of the painting.
    • Pay attention to not-squared paintings.
  • Painting retrieval
    • Given one rectified painting (from the previous point), the code should return a ranked list of all the images in the painting DB, sorted by descending similarity with the detected painting. Ideally, the first retrieved item should be the picture of the detected painting.
  • People detection
    • Given an input video, the code should output a list of bounding boxes (x, y, w, h), being (x, y) the upper left corner, each containing one person.
  • People localization
    • Given an input video and people bounding boxes (from the previous point), the code should assign each person to one of the rooms of the Gallery. To do that, exploit the painting retrieval procedure (third point), and the mapping between paintings and rooms (in data.csv). Also, a map of the Gallery is available (map.png) for a better visualization.

Optional tasks:

  • Given an input video, people and paintings' detections, determine whether each person is facing a painting or not.
  • Given a view taken from the 3D model, detect each painting and replace it with its corresponding picture in the paintings DB, appropriately deformed to match the 3D view.
  • Determine the distance of a person to the closest door: find the door, find the walls and the floor, try to compensate and predict distance.

Project structure

.
├── dataset
│   ├── data.csv
│   ├── features_db.npy
│   ├── ground_truth
│   │   ├── 000_0.json
│   │   ├── ...
│   │   └── 014_16.json
│   ├── img_features_db.npy
│   ├── map.png
│   ├── paintings_db
│   │   ├── 000.png
│   │   ├── ...
│   │   └── 094.png
│   ├── test_set
│   │   ├── 000_0.png
│   │   ├── ...
│   │   └── 014_16.png
│   └── videos
│       ├── 000
│       │   ├── VIRB0391.MP4
│       │   └── ...
│       ├── ...
│       │   └── ...
│       └── 014
│           ├── VID_20180529_112517.mp4
│           └── ...
├── venv
├── estensi
│   ├── painting_detection
│   │   ├── constants.py
│   │   ├── detection.py
│   │   ├── evaluation.py
│   │   └── utils.py
│   ├── painting_rectification
│   │   ├── rectification.py
│   │   └── utils.py
│   ├── painting_retrieval
│   │   ├── evaluation.py
│   │   ├── retrieval.py
│   │   └── utils.py
│   ├── people_detection
│   │   ├── cfg
│   │   │   └── yolov3.cfg
│   │   ├── darknet.py
│   │   ├── data
│   │   │   └── coco.names
│   │   ├── detection.py
│   │   ├── preprocess.py
│   │   ├── utils.py
│   │   └── yolov3.weights
│   ├── people_localization
│   │   ├── localization.py
│   │   └── utils.py
│   └── utils.py
├── estensi.py
├── painting_detection_evaluation.py
├── painting_retrieval_evaluation.py
├── README.md
├── requirements.txt
├── torch_cpu_requirements.txt
└── torch_requirements.txt

Instructions

  • Make sure to have installed all requirements (see requirements.txt).
  • Make sure to have installed also the PyTorch requirements, depending from the system (see torch_requirements.txt for CUDA version or torch_cpu_requirements.txt for CPU version).
  • Place the dataset folder at the same level as estensi.py and the estensi package (make sure to have paintings_db, videos, data.csv, and map.png inside as shown in the project structure).
  • Download yolov3.weights and place it into estensi/people_detection.

Arguments

estensi.py --video <path/to/video> --folder <path/to/folder/> --skip_frames <int_number> [--include_steps]

where:

  • --video targets the video to analyze.
  • --folder targets the folder containing different videos to analyze.
  • --skip_frames number of frames to skip during analysis, default is 1 (don't skip any frame).
  • --include_steps tells the script to show useful debug information.

Keys

  • Press R to start the painting retrieval, rectification and localization tasks. You will see the outputs in new windows and more details in the command line. Press any key to resume.
  • Press P to pause the video. Press any key to resume.
  • Press Q to quit the video. If --folder is specified, goes to the next video.

Evaluation

Following videos were used for the evaluation phase:

Folder Video
000 VIRB0393.MP4
001 GOPR5825.MP4
002 20180206_114720.mp4
003 GOPR1929.MP4
004 IMG_3803.MOV
005 GOPR2051.MP4
006 IMG_9629.MOV
007 IMG_7852.MOV
008 VIRB0420.MP4
009 IMG_2659.MOV
010 VID_20180529_112706.mp4
012 IMG_4087.MOV
013 20180529_112417_ok.mp4
014 VID_20180529_113001.mp4

To get the same results as in the report, download this test set.

Painting detection

  • The script will create a test_set folder under the dataset folder, containing the frames captured from the videos listed above.
  • Place the ground_truth folder under the dataset folder.
  • If no argument is passed to the script, the test set will be evaluated with the system hyperparameters configuration. Otherwise, it will be evaluated with the passed configuration.
  • Run:
    painting_detection_evaluation.py [--param <param_grid_file_path>]
    where:
    • --param is the path of a JSON file containing the parameters grid for grid search evaluation.

      Example of JSON file:

      {
          "MIN_ROTATED_BOX_AREA_PERCENT": [0.5, 0.8, 0.9],
          "MIN_ROTATED_ELLIPSE_AREA_PERCENT": [0.4, 0.6],
          "MAX_GRAY_80_PERCENTILE": [170, 200],
          "MIN_VARIANCE": [11, 18],
          "MIN_HULL_AREA_PERCENT_OF_MAX_HULL": [0.08, 0.15],
          "THRESHOLD_BLOCK_SIZE_FACTOR": [50, 80]
      }

      They key values are taken from estensi/painting_detection/constants.py

Painting retrieval

  • The script will create a test_set folder under the dataset folder, containing the frames captured from the videos listed above.
  • Place the ground_truth folder under the dataset folder.
  • Run:
    painting_retrieval_evaluation.py --mode <mode_str> [--rank_scope <scope_int>]
    where:
    • --mode is the mode (either classification or retrieval) in which the evaluation is done,
    • --rank_scope is the scope of the ranking list where a relevant item can be found. Default value is 5. It will be ignored in classification mode.

Disclaimer

This work has only been tested with PyCharm 2020.1.2 (Professional Edition) as IDE and Windows 10 as OS.

About

Computer Vision project for detection, rectification and identification of paintings in the "Galleria Estense" art gallery in Modena (Italy)


Languages

Language:Python 62.4%Language:TeX 37.6%