
Sticker detection using Faster-RCNN 🎉

Bachelor's project at EPFL, 2021


Week 1

  • Learning how to read scientific papers; took a look at Frustratingly Simple Few-Shot Object Detection

    • The abstract of a scientific paper usually states the simple but obvious
    • The intro states what the problem is and what their approach to fixing it will be. In this paper they only modify the last layer, that is the Box Classifier and the Box Regressor.
  • The previously cited paper uses Detectron2, which implements Faster-RCNN in PyTorch; this is likely what I will be using for this project

  • COCO is a dataset frequently used in the field of object detection

  • Paperswithcode has an excellent database of research papers with implementations.


Week 2

Although RetinaNet is proposal-free and thus faster, as it is just one CNN, it lacks the modularity of the proposal-based Faster-RCNN (which is Fast-RCNN with an RPN).

Deep object co-segmentation takes two images and finds their common features; this could potentially be an efficient way to compare against existing stickers.


Week 3

Installed Detectron2 in the myenv environment, with PyTorch 1.7.1 and TorchVision 0.8.2 (CPU version). Would like to connect to SCITAS and use that instead.

D2Go is an interesting optimised version of Detectron2 for mobile phones; I should check it out.

How to run the detectron2 demo:

  • Install packages from here

  • Run the following after pulling the git repo

git clone https://github.com/facebookresearch/detectron2.git
cd detectron2/demo
python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input ../../input.jpg --opts MODEL.DEVICE cpu MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

Issues I ran into:

  • Had to add MODEL.DEVICE cpu for it to run on CPU
  • Had to point to a downloaded image
wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O input.jpg
  • Had to install extra libraries for OpenCV
pip install opencv-python

What I learned:

  • How to do Markdown

  • Why and how of Conda environments

  • How to use detectron pretrained models

  • Names of the pretrained models


Week 4

Managed to ssh into SCITAS, spent some time understanding how to access CUDA, got it to run!

Downloaded the FlickrLogos dataset (links).

Wrote a simple script that will execute as a job using Slurm. Made a venv and installed these packages with --user so they are not system-installed:

pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html --user
pip install torch==1.7.1 torchvision==0.8.2 --user
pip install opencv-python --user
pip install -U iopath==0.1.4 --user

To work interactively on SCITAS with a GPU:

Sinteract -t 00:10:00 -p gpu -q gpu_free -g gpu:1

Issues I ran into:

  • Have to load the Python module after activating the venv, as the venv replaces the Python version

  • Have to install opencv-python every time, even though I'm in the venv?

pip install opencv-python --user
  • Had to downgrade iopath for it to work on SCITAS
pip install -U iopath==0.1.4 --user

What I learned:

  • How to use SCITAS again :P

  • How to use scp and send the images back to my local machine

  • How to launch jobs instead of using Sinteract

  • What Python notebooks are and how to use them, via the detectron2 tutorial


Week 5

Used FlickrLogos32 to learn custom-dataset training.

| FlickrLogos32 | 1 Class | 32 Classes |
| ------------- | ------- | ---------- |
| L 0.007       | link    | link       |
| L 0.005       | link    | link       |
| L 0.001       | link    | link       |

Detectron2 needs the dataset registered as a list[dict], a list of metadata about each image. The dataloader will then augment, batch, and pass it to model.forward(); see the registration sketch below.
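A minimal sketch of that registration step, assuming a single "sticker" class; the function name, file paths, and box values are placeholders, not this project's actual loader:

from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.structures import BoxMode

def get_sticker_dicts():
    # One dict per image; Detectron2's dataloader handles augmentation and batching.
    return [{
        "file_name": "images/0001.jpg",  # hypothetical path to the image
        "image_id": 0,
        "height": 480,
        "width": 640,
        "annotations": [{
            "bbox": [100, 120, 200, 250],   # box in absolute pixel coordinates
            "bbox_mode": BoxMode.XYXY_ABS,
            "category_id": 0,               # index into thing_classes
        }],
    }]

DatasetCatalog.register("stickers_train", get_sticker_dicts)
MetadataCatalog.get("stickers_train").thing_classes = ["sticker"]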

Issues I ran into:

  • How to correctly open pictures (had an annoying "\n" that was invisible in print() but broke the path when opening the image)

  • How to correctly pass the mask (have to transform it into RLE, a lightweight binary-mask encoding; see the sketch after this list)

  • That I have to load the model's config and merge my custom one into it

  • Too much memory use for the C4 models

  • After making my custom dataset and running it on SCITAS, it took around ~3h to get some sort of result: 3-4% confidence on random parts of the picture when run through the detectron2 Visualizer class. I tried multiple different backbones (C4, DC5, FPN, with 1x or 3x schedules) to see if it would make a difference.

    Loss was at around 0.2 after ~15 minutes of training, for an abysmal result. Results were slightly better for the 32-class version.

    What solved the problem was changing the learning rate from 0.00025 to 0.02 (see the config sketch after this list).
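For the mask issue above, a minimal sketch of the RLE conversion, assuming pycocotools (the mask shape and values here are made up):

import numpy as np
from pycocotools import mask as mask_util

# A binary mask with 1 where the sticker is (toy values).
binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[120:250, 100:200] = 1

# pycocotools expects a Fortran-ordered uint8 array; the result is a compact
# RLE dict ({"size": ..., "counts": ...}) usable as the "segmentation" field.
rle = mask_util.encode(np.asfortranarray(binary_mask))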
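And a sketch of the config change that fixed training; the model-zoo config name is an example, not necessarily the exact one I used:

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("stickers_train",)
cfg.SOLVER.BASE_LR = 0.02  # was 0.00025; this change made the detections usable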

What I learned:

  • The inner structure of detectron2, and Python (again)

  • How to use rsync

  • P a t i e n c e 🌠


Week 5 Part 2

  • Built a Python bot that cuts people out of pictures submitted to it; you can try it out here: https://t.me/faststicker_bot

    • Hosted on Heroku; took ~10h to do. Learned a lot about git, Heroku, and Python dependencies.
  • Tried out the Flicker1Class47 dataset


Week 6

  • Label stickers, maybe just the bounding box, to then attempt to classify them (link)

  • Look into the precision-recall curve, as well as IoU.


Week 10

Time to classify the stickers using Few-Shot Image Classification. There are three pillars in this domain:

  • Prior knowledge about similarity (knows how to differentiate well)

  • Prior knowledge about learning (knows how to adapt well)

  • Prior knowledge about the data (augment data to learn)

Detectron2 only needs 10 images per class in the training set to start recognizing logos.

I chose to go with similarity-based implementations, specifically Matching Networks, which are basically KNNs with extra steps (see the sketch below).
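A toy sketch of that idea: embed the support and query images, then classify each query by a softmax-weighted cosine similarity to the support labels, essentially a soft nearest-neighbour. The encoder here is a stand-in for any CNN feature extractor:

import torch
import torch.nn.functional as F

def matching_predict(encoder, support_x, support_y, query_x, n_classes):
    # Embed and L2-normalise so dot products are cosine similarities.
    s = F.normalize(encoder(support_x), dim=1)  # (k*n, d) support embeddings
    q = F.normalize(encoder(query_x), dim=1)    # (q, d) query embeddings
    # Attention over the support set, then a weighted vote of support labels.
    attention = F.softmax(q @ s.t(), dim=1)            # (q, k*n)
    one_hot = F.one_hot(support_y, n_classes).float()  # (k*n, n_classes)
    return attention @ one_hot                         # (q, n_classes) probabilities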

Lots of random research and YouTube tutorial videos.


Week 11

I decided to go with Oscar's implementation in Python, in which he shows how to implement Matching Networks using PyTorch.

Issues I ran into:

  • Had issues following the instructions of the git repo, but after downloading the miniImageNet dataset and setting it up, all good

  • Had to change the PYTHONPATH for it to have access to the local config.py

export PYTHONPATH=.
  • After uploading all the files to the SCITAS cluster, I simply created a new environment with venv and installed all the libraries (Scikit was missing, not sure why)


  • Basically roasted my computer 3 times trying to unzip the dataset files

  • The program I downloaded runs q queries * k classes across the k classes, when I simply want to be able to ask about a single image (instead of k). Because of this I had to rewrite part of the program, and managed to make it run on the SCITAS servers.

  • Running the miniImageNet dataset is a pain because it is much more complex than the Omniglot one (Omniglot takes ~10 min to run compared to ~2h for miniImageNet).

Adding the option to run on one image:

  • Change core.py so that the NShotTaskSampler takes the first sample instead of k.

  • Change core.py so that the create_nshot_task_label generates a target label of [0]

  • Create tugdual.py, which loads the dataset and prepares an n-shot task with a batch so we can check the prediction (sketched below)
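A rough sketch of what tugdual.py does with those changes; the helper names mirror the repo's core.py but are simplified and hypothetical here:

import torch

def create_single_query_label():
    # One query image, assumed to belong to the first support class -> target [0]
    return torch.tensor([0])

def build_task(support_images, query_image):
    # An n-shot task batch: the k*n support samples followed by a single query
    # sample, instead of the q*k queries the original sampler produced.
    return torch.cat([support_images, query_image.unsqueeze(0)], dim=0)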

What I learned:

  • How to read instructions :P

  • Simple use of requirements.txt

  • Metric Learning: Find encoding space in which classes are grouped together and far apart from one another.

    • The original concept used Siamese networks (a CNN encoder to get feature embeddings), with the embeddings compared using some energy function (cosine or Euclidean distance). If below a certain threshold, the images belong to the same class.

    • Prototypical networks take it a step further by encoding the n shots of each class into a prototype, i.e. the mean of the encodings. A distance function gives the distance to each prototype, and a softmax turns those into probabilities of the query image belonging to each class (see the sketch after this list).

  • "Omniglot [16] is a dataset of 1623 handwritten characters collected from 50 alphabets. There are 20examples associated with each character, where each example is drawn by a different human subject.We follow the procedure of Vinyals et al.[29]by resizing the grayscale images to 28ร—28 andaugmenting the character classes with rotations in multiples of 90 degrees" - Proto. type


Week 14

Wrote tugdual.py, a Python script to test the Prototypical networks with a single image!

Issues I ran into:

What I learned:


Works to cite:

Romberg, Stefan, Lluis Garcia Pueyo, Rainer Lienhart, and Roelof van Zwol. "Scalable Logo Recognition in Real-World Images." ACM International Conference on Multimedia Retrieval (ICMR11), Trento, April 2011.

Vinyals, Oriol, et al. "Matching networks for one shot learning." arXiv preprint arXiv:1606.04080 (2016).
