-
Learning how to read scientific papers, took a look at Frustratingly Simple Few-Shot Object Detection
- The abstract of scientific papers usually states the simple but obvious
- The intro states what the problem is and what their approach to fixing it will be. In this paper they only modify the last layers, that is the box classifier and the box regressor.
-
The previous cited paper uses Detectron 2, which implements Faster-RCNN in PyTorch - This is likely what I will be using for this project
-
COCO is a dataset frequently used in the field of object detection
-
Paperswithcode has an excellent database of possible research papers with implementations.
Although RetinaNet is proposal-free and thus faster, as it is just one CNN, it lacks the modularity of the proposal-based Faster R-CNN (which is Fast R-CNN with an RPN).
Deep object cosegmentation takes two images and finds their common features; could be useful for comparing against existing stickers.
-
Few-shot Object Detection via Feature Reweighting (proposal-free, vs. the proposal-based approaches): uses loadable vectors to change the weights, with two inputs, meta-features and LW reweighting.
-
Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector: no need to fine-tune the model on novel classes. Uses way more categories and few images per category, and uses an attention network (garbage in -> garbage out). Introduces the concept of multi-relation.
-
Polytechnique X, meta-learning algorithms: interesting paper with implementation details.
-
Mask R-CNN: the reference in proposal-based few-shot object detection
- Amazing blog post on the implementation of Mask R-CNN
- Implementation of Mask R-CNN; sadly uses TensorFlow
Installed Detectron2 in myenv environment, with PyTorch 1.7.1 and TorchVision 0.8.2, CPU version. Would like to connect to SCITAS and use that instead.
D2Go is an interesting optimised version of Detectron2, but for mobile phones; gotta check it out.
-
Traffic sign detection Could be useful as similar to stickers
-
How to train detectron2 on a custom dataset
- The blog Where it is explained in great detail
-
Datasets:
- FlickrLogos: have to send an email to get the dataset
- BelgaLogos
-
A mobile-first version of Detectron2 which is lightweight
-
Install packages from here
-
Run after pulling the git
git clone https://github.com/facebookresearch/detectron2.git
cd demo
python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input ../../input.jpg --opts MODEL.DEVICE cpu MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
- Had to add MODEL.DEVICE cpu for it to run on CPU
- Had to point to a downloaded image
wget http://images.cocodataset.org/val2017/000000439715.jpg -q -O input.jpg
- Had to install two libraries for OpenCV
pip install opencv-python
-
How to do Markdown
-
Why and how of Conda environments
-
How to use detectron pretrained models
-
Names of the pretrained models
- R50, R101 are MSRA Residual Networks (ResNets)
- X101 is ResNeXt
- Use the 3x schedule as it is trained longer than 1x
Managed to ssh into SCITAS, spent some time understanding how to access CUDA, got it to run!
Downloaded the FlickrLogos dataset Links!
Wrote a simple script that will execute as a job using Slurm, made a venv with these packages, added --user for them not to be system-installed:
pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html --user
pip install torch==1.7.1 torchvision==0.8.2 --user
pip install opencv-python --user
pip install -U iopath==0.1.4 --user
To interact with SCITAS and run on the GPUs
Sinteract -t 00:10:00 -p gpu -q gpu_free -g gpu:1
-
Have to load python after the venv as venv replaces the python version
-
Have to install opencv-python every time even though I'm in the venv?
pip install opencv-python --user
- Had to downgrade iopath for it to work on SCITAS
pip install -U iopath==0.1.4 --user
-
How to use SCITAS again :P
-
How to use scp and send the images back to my local machine
-
How to launch jobs instead of using Sinteract
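A minimal sketch of the kind of batch script this replaces Sinteract with. The partition and QOS names (gpu, gpu_free) are taken from the Sinteract command above; the time limit, log name, venv path and train_net.py are placeholders to adapt.

```shell
#!/bin/bash
#SBATCH --time=01:00:00        # wall-clock limit (placeholder)
#SBATCH --partition=gpu        # same partition as the Sinteract command
#SBATCH --qos=gpu_free
#SBATCH --gres=gpu:1           # request one GPU
#SBATCH --output=train_%j.log  # %j expands to the job id

# activate the venv, then run the training script (name is hypothetical)
source ~/myenv/bin/activate
python train_net.py
```

Submit with "sbatch job.sh" and monitor with "squeue -u $USER"; no need to keep a terminal open like with Sinteract.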
-
What Python notebooks are and their use, using the detectron2 tutorial
Used FlickrLogos-32 to learn custom dataset training.
| FlickrLogos32 | 1 Class | 32 Classes |
|---|---|---|
| L 0.007 | link | link |
| L 0.005 | link | link |
| L 0.001 | link | link |
Detectron2 needs to register a list[dict], a list of metadata about each image. The dataloader will then augment, batch and pass it to model.forward()
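A minimal sketch of building that list[dict]. The field names (file_name, image_id, height, width, annotations, bbox, bbox_mode, category_id) are the ones detectron2 documents for custom datasets; the function name, the annotation format it consumes, and the fixed 500x500 size are my own placeholders.

```python
import os

def get_flickrlogos_dicts(img_dir, annotations):
    """Build the list[dict] that detectron2 expects for a custom dataset.

    `annotations` is assumed (my format, not detectron2's) to map a file
    name to a list of (class_id, x0, y0, x1, y1) boxes.
    """
    dataset_dicts = []
    for idx, (file_name, boxes) in enumerate(sorted(annotations.items())):
        record = {
            "file_name": os.path.join(img_dir, file_name),
            "image_id": idx,
            "height": 500,  # real code should read these from the image
            "width": 500,
            "annotations": [
                {
                    "bbox": [x0, y0, x1, y1],
                    "bbox_mode": 0,  # BoxMode.XYXY_ABS in detectron2
                    "category_id": class_id,
                }
                for (class_id, x0, y0, x1, y1) in boxes
            ],
        }
        dataset_dicts.append(record)
    return dataset_dicts

# Registration would then look like this (requires detectron2 installed):
# from detectron2.data import DatasetCatalog, MetadataCatalog
# DatasetCatalog.register("flickrlogos_train",
#                         lambda: get_flickrlogos_dicts("train/", anns))
# MetadataCatalog.get("flickrlogos_train").thing_classes = ["adidas", "..."]
```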
-
How to correctly open pictures (had an annoying "\n" that was invisible in print() but broke passing the string as a path to open the image)
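The bug in one tiny example (the path is a placeholder): the newline is invisible when printed normally, so repr() is the way to see it, and strip() is the fix.

```python
# A path read from a text file often carries an invisible trailing newline:
path = "images/000000439715.jpg\n"
print(path)         # looks perfectly fine...
print(repr(path))   # ...but repr() reveals the trailing '\n'
clean = path.strip()  # drop the newline (and any stray whitespace)
```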
-
How to correctly pass the mask (have to transform it into RLE, which is a lightweight binary mask encoding)
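To show what RLE actually is, here is a sketch of uncompressed COCO-style run-length encoding in plain numpy. In practice pycocotools does this (with compression); this hand-rolled version is just illustrative.

```python
import numpy as np

def binary_mask_to_rle(mask):
    """Uncompressed COCO-style RLE: column-major run lengths,
    by convention always starting with the count of zeros."""
    pixels = mask.flatten(order="F")  # COCO uses Fortran (column-major) order
    rle = {"counts": [], "size": list(mask.shape)}
    counts = rle["counts"]
    last, run = 0, 0
    for p in pixels:
        if p == last:
            run += 1
        else:
            counts.append(run)  # close the previous run
            last, run = p, 1
    counts.append(run)
    return rle
```

The resulting dict is far lighter than the dense mask, which matters when every training image carries its segmentation.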
-
That I have to load the config of the model with my custom one
-
Too much memory use for the C4 models
-
After making my custom dataset and running it on SCITAS, it took around ~3h to get some sort of result, obtaining 3-4% on random parts of the picture by running it through the detectron2 Visualizer class. I tried using multiple different backbones, from C4, DC5 and FPN, with 3x or 1x schedules, to see if it would make a difference. Loss was at around 0.2 after ~15 minutes of training for an abysmal result. Results were slightly better for the 32-class version. What solved the problem was changing the learning rate from 0.00025 to 0.02.
-
The inner structure of detectron2, python (again)
-
How to use
rsync
-
P a t i e n c e
-
Built a python bot that cuts out people from pictures that are submitted to it, you can try it out here: https://t.me/faststicker_bot
- Hosted on Heroku, took ~10h to do. Learned a lot about git, Heroku and Python dependencies.
-
Tried out the Flicker1Class47 dataset
-
Label stickers, maybe just the box, to then attempt to classify them link
-
Look around for the precision-recall curve, as well as IoU.
Time to classify the stickers using Few Shot Image Classification. There are three pillars in this domain:
- Prior knowledge about similarity (knows how to differentiate well)
- Prior knowledge about learning (knows how to adapt well)
- Prior knowledge about the data (augment data to learn)
Detectron2 only needs 10 images per class in the training set to start recognizing logos.
I chose to go with similarity implementations, specifically Matching Networks, which are basically KNNs with extra steps.
Lots of random research and watching random tutorials on YouTube
I decided to go with Oscar's implementation in Python, in which he highlights how to implement Matching Networks in PyTorch
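The "KNNs with extra steps" intuition, as a minimal numpy sketch of Matching Networks inference (not Oscar's actual code): attention is a softmax over cosine similarities to the support set, and the prediction is the attention-weighted sum of one-hot support labels. Embeddings are assumed to come from an already-trained CNN encoder.

```python
import numpy as np

def matching_net_predict(support_emb, support_labels, query_emb, n_classes):
    """Matching Networks inference on precomputed embeddings."""
    # cosine similarity between the query and every support embedding
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = s @ q
    attn = np.exp(sims) / np.exp(sims).sum()     # softmax attention weights
    one_hot = np.eye(n_classes)[support_labels]  # (n_support, n_classes)
    return attn @ one_hot                        # class probabilities
```

With uniform attention this degenerates to a soft nearest-neighbour vote, which is exactly why the KNN comparison fits.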
-
Issue following the instructions of the git repo, but after downloading the miniImageNet dataset and setting it up, all good
-
Had to change the PYTHONPATH for it to have access to the local file config.py
export PYTHONPATH=.
-
After uploading all the files to the SCITAS cluster, I simply created a new environment with venv and installed all the libraries (Was missing Scikit, not sure why)
-
Basically roasting my computer 3 times trying to unzip the dataset files
-
The program that I downloaded uses q queries × k classes across the k classes, when I simply want to be able to ask for a single image (instead of k). Because of this I had to rewrite part of the program, and managed to make it run on the SCITAS servers.
-
Running the miniImageNet dataset is a pain because it is much more complex than the Omniglot one (Omniglot takes ~10 min to run compared to the ~2h of miniImageNet)
-
Change core.py so that the NShotTaskSampler takes the first sample instead of k.
-
Change core.py so that the create_nshot_task_label generates a target label of [0]
-
Create tugdual.py which loads the dataset and prepares a n_shot_task with a batch so we can check
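The gist of the rewrite is sampling one query instead of q per class. A hedged numpy sketch of such a sampler (not the actual core.py code; the function name and return format are mine):

```python
import numpy as np

def sample_one_query_task(labels, k, n, rng=None):
    """Sample a k-way n-shot task with a SINGLE query image.

    `labels` holds the class label of every image in the dataset;
    returns (support_indices, query_index) into that dataset.
    """
    if rng is None:
        rng = np.random.default_rng()
    labels = np.asarray(labels)
    classes = rng.choice(np.unique(labels), size=k, replace=False)
    support = []
    for c in classes:
        idx = np.where(labels == c)[0]
        support.extend(rng.choice(idx, size=n, replace=False))
    # one query from the first sampled class, excluding its support images
    pool = np.setdiff1d(np.where(labels == classes[0])[0], support)
    query = int(rng.choice(pool))
    return np.array(support), query
```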
-
How to read instructions :P
-
Simple use of requirements.txt
-
Metric Learning: Find encoding space in which classes are grouped together and far apart from one another.
-
Original concept was with Siamese networks (CNN encoder to get feature embeddings) which then compare using an energy function (cosine or Euclidean distance). If below a certain threshold, the images belong to the same class.
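That verification step is tiny once you have the embeddings; a sketch with Euclidean distance (the threshold value is arbitrary here and would be tuned on a validation set):

```python
import numpy as np

def same_class(emb_a, emb_b, threshold=1.0):
    """Siamese-style verification: two embeddings (from the same CNN
    encoder) are declared the same class iff their Euclidean distance
    falls below a tuned threshold."""
    return float(np.linalg.norm(emb_a - emb_b)) < threshold
```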
-
Prototypical networks take it a step further by encoding the n-shots of each class into a prototype, a.k.a. the mean of the encodings. Distance function is used to calculate distance and then a softmax to obtain the probabilities of the query image belonging to a class.
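A minimal numpy sketch of that inference step (embeddings assumed precomputed; squared Euclidean distance is used here, one common choice):

```python
import numpy as np

def prototypical_predict(support_emb, support_labels, query_emb):
    """Prototypical Networks inference: each class prototype is the mean
    of its support embeddings; class probabilities are a softmax over
    negative squared Euclidean distances to the prototypes."""
    support_labels = np.asarray(support_labels)
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in classes])
    d2 = ((protos - query_emb) ** 2).sum(axis=1)  # distance to each prototype
    logits = -d2
    p = np.exp(logits - logits.max())             # numerically stable softmax
    return p / p.sum()                            # one probability per class
```

Averaging the n shots into one prototype is what makes this more robust than comparing against each support image individually.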
-
"Omniglot [16] is a dataset of 1623 handwritten characters collected from 50 alphabets. There are 20 examples associated with each character, where each example is drawn by a different human subject. We follow the procedure of Vinyals et al. [29] by resizing the grayscale images to 28×28 and augmenting the character classes with rotations in multiples of 90 degrees" - Prototypical Networks paper
Wrote tugdual.py which is a python script to test with 1 image the Prototypical networks!
Romberg, Stefan, Lluis Garcia Pueyo, Rainer Lienhart, and Roelof van Zwol. "Scalable Logo Recognition in Real-World Images." ACM International Conference on Multimedia Retrieval (ICMR '11), Trento, April 2011.
Vinyals, Oriol, et al. "Matching Networks for One Shot Learning." arXiv preprint arXiv:1606.04080 (2016).