sid113/visionapi

The Cloud Vision API enables developers to integrate state-of-the-art computer vision algorithms in a line of code, without any algorithmic or integration struggle. Look at this Colab Notebook to try it and see how to use it.

[Example output images: Object Detection, Depth]

Why this project?

It can sometimes take several years before a great research breakthrough translates into real-world impact. Our goal with VisionAPI is to reduce this gap by selecting the best open-source algorithms as they are released and making them runnable in a few lines of code, with no additional library installation. With this API:

Developers can use the best algorithms in their code, in the language of their choice, without any integration hassle.
AI researchers can maximize the impact of their findings, and in particular see their algorithms go beyond the "demo" stage.
People can benchmark state-of-the-art computer vision algorithms in the context of THEIR data and THEIR use case.

Although this project is still in its infancy, it can be compared to:

The computer vision APIs from Microsoft, Amazon, and Google. However, these APIs do not offer most of the algorithms that are included in this API.
What Hugging Face is doing for Natural Language Processing. HF has been a great source of inspiration for me here, since any developer can build state-of-the-art NLP in a few lines of code thanks to their library.

Repo content

This repo contains:

A notebook showing how to use the API in a Python environment with only basic libraries (requests, pillow, ...).
An app written with Flutter.
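
As a rough illustration of what "a few lines of code" could look like, here is a minimal sketch that packages an image into a JSON POST request. It is not taken from the notebook: the endpoint URL, the payload fields (`task`, `image`), the task names, and the JSON response are all assumptions made for illustration, and the notebook remains the reference for the real request format. The sketch uses only the Python standard library rather than requests, so it runs without installing anything.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: the real URL is given in the repo's notebook, not here.
API_URL = "https://example.invalid/vision/v1/predict"


def build_request(image_bytes: bytes, task: str) -> urllib.request.Request:
    """Wrap raw image bytes in a JSON POST request (payload layout is assumed)."""
    payload = {
        "task": task,  # e.g. "object_detection" or "depth" (assumed names)
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send the request and decode a JSON response (response format is assumed)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, a call would look like `send(build_request(open("photo.jpg", "rb").read(), "depth"))`. Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.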

Algorithms available in the API (and credits)

Below is the list of algorithms currently available in the API:

Object Detection: locates and classifies objects in a given picture.
- Option 1: Based on the paper "DETR: End-to-End Object Detection With Transformers".
- Option 2: Based on the paper "EfficientDet: Scalable and Efficient Object Detection", ranked #2 on the COCO test set as of May 2020.
Panoptic Segmentation: each pixel is assigned a class label, and all object instances are uniquely segmented.
- Based on the paper "DETR: End-to-End Object Detection With Transformers".
Monocular Depth Estimation: estimates how far each pixel is from the camera.
- Based on the paper "From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation", currently state of the art on the KITTI and MIT datasets, and on its PyTorch implementation. A video of the algorithm's results can be found here.

Do you think another algorithm should be included in this API? Please tell me about it at sebderhy@gmail.com.

Limits of the API

Since I currently pay for the server myself, I cannot use a powerful machine, and large images may therefore fail.
When you submit an image, the results may take about 20 seconds to arrive.
The quality of the results can vary depending on the context. For more details about each algorithm's performance, please refer to the papers mentioned above.
Keep in mind that this is a side project and not a finished product yet! Although I do my best to keep everything working and resilient, the results may be disappointing, and the server may fail (apologies if that's the case). In any case, please share your feedback with me (sebderhy@gmail.com) so that I can improve it accordingly.
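
Given these limits, a client can help itself by shrinking images before upload and by retrying when the server fails. The following is a minimal sketch of both ideas, not something the API requires: the 1024-pixel cap, the retry count, and the delay are arbitrary assumptions, and the helper names are hypothetical.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return dimensions scaled down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave untouched
    scale = max_side / longest
    # Keep the aspect ratio; never let a side collapse to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))


def with_retries(call: Callable[[], T], attempts: int = 3, delay_s: float = 5.0) -> T:
    """Run call(), retrying on any exception, and re-raise the last failure."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_s)  # give a small server time to recover
    raise ValueError("attempts must be at least 1")
```

With Pillow, the computed size can be applied as `img.resize(fit_within(*img.size))` before encoding the image. Since a response may take about 20 seconds, any network call wrapped in `with_retries` should also use a timeout comfortably above that.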

Contact

Don't hesitate to contact me at sebderhy@gmail.com with any feedback, request, or question. Please also let me know if there is any algorithm that you think I should add to this API.
