There are 3 repositories under the mscoco topic.
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Object detection with multi-level representations generated from deep high-resolution representation learning (HRNetV2h). This is an official implementation for our TPAMI paper "Deep High-Resolution Representation Learning for Visual Recognition". https://arxiv.org/abs/1908.07919
VarifocalNet: An IoU-aware Dense Object Detector
The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"
SWA Object Detection
Video Platform for Action Recognition and Object Detection in PyTorch
Official ImageNet Model repository
[ECCV 2020] Boundary-preserving Mask R-CNN
Semantic Propositional Image Caption Evaluation
High-resolution Networks for the Fully Convolutional One-Stage Object Detection (FCOS) algorithm
Generate captions for images using a CNN-RNN model trained on the Microsoft Common Objects in COntext (MS COCO) dataset
A TensorFlow implementation of MobileNetV3 CenterNet, which can be easily deployed on Android (MNN) and iOS (CoreML).
A tool for converting computer vision label formats.
Adds the SPICE metric to the coco-caption evaluation server code
Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.
Implementation of models in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models
We aim to generate realistic images from text descriptions using a GAN architecture. The network we designed generates images for two datasets: MSCOCO and CUBS.
Clone of COCO API - Dataset @ http://cocodataset.org/ - with changes to support Windows builds and Python 3
Official implementation of "Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation"
A Keras implementation of DeepMask based on the NIPS 2015 paper "Learning to Segment Object Candidates"
A demo for mapping class labels from ImageNet to COCO.
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
The Jakarnotator is an annotation tool for creating your own database for instance segmentation problems.
MS COCO captions in Arabic
LabelMe to MsCOCO, PascalVOC, Yolo
Encoder-Decoder CNN-LSTM Model with an attention mechanism for image captioning. Trained using the Microsoft COCO Dataset.
PyTorch implementation of SSD: Single Shot MultiBox Detector.
COCOA: Semantic Amodal Segmentation for huggingface datasets