There are 3 repositories under the mscoco topic.
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Object detection with multi-level representations generated from deep high-resolution representation learning (HRNetV2h). This is an official implementation for our TPAMI paper "Deep High-Resolution Representation Learning for Visual Recognition". https://arxiv.org/abs/1908.07919
VarifocalNet: An IoU-aware Dense Object Detector
SWA Object Detection
The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"
Video Platform for Action Recognition and Object Detection in PyTorch
[ECCV 2020] Boundary-preserving Mask R-CNN
Official ImageNet Model repository
Semantic Propositional Image Caption Evaluation
High-resolution Networks for the Fully Convolutional One-Stage Object Detection (FCOS) algorithm
Generate captions for images using a CNN-RNN model that is trained on the Microsoft Common Objects in COntext (MS COCO) dataset
A TensorFlow implementation of MobileNetV3 CenterNet, which can be easily deployed on Android (MNN) and iOS (CoreML).
Adds the SPICE metric to the coco-caption evaluation server code
A tool for converting computer vision label formats.
Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.
Implementation of models in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models
We aim to generate realistic images from text descriptions using a GAN architecture. The network that we have designed is used for image generation on two datasets: MSCOCO and CUBS.
Official implementation of "Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation"
Clone of COCO API - Dataset @ http://cocodataset.org/ - with changes to support Windows builds and Python 3
A Keras implementation of DeepMask based on the NIPS 2015 paper "Learning to Segment Object Candidates"
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
A demo for mapping class labels from ImageNet to COCO.
The Jakarnotator is an annotation tool for creating your own database for instance segmentation problems.
LabelMe to MsCOCO, PascalVOC, Yolo
MS COCO captions in Arabic
Encoder-Decoder CNN-LSTM Model with an attention mechanism for image captioning. Trained using the Microsoft COCO Dataset.
PyTorch implementation of SSD: Single Shot MultiBox Detector.
Official code for “Cascaded Context Dependency: An Extremely Lightweight Module for Deep Convolutional Neural Networks”