Keji's starred repositories
Landmark-RxR
A human-annotated, fine-grained dataset for Vision-and-Language Navigation
Inpaint-Anything
Inpaint anything using Segment Anything and inpainting models.
RxR
Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi, and Telugu, along with 126k navigation-following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the annotators' visual perceptions.
pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Includes training, evaluation, inference, and export scripts, plus pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNetV3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more.
Discrete-Continuous-VLN
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Curriculum-Learning-For-VLN
Code for NeurIPS 2021 paper "Curriculum Learning for Vision-and-Language Navigation"
Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
CVPR2024-Paper-Code-Interpretation
A collection of papers, code, interpretations, and livestreams for CVPR 2017 through CVPR 2024, curated by the Jishi (极市) team.
Transformer-in-Vision
Recent Transformer-based CV and related works.
awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
awesome-embodied-vision
Reading list for research topics in embodied vision
Recurrent-VLN-BERT
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
vit-pytorch
Implementation of the Vision Transformer, a simple way to achieve SOTA in image classification with only a single transformer encoder, in PyTorch.