Keji's starred repositories
Landmark-RxR
A human-annotated, fine-grained dataset for Vision-and-Language Navigation
Inpaint-Anything
Inpaint anything using Segment Anything and inpainting models.
RxR
Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi, and Telugu, along with 126k navigation-following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the annotators' visual perceptions.
pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Includes training, evaluation, inference, and export scripts, plus pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNetV3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more.
Discrete-Continuous-VLN
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Curriculum-Learning-For-VLN
Code for NeurIPS 2021 paper "Curriculum Learning for Vision-and-Language Navigation"
Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
CVPR2024-Paper-Code-Interpretation
A collection of papers, code, interpretations, and livestreams for CVPR 2017 through CVPR 2024, curated by the Jishi (极市) team.
Transformer-in-Vision
Recent Transformer-based CV and related works.
awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
awesome-embodied-vision
Reading list for research topics in embodied vision
Recurrent-VLN-BERT
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
vit-pytorch
Implementation of the Vision Transformer, a simple way to achieve SOTA in image classification with only a single transformer encoder, in PyTorch.