Keji (hekj)

Company: Institute of Automation, Chinese Academy of Sciences

Location: Beijing, China

Keji's starred repositories

DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

Language: Python · License: Apache-2.0 · Stars: 707 · Issues: 0
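
For context, DeepCache is typically attached to an existing diffusers pipeline through a small helper object. Below is a minimal sketch based on the helper API shown in the project's README; the checkpoint name and cache parameters are illustrative assumptions, not prescriptions:

```python
import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper  # per the DeepCache README

# Illustrative checkpoint; any Stable Diffusion pipeline should work.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The helper caches high-level U-Net features and reuses them across
# denoising steps, recomputing them once every `cache_interval` steps.
helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)
helper.enable()

image = pipe("a photo of an astronaut riding a horse").images[0]
helper.disable()  # restore the uncached pipeline
```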

Landmark-RxR

A human-annotated, fine-grained dataset for Vision-and-Language Navigation

Stars: 11 · Issues: 0

FDA

Official implementation of Frequency-enhanced Data Augmentation for Vision-and-Language Navigation (NeurIPS 2023)

Language: Python · Stars: 12 · Issues: 0

Inpaint-Anything

Inpaint anything using Segment Anything and inpainting models.

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 5973 · Issues: 0

ETPNav

[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"

Language: Python · License: MIT · Stars: 174 · Issues: 0

VLN-DUET

Official implementation of Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).

Language: Python · Stars: 96 · Issues: 0

RxR

Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi, and Telugu, and 126k navigation-following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the annotators' visual perceptions.

Language: Python · License: CC-BY-4.0 · Stars: 108 · Issues: 0

pytorch-image-models

The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNetV3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more.

Language: Python · License: Apache-2.0 · Stars: 30882 · Issues: 0
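
As a usage reference, timm exposes a single model factory for all of the above backbones. A minimal sketch ("resnet50" and the ConvNeXt pattern are just example names):

```python
import timm
import torch

# Browse available architectures by wildcard pattern.
print(timm.list_models("convnext*")[:5])

# Build a pretrained ImageNet classifier and run a dummy forward pass.
model = timm.create_model("resnet50", pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)  # shape (1, 1000)

# num_classes=0 drops the head, yielding a pooled feature extractor.
backbone = timm.create_model("resnet50", pretrained=True, num_classes=0)
```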

Discrete-Continuous-VLN

Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

Language: Python · License: MIT · Stars: 79 · Issues: 0

Curriculum-Learning-For-VLN

Code for NeurIPS 2021 paper "Curriculum Learning for Vision-and-Language Navigation"

Language: Python · License: MIT · Stars: 15 · Issues: 0

Awesome-Multimodal-Research

A curated list of Multimodal Related Research.

Language: Python · License: MIT · Stars: 1284 · Issues: 0

CVPR2024-Paper-Code-Interpretation

A collection of papers, code, interpretations, and livestreams for CVPR 2024/2023/2022/2021/2020/2019/2018/2017, curated by the Jishi (极市) team.

Stars: 12371 · Issues: 0

VLN-HAMT

Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21).

Language: Python · License: MIT · Stars: 95 · Issues: 0

Transformer-in-Vision

Recent Transformer-based CV and related works.

Stars: 1307 · Issues: 0

awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

License: MIT · Stars: 5713 · Issues: 0

robo-vln

PyTorch code for the ICRA'21 paper "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Language: Python · License: MIT · Stars: 65 · Issues: 0

ORIST

Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

Language: C++ · Stars: 16 · Issues: 0

airbert

Codebase for the Airbert paper

Language: Python · License: MIT · Stars: 41 · Issues: 0

awesome-embodied-vision

Reading list for research topics in embodied vision

License: MIT · Stars: 461 · Issues: 0

Recurrent-VLN-BERT

Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation

Language: Python · License: NOASSERTION · Stars: 148 · Issues: 0

awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

License: MIT · Stars: 989 · Issues: 0

awesome-embodied-vision

Reading list for research topics in embodied vision

License: MIT · Stars: 1 · Issues: 0

NvEM

[ACM MM 2021 Oral] Official repo of "Neighbor-view Enhanced Model for Vision and Language Navigation"

Language: C++ · License: MIT · Stars: 77 · Issues: 0

MUST-GAN

PyTorch implementation of the CVPR 2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation"

Language: Python · Stars: 75 · Issues: 0

nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

Language: Python · License: NOASSERTION · Stars: 1322 · Issues: 0
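
A minimal sketch of the Python entry point, assuming the API described in the nlg-eval README; the sentences are placeholder data, and the constructor flags shown simply skip the heavier embedding-based metrics:

```python
from nlgeval import NLGEval

# Loading is slow: models for all enabled metrics are read at init.
nlgeval = NLGEval(no_skipthoughts=True, no_glove=True)

# ref_list: one inner list per reference set, aligned with hyp_list.
references = [["the cat sits on the mat"]]
hypotheses = ["a cat is sitting on the mat"]

# Returns a dict of corpus-level scores, e.g. Bleu_1..4, METEOR,
# ROUGE_L, CIDEr (for the metrics left enabled above).
scores = nlgeval.compute_metrics(ref_list=references, hyp_list=hypotheses)
print(scores)
```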

awesome-vision-language-pretraining-papers

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

Stars: 1133 · Issues: 0

bert

TensorFlow code and pre-trained models for BERT

Language: Python · License: Apache-2.0 · Stars: 37540 · Issues: 0
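
As a small illustration, the repo ships a WordPiece tokenizer that is reused across its training scripts. A sketch, where the vocab path is a placeholder pointing at a downloaded BERT checkpoint:

```python
# `tokenization` ships with the google-research/bert repo.
import tokenization

# vocab.txt comes with every released BERT checkpoint (placeholder path).
tokenizer = tokenization.FullTokenizer(
    vocab_file="uncased_L-12_H-768_A-12/vocab.txt", do_lower_case=True
)

tokens = tokenizer.tokenize("Vision-and-language navigation")
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)  # WordPiece sub-tokens
print(ids)     # vocabulary indices fed to the model
```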

tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Language: Python · License: Apache-2.0 · Stars: 15184 · Issues: 0

vit-pytorch

Implementation of the Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Language: Python · License: MIT · Stars: 19007 · Issues: 0
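
Its canonical usage, mirroring the hyperparameters from the README example:

```python
import torch
from vit_pytorch import ViT

# Images are cut into 32x32 patches, linearly embedded, and run
# through a plain transformer encoder with a classification head.
v = ViT(
    image_size=256,
    patch_size=32,
    num_classes=1000,
    dim=1024,
    depth=6,
    heads=16,
    mlp_dim=2048,
    dropout=0.1,
    emb_dropout=0.1,
)

img = torch.randn(1, 3, 256, 256)
preds = v(img)  # shape (1, 1000): class logits
```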