Lily's repositories
actionformer_release
Code release for ActionFormer (ECCV 2022)
CoVGT
Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)
CRIPP-VQA
CRIPP-VQA Benchmark -- EMNLP, 2022
dest
The official implementation of Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling (BMVC 2022 Spotlight).
DJ-RN
Part of the HAKE project (HAKE-3D). Code for our CVPR 2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".
evals
Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
G-VUE
General-purpose Vision Understanding Evaluation
gluon-cv
Gluon CV Toolkit
Graphormer
Graphormer is a deep learning package that allows researchers and developers to train custom models for molecule modeling tasks. It aims to accelerate the research and application in AI for molecule science, such as material design, drug discovery, etc.
ML-MWN
Official code for the Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation (ICMR 2023).
mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
OpenPSG
Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
paper-reading
Paragraph-by-paragraph close reading of classic and new deep learning papers
pytracking
Visual tracking library based on PyTorch.
question-decomposition-to-sql
Weakly Supervised Text-to-SQL Parsing through Question Decomposition
RelateAnything
Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.
RelTR
RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2
SIMPAC-2023-146--
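A text analysis package for the Chinese environmental domain, built on a purely neural-network architecture. It supports models such as EnvBert, LSTM, RNN, and word2vec, as well as custom models. Downstream tasks include classification, regression, multiple choice, sentiment analysis, and named entity recognition; special topics include climate change text analysis and environmental knowledge graphs. The interface is optimized for domain research, so models can be used with one click.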
STTran
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021
study_resources
Study resources on models and engineering
svitt
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
visual-chatgpt
Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch