wh0330's starred repositories
Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
SceneGraphParser
A Python toolkit for parsing captions (in natural language) into scene graphs (as symbolic representations).
Graph-Optimal-Transport
Code for ICML 2020 "Graph Optimal Transport for Cross-Domain Alignment"
SUTD-TrafficQA
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
cheatsheets
Official Matplotlib cheat sheets
FINCH-Clustering
Source Code for FINCH Clustering Algorithm
Mixture-of-Embedding-Experts
Mixture-of-Embeddings-Experts
NeXtVLAD.pytorch
PyTorch implementation of NetVLAD for classification on UCF101
CrossViT-pytorch
Implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
PoseFormer
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".
visdial_conv
This repository contains code used in our ACL'20 paper "History for Visual Dialog: Do we really need it?"