Linhui Xiao's repositories
MAttNet
MAttNet: Modular Attention Network for Referring Expression Comprehension
ViLD
Reference models and tools for Cloud TPUs.
PVT
Official implementation of PVT series
ALBEF
Code for ALBEF: a new vision-language pre-training method
detr
End-to-End Object Detection with Transformers
DeeCap
Dynamic Early Exit for Image Captioning
densecap
Dense image captioning in Torch
deep-learning-for-image-processing
Deep learning for image processing, including classification, object detection, etc.
RelTR
RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2
DenseRelationalCaptioning
Code of Dense Relational Captioning
Scene-Graph-Benchmark.pytorch
A new codebase for popular Scene Graph Generation methods (2020). Visualization and Scene Graph Extraction on custom images/datasets are provided. It is also a PyTorch implementation of the paper “Unbiased Scene Graph Generation from Biased Training” (CVPR 2020).
meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
CLIP_prefix_caption
Simple image captioning model
scene_graph_benchmark
image scene graph generation benchmark
CoOp
Prompt Learning for Vision-Language Models
SLIP
Code release for SLIP: Self-supervision meets Language-Image Pre-training
DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
CLIP
Contrastive Language-Image Pretraining
latex
tools.md: small tools commonly used for academic writing
pytorch-grad-cam
Many Class Activation Map methods implemented in PyTorch for CNNs and Vision Transformers. Examples for classification, object detection, segmentation, embedding networks and more. Includes Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM.
SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
dino
Image segmentation code for Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers.
M-DGT
The source code of the CVPR22 paper titled "Multi-Modal Dynamic Graph Transformer for Visual Grounding".
clash
A rule-based tunnel in Go.