linzhiqiu's repositories
cross_modal_adaptation
Cross-modal few-shot adaptation with CLIP
t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore
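For reference, a minimal usage sketch of VQAScore via this package. The `clip-flant5-xxl` model name and the call signature follow the repo's README-style API as I recall it, and should be treated as assumptions rather than a definitive interface:

```python
import t2v_metrics

# Load a VQAScore model; 'clip-flant5-xxl' is assumed to be an available checkpoint
clip_flant5_score = t2v_metrics.VQAScore(model='clip-flant5-xxl')

# Score image-text alignment (higher = the image better matches the text)
scores = clip_flant5_score(images=['generated_image.png'],
                           texts=['a photo of a dog chasing a ball'])
print(scores)
```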
visual_gpt_score
VisualGPTScore for visio-linguistic reasoning
CLIP-FlanT5
Training code for CLIP-FlanT5
vl_finetuning
Few-shot finetuning of CLIP
debiased-pseudo-labeling
[CVPR 2022] Debiased Learning from Naturally Imbalanced Pseudo-Labels
HRNet-Semantic-Segmentation
The OCR approach has been rephrased as "Segmentation Transformer" (https://arxiv.org/abs/1909.11065). This is an official implementation of semantic segmentation based on HRNet (https://arxiv.org/abs/1908.07919).
HTML4Vision
A simple HTML visualization tool for computer vision research :hammer_and_wrench:
linzhiqiu.github.io
Zhiqiu Lin's site
lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
PerceptualSimilarity
LPIPS metric. pip install lpips
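A minimal sketch of the documented LPIPS usage after `pip install lpips`; the `lpips.LPIPS` class, the `net='alex'` backbone, and the `[-1, 1]` input convention follow the repo's README:

```python
import torch
import lpips

# LPIPS perceptual distance with the AlexNet backbone (the repo's default)
loss_fn = lpips.LPIPS(net='alex')

# Two dummy RGB image batches of shape (N, 3, H, W), normalized to [-1, 1]
img0 = torch.zeros(1, 3, 64, 64)
img1 = torch.rand(1, 3, 64, 64) * 2 - 1

dist = loss_fn(img0, img1)  # lower = more perceptually similar
print(dist.item())
```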
pytorchvideo
A deep learning library for video understanding research.
streamlit-feedback-video
Collect user feedback from within your Streamlit app
streamlit-video-captioning
A Streamlit LLM app for video captioning
video_annotation
Video Annotation Format
vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" (Oral @ ICLR 2023)
why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022