linzhiqiu's starred repositories
Awesome-LLM-Post-training
Awesome Reasoning LLM Tutorial/Survey/Guide
pasa
PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries.
VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
LanguageBind
[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
PerspectiveFields
[CVPR 2023 Highlight] Perspective Fields for Single Image Camera Calibration
superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
NaturalBench
🚀 [NeurIPS 2024] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark that challenges vision-language models with simple questions about natural imagery.
MAmmoTH-VL
[ACL 2025] MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
MotionBench
Official code for MotionBench (CVPR 2025)
VidComposition
[CVPR 2025] VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
T2I-Probology
Experimental results + resources for probing compositional structure in generative text-to-image (T2I) models
t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore