linzhiqiu's starred repositories

Cosmos

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

License: Apache-2.0 | 8055 850

mochi

The best OSS video generation models, created by Genmo

Language: Python | License: Apache-2.0 | 3424 44 117

lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Language: Python | License: NOASSERTION | 3101 6 377

Awesome-LLM-Post-training

Awesome Reasoning LLM Tutorial/Survey/Guide

Language: Python | 2069 18 2

ml-aim

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Language: Python | License: NOASSERTION | 1366 27 32

PaSa

PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries.

Language: Python | License: Apache-2.0 | 1338 9 13

Sa2VA

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Language: Python | License: Apache-2.0 | 1259 23 52

mega-sam

Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"

Language: Python | License: Apache-2.0 | 1048 51 35

VideoLLaMA3

Frontier Multimodal Foundation Models for Image and Video Understanding

Language: Jupyter Notebook | License: Apache-2.0 | 985 12 85

LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python | License: MIT | 828 14 67

tarsier

Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.

Language: Python | License: Apache-2.0 | 470 8 32

LongVU

[ICML 2025] Official PyTorch implementation of LongVU

Language: Python | License: Apache-2.0 | 398 4 44

PerspectiveFields

[CVPR 2023 Highlight] Perspective Fields for Single Image Camera Calibration

Language: Jupyter Notebook | License: NOASSERTION | 272 7 19

superclass

[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training

Language: Python | License: Apache-2.0 | 216 7 13

NaturalBench

🚀 [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'24) that challenges vision-language models with simple questions about natural imagery.

Language: Python | 85 100