canqin001

canqin001's starred repositories

UniLMMV

Language: Python | 100

switch-cuda

A simple bash script for switching between installed versions of CUDA.

Language: Shell | License: MIT | 56700

Video-Dataset-Loading-Pytorch

Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops.

Language: Python | License: BSD-2-Clause | 44000

llama.cpp

LLM inference in C/C++

Language: C++ | License: MIT | 6253200

Online-RLHF

A recipe for online RLHF.

Language: Python | 34200

scaling_on_scales

When do we not need larger vision models?

Language: Python | License: MIT | 27300

MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Language: Python | License: BSD-3-Clause | 48600

DoRA

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Language: Python | License: NOASSERTION | 48100

SoM-LLaVA

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Language: Python | 10300

emoji-cheat-sheet

A markdown version emoji cheat sheet

Language: TypeScript | License: MIT | 1214600

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | 2451000

T2I-CompBench

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Language: Python | License: MIT | 17500

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | 263800

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | 258200

SQ-LLaVA

Visual self-questioning for large vision-language assistant.

Language: Python | License: MIT | 1700

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | 2090200

Maskgit-pytorch

Language: Jupyter Notebook | License: MIT | 13400

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language: Python | License: MIT | 222500

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python | License: Apache-2.0 | 153600

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | 577200

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 1836600

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | License: CC-BY-4.0 | 108800

DiffSynth-Studio

Enjoy the magic of Diffusion models!

Language: Python | License: Apache-2.0 | 600200

VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Language: Python | License: Apache-2.0 | 41500

AIGCBench

Official repo for AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

Language: Python | License: Apache-2.0 | 2600

decord

An efficient video loader for deep learning with smart shuffling that's super easy to digest

Language: C++ | License: Apache-2.0 | 174500

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | 3402200

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

MMVP

Language: Python | 25600

DragNUWA