Wuziyi616

Ziyi Wu's starred repositories

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | Apache-2.0 | 27458 224 4602

flux

Official inference repo for FLUX.1 models

Language: Python | Apache-2.0 | 14159 125 130

clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Language: Python | NOASSERTION | 12389 220 607

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | Apache-2.0 | 11031 64 256

progress

Linux tool to show progress for cp, mv, dd, ... (formerly known as cv)

Language: C | GPL-3.0 | 8532 140 111

AI-Scientist

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Language: Jupyter Notebook | Apache-2.0 | 7621 86 97

torchtitan

A native PyTorch Library for large model training

Language: Python | BSD-3-Clause | 2217 37 135

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language: Python | MIT | 2035 31 84

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python | Apache-2.0 | 1645 23 106

glomap

GLOMAP - Global Structure-from-Motion Revisited

Language: C++ | BSD-3-Clause | 1323 22 73

yarn

YaRN: Efficient Context Window Extension of Large Language Models

Language: Python | MIT | 1315 14 56

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | MIT | 1212 21 54

shape-of-motion

Language: Python | MIT | 758 17 46

MINT-1T

MINT-1T: A one trillion token multimodal interleaved dataset.

733 24 9

diffusion-forcing

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Language: Python | NOASSERTION | 512 6 20

attention_with_linear_biases

Code for the ALiBi method for transformer language models (ICLR 2022)

Language: Python | MIT | 501 12 19

ReconX

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

MIT | 457 55 4

TransNetV2

TransNet V2: Shot Boundary Detection Neural Network

Language: Python | MIT | 445 9 47

vfusion3d

[ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Language: Python | NOASSERTION | 390 13 9

VisionLLaMA

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Language: Python | 357 23 6

MiraData

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"

Language: Python | GPL-3.0 | 351 14 15

rerope

Rectified Rotary Position Embeddings

Language: Python | 332 11 20

unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Language: Python | MIT | 284 13 47

DOVER

[ICCV 2023, Official Code] for paper "Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives". Official Weights and Demos provided.

Language: Jupyter Notebook | NOASSERTION | 264 4 33