Tianhao-Qi's starred repositories

Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Language: Python | License: MIT | Stargazers: 315 | Issues: 0

movie-gen

An open-source community implementation of the model from the paper "Movie Gen: A Cast of Media Foundation Models". Join our community to help implement this model!

Language: Python | License: MIT | Stargazers: 39 | Issues: 0

CogVideoX-Fun

📹 A more flexible CogVideoX that can generate videos at any resolution and create videos from images.

Language: Python | License: Apache-2.0 | Stargazers: 344 | Issues: 0

Vchitect-2.0

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Language: Python | License: Apache-2.0 | Stargazers: 608 | Issues: 0

Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 2533 | Issues: 0

MoMA

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Language: Jupyter Notebook | Stargazers: 186 | Issues: 0

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | Stargazers: 2729 | Issues: 0

CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Language: Python | License: Apache-2.0 | Stargazers: 7950 | Issues: 0

Kolors

Kolors Team

Language: Python | License: Apache-2.0 | Stargazers: 3687 | Issues: 0

I2V-Adapter-repo

I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models

Stargazers: 199 | Issues: 0

VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Language: Python | License: Apache-2.0 | Stargazers: 513 | Issues: 0

SEED-Story

SEED-Story: Multimodal Long Story Generation with Large Language Model

Language: Python | License: NOASSERTION | Stargazers: 722 | Issues: 0

VADER

Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models, such as VideoCrafter, OpenSora, ModelScope, and StableVideoDiffusion, by fine-tuning them with reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, and Aesthetics.

Language: Python | Stargazers: 200 | Issues: 0

InfEdit

[CVPR 2024] Official implementation of the paper "Inversion-Free Image Editing with Natural Language"

Language: Python | License: NOASSERTION | Stargazers: 270 | Issues: 0

Portrait-Mode-Video

Video dataset dedicated to portrait-mode video recognition.

Language: Python | Stargazers: 35 | Issues: 0

ComfyUI_LayerStyle

A set of nodes for ComfyUI that can composite layers and masks to achieve Photoshop-like functionality.

Language: Python | License: MIT | Stargazers: 1264 | Issues: 0

gpt_academic

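Provides a practical interactive interface for LLMs such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.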

Language: Python | License: GPL-3.0 | Stargazers: 64675 | Issues: 0

CV-VAE

[NeurIPS 24] CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Language: Jupyter Notebook | Stargazers: 215 | Issues: 0

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook | License: MIT | Stargazers: 1667 | Issues: 0

Awesome-Animation-Research

Papers, datasets, and resources related to 2D cartoon video research. Contributions welcome.

License: MIT | Stargazers: 74 | Issues: 0

V-Express

V-Express aims to generate a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.

Language: Python | Stargazers: 2209 | Issues: 0

Omost

Your image is almost there!

Language: Python | License: Apache-2.0 | Stargazers: 7249 | Issues: 0

DiT-Visualization

Visualization of DiT self-attention features

Language: Python | Stargazers: 147 | Issues: 0

sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python | License: Apache-2.0 | Stargazers: 5524 | Issues: 0

sdxl_prompt_styler

Custom prompt styler node for SDXL in ComfyUI

Language: Python | License: MIT | Stargazers: 738 | Issues: 0

VGDiffZero

[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

Language: Python | Stargazers: 9 | Issues: 0