Bin Zhu's repositories
AnimateDiff
Official implementation of AnimateDiff.
chatgpt-on-wechat
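WeChat chatbot based on ChatGPT, using the OpenAI API and the itchat library. Built on the GPT-3.5/4.0 API; supports deployment on personal WeChat accounts, official accounts, and enterprise WeChat; handles text, voice, and images, and can access the operating system and the internet.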
consistency_models
Official repo for consistency models.
ControlNet
Let us control diffusion models!
denoising-diffusion-pytorch
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
Dlink_Parse
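Parses download addresses for Youku, Tencent, Bilibili, Douyin, Mango TV, iQIYI, PP Video, Migu Video, AcFun, Kuaishou, Sohu Video, QQ Music, NetEase Cloud Music, Kuwo Music, Tencent Classroom, Xigua Video, and others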
fastmoe
A fast MoE impl for PyTorch
Grounded-Segment-Anything
Marrying Grounding DINO with Segment Anything & Stable Diffusion & Tag2Text & BLIP & Whisper & ChatBot - Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs
hfai-models
HFAI deep learning models
Latte
Latte: Latent Diffusion Transformer for Video Generation.
LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
lux
👾 Fast and simple video download library and CLI tool written in Go
MagicTime
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.
opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (LLaMA, LLaMA 2, ChatGLM2, ChatGPT, Claude, etc.) over 50+ datasets.
SEED-Bench
A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
stable-diffusion
A latent text-to-image diffusion model
TaiSu
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)
taming-transformers
Taming Transformers for High-Resolution Image Synthesis
VALOR
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
Video-LLaMA
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding