bryanyzhu

Amazon AI

SF Bay Area

https://bryanyzhu.github.io/

Yi Zhu's starred repositories

faceswap

Deepfakes Software For All

Language: Python | GPL-3.0 | 50218 1530 856

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | Apache-2.0 | 21397 179 454

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | Apache-2.0 | 18908 159 1454

marker

Convert PDF to markdown quickly with high accuracy

Language: Python | GPL-3.0 | 15908 68 203

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Language: Python | MIT | 11635 71 265

surya

OCR, layout analysis, reading order, line detection in 90+ languages

Language: Python | GPL-3.0 | 9655 78 123

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Language: Python | MIT | 8961 82 36

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

SillyTavern

LLM Frontend for Power Users.

Language: JavaScript | AGPL-3.0 | 7287 61 1439

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python | Apache-2.0 | 6676 49 206

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | NOASSERTION | 4614 49 422

VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Language: Python | NOASSERTION | 4429 70 80

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | NOASSERTION | 3955 48 841

llm-foundry

LLM training code for Databricks foundation models

Language: Python | Apache-2.0 | 3934 47 371

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | AGPL-3.0 | 2647 460

dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

Language: Python | NOASSERTION | 2493 40 23

DynamiCrafter

[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Language: Python | Apache-2.0 | 2305 31 120

human

Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition

Language: HTML | MIT | 2237 44 268

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | Apache-2.0 | 1886 24 88

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | NOASSERTION | 1366 25 64

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]

Language: Python | Apache-2.0 | 1244 50 31

twikit

Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

Language: Python | MIT | 1119 17 138

yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

Language: Python | GPL-3.0 | 848 17 10

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python | Apache-2.0 | 799 16 75

LVDM

LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation

Language: Python | MIT | 436 28 22

DocLayNet

DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

NOASSERTION | 229 4 16

WildBench

Benchmarking LLMs with Challenging Tasks from Real Users

Language: Python | Apache-2.0 | 170 4 6

QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models

Language: Python | 130 6 5

Inflection-Benchmarks

Public Inflection Benchmarks

MIT | 67 6 4

mt-bench-101

[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

Apache-2.0 | 35 6 5