Yi Zhu (bryanyzhu)

Company: Amazon AI

Location: SF Bay Area

Home Page: https://bryanyzhu.github.io/

Yi Zhu's starred repositories

faceswap

Deepfakes Software For All

Language: Python | License: GPL-3.0 | Stargazers: 50218 | Issues: 1530 | Issues: 856

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 21397 | Issues: 179 | Issues: 454

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 18908 | Issues: 159 | Issues: 1454

marker

Convert PDF to markdown quickly with high accuracy

Language: Python | License: GPL-3.0 | Stargazers: 15908 | Issues: 68 | Issues: 203

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Language: Python | License: MIT | Stargazers: 11635 | Issues: 71 | Issues: 265

surya

OCR, layout analysis, reading order, line detection in 90+ languages

Language: Python | License: GPL-3.0 | Stargazers: 9655 | Issues: 78 | Issues: 123

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Language: Python | License: MIT | Stargazers: 8961 | Issues: 82 | Issues: 36

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

SillyTavern

LLM Frontend for Power Users.

Language: JavaScript | License: AGPL-3.0 | Stargazers: 7287 | Issues: 61 | Issues: 1439

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python | License: Apache-2.0 | Stargazers: 6676 | Issues: 49 | Issues: 206

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4614 | Issues: 49 | Issues: 422

VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Language: Python | License: NOASSERTION | Stargazers: 4429 | Issues: 70 | Issues: 80

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | License: NOASSERTION | Stargazers: 3955 | Issues: 48 | Issues: 841

llm-foundry

LLM training code for Databricks foundation models

Language: Python | License: Apache-2.0 | Stargazers: 3934 | Issues: 47 | Issues: 371

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | Stargazers: 2647 | Issues: 46 | Issues: 0

dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

Language: Python | License: NOASSERTION | Stargazers: 2493 | Issues: 40 | Issues: 23

DynamiCrafter

[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Language: Python | License: Apache-2.0 | Stargazers: 2305 | Issues: 31 | Issues: 120

human

Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition

Language: HTML | License: MIT | Stargazers: 2237 | Issues: 44 | Issues: 268

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | License: Apache-2.0 | Stargazers: 1886 | Issues: 24 | Issues: 88

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1366 | Issues: 25 | Issues: 64

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]

Language: Python | License: Apache-2.0 | Stargazers: 1244 | Issues: 50 | Issues: 31

twikit

Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

Language: Python | License: MIT | Stargazers: 1119 | Issues: 17 | Issues: 138

yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

Language: Python | License: GPL-3.0 | Stargazers: 848 | Issues: 17 | Issues: 10

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python | License: Apache-2.0 | Stargazers: 799 | Issues: 16 | Issues: 75

LVDM

LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation

Language: Python | License: MIT | Stargazers: 436 | Issues: 28 | Issues: 22

DocLayNet

DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

WildBench

Benchmarking LLMs with Challenging Tasks from Real Users

Language: Python | License: Apache-2.0 | Stargazers: 170 | Issues: 4 | Issues: 6

QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models

Inflection-Benchmarks

Public Inflection Benchmarks

mt-bench-101

[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues