shiyuzh2007's starred repositories

stable-diffusion-webui

Stable Diffusion web UI

Language: Python | License: AGPL-3.0 | Stargazers: 141382 | Issues: 1080 | Issues: 7656

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python | License: NOASSERTION | Stargazers: 83068 | Issues: 1739 | Issues: 45734

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | Stargazers: 69886 | Issues: 574 | Issues: 0

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 67973 | Issues: 557 | Issues: 712

llama

Inference code for Llama models

Language: Python | License: NOASSERTION | Stargazers: 56060 | Issues: 526 | Issues: 969

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 35737 | Issues: 329 | Issues: 439

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 21914 | Issues: 186 | Issues: 490

reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

Language: Jupyter Notebook | License: MIT | Stargazers: 20513 | Issues: 860 | Issues: 155

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 19791 | Issues: 156 | Issues: 1505

deepmind-research

This repository contains implementations and illustrative code to accompany DeepMind publications

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 13148 | Issues: 325 | Issues: 321

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Language: HTML | License: Apache-2.0 | Stargazers: 8836 | Issues: 57 | Issues: 1117

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 8742 | Issues: 134 | Issues: 1093

LWM

Large World Model With 1M Context

Language: Python | License: Apache-2.0 | Stargazers: 7115 | Issues: 66 | Issues: 71

Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Language: Python | License: Apache-2.0 | Stargazers: 5669 | Issues: 67 | Issues: 128

mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Language: Python | License: NOASSERTION | Stargazers: 5489 | Issues: 114 | Issues: 656

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4937 | Issues: 49 | Issues: 442

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4529 | Issues: 59 | Issues: 156

ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)

Language: Python | License: Apache-2.0 | Stargazers: 3891 | Issues: 22 | Issues: 1179

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python | License: BSD-3-Clause | Stargazers: 3254 | Issues: 57 | Issues: 101

MAE-pytorch

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Language: Jupyter Notebook | License: MIT | Stargazers: 1670 | Issues: 40 | Issues: 74

DI-star

An artificial intelligence platform for StarCraft II with large-scale distributed training and grand-master agents.

Language: Python | License: Apache-2.0 | Stargazers: 1221 | Issues: 18 | Issues: 26

SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

Language: Python | License: Apache-2.0 | Stargazers: 1018 | Issues: 26 | Issues: 49

Whisper-Finetune

Fine-tune the Whisper speech recognition model, with support for training without timestamp data, training with timestamp data, and training without speech data. Also accelerates inference and supports Web deployment, Windows desktop deployment, and Android deployment.

Language: C | License: Apache-2.0 | Stargazers: 849 | Issues: 8 | Issues: 90

Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

Language: Python | License: MIT | Stargazers: 744 | Issues: 71 | Issues: 14

deer

DEEp Reinforcement learning framework

Language: Python | License: NOASSERTION | Stargazers: 485 | Issues: 50 | Issues: 32

welm

One command to build TLG.fst for WeNet.

jaxrl

JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.

Language: Jupyter Notebook | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0