i-MaTh

Company: East China Normal University

Location: Shanghai

i-MaTh's starred repositories

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language: Python | License: MIT | Stargazers: 30055 | Issues: 189 | Issues: 994

ControlNet

Let us control diffusion models!

Language: Python | License: Apache-2.0 | Stargazers: 29256 | Issues: 217 | Issues: 532

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 23963 | Issues: 217 | Issues: 3700

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open source community will contribute to this project.

Language: Python | License: MIT | Stargazers: 11058 | Issues: 164 | Issues: 234

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Language: Python | License: BSD-2-Clause | Stargazers: 10283 | Issues: 127 | Issues: 655

StreamDiffusion

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Language: Python | License: Apache-2.0 | Stargazers: 9354 | Issues: 77 | Issues: 109

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Language: Python | License: MIT | Stargazers: 8824 | Issues: 82 | Issues: 36

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python | License: Apache-2.0 | Stargazers: 6550 | Issues: 48 | Issues: 201

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python | License: MIT | Stargazers: 6393 | Issues: 60 | Issues: 78

Language: Python | License: NOASSERTION | Stargazers: 6157 | Issues: 70 | Issues: 116

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python | License: BSD-3-Clause | Stargazers: 5403 | Issues: 63 | Issues: 96

cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Language: Python | License: NOASSERTION | Stargazers: 5004 | Issues: 35 | Issues: 177

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4336 | Issues: 58 | Issues: 141

latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Language: Python | License: MIT | Stargazers: 4237 | Issues: 62 | Issues: 93

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python | License: Apache-2.0 | Stargazers: 3583 | Issues: 76 | Issues: 120

stable-audio-tools

Generative models for conditional audio generation

Language: Python | License: MIT | Stargazers: 2386 | Issues: 42 | Issues: 77

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | Stargazers: 2168 | Issues: 44 | Issues: 66

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python | License: Apache-2.0 | Stargazers: 1379 | Issues: 23 | Issues: 57

voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Language: Python | License: MIT | Stargazers: 567 | Issues: 50 | Issues: 25

SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Language: Python | License: MIT | Stargazers: 565 | Issues: 10 | Issues: 18

UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

Language: Python | License: NOASSERTION | Stargazers: 411 | Issues: 21 | Issues: 44

SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Language: Python | License: Apache-2.0 | Stargazers: 380 | Issues: 16 | Issues: 10

MiniGPT4Qwen

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Language: Jupyter Notebook | Stargazers: 294 | Issues: 4 | Issues: 24

spear-tts-pytorch

Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch

Language: Python | License: MIT | Stargazers: 250 | Issues: 28 | Issues: 6

tts-scores

Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models

Language: Python | License: Apache-2.0 | Stargazers: 128 | Issues: 5 | Issues: 12

stable-audio-metrics

Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.

Language: Python | License: MIT | Stargazers: 122 | Issues: 3 | Issues: 0

UniCATS-CTX-vec2wav

[AAAI 2024] Code for CTX-vec2wav in UniCATS

agc

Audiogen Codec

Language: Python | License: MIT | Stargazers: 106 | Issues: 3 | Issues: 1

whisper-punctuator

Zero-shot multimodal punctuation insertion and truecasing using Whisper

Language: Python | License: MIT | Stargazers: 94 | Issues: 6 | Issues: 5

OpenSora

Exquisite video generation

License: Apache-2.0 | Stargazers: 9 | Issues: 0 | Issues: 0