asr-pub's starred repositories

GPT-SoVITS

Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python | License: MIT | Stargazers: 28986 | Issues: 186 | Issues: 942

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | Stargazers: 26047 | Issues: 176 | Issues: 4209

loguru

Python logging made (stupidly) simple

Language: Python | License: MIT | Stargazers: 18765 | Issues: 139 | Issues: 984

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 14148 | Issues: 116 | Issues: 374

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook | License: MIT | Stargazers: 11092 | Issues: 96 | Issues: 336

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | License: Apache-2.0 | Stargazers: 10896 | Issues: 163 | Issues: 198

AnimateDiff

Official implementation of AnimateDiff.

Language: Python | License: Apache-2.0 | Stargazers: 9791 | Issues: 103 | Issues: 330

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 7233 | Issues: 88 | Issues: 111

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | Stargazers: 5087 | Issues: 50 | Issues: 226

riffusion

Stable diffusion for real-time music generation

Language: Python | License: MIT | Stargazers: 3296 | Issues: 38 | Issues: 93

parler-tts

Inference and training library for high-quality TTS models.

Language: Python | License: Apache-2.0 | Stargazers: 2874 | Issues: 48 | Issues: 57

Resemblyzer

A Python package to analyze and compare voices with deep learning

Language: Python | License: Apache-2.0 | Stargazers: 2663 | Issues: 73 | Issues: 79

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python | License: NOASSERTION | Stargazers: 2335 | Issues: 41 | Issues: 100

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1241 | Issues: 25 | Issues: 59

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: MIT | Stargazers: 1133 | Issues: 56 | Issues: 48

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | Stargazers: 1096 | Issues: 24 | Issues: 76

ER-NeRF

[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

Language: Python | License: MIT | Stargazers: 902 | Issues: 16 | Issues: 149

treelib

An efficient implementation of the tree data structure in Python 2/3.

Language: Python | License: NOASSERTION | Stargazers: 801 | Issues: 30 | Issues: 129

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | License: MIT | Stargazers: 623 | Issues: 18 | Issues: 89

whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Language: Python | License: BSD-2-Clause | Stargazers: 292 | Issues: 10 | Issues: 28

megatts2

Unofficial implementation of Megatts2

Language: Python | License: MIT | Stargazers: 242 | Issues: 22 | Issues: 20

ar-vits

Text-to-speech using an autoregressive transformer and VITS

Language: Python | License: MIT | Stargazers: 211 | Issues: 15 | Issues: 4

audioset-processing

Toolkit for downloading and processing Google's AudioSet dataset.

Language: Jupyter Notebook | License: MIT | Stargazers: 152 | Issues: 3 | Issues: 6

UniCATS-CTX-vec2wav

[AAAI 2024] Code for CTX-vec2wav in UniCATS

admin

Admin console

Language: Go | License: MIT | Stargazers: 107 | Issues: 12 | Issues: 11

LoRA-Torch

PyTorch Reimplementation of LoRA

Language: Python | License: MIT | Stargazers: 35 | Issues: 2 | Issues: 5

MakeMultiHeadNaive

Use a naive MultiheadAttention implementation to replace nn.MultiheadAttention in PyTorch

Language: Python | Stargazers: 2 | Issues: 0 | Issues: 0