asr-pub's starred repositories
GPT-SoVITS
1 min of voice data can also be used to train a good TTS model! (few-shot voice cloning)
LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
AnimateDiff
Official implementation of AnimateDiff.
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
fish-speech
Brand new TTS solution
parler-tts
Inference and training library for high-quality TTS models.
Resemblyzer
A Python package to analyze and compare voices with deep learning
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
HierSpeechpp
The official implementation of HierSpeech++
whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
audioset-processing
Toolkit for downloading and processing Google's AudioSet dataset.
UniCATS-CTX-vec2wav
[AAAI 2024] Code for CTX-vec2wav in UniCATS
LoRA-Torch
PyTorch Reimplementation of LoRA
MakeMultiHeadNaive
Uses a naive MultiheadAttention implementation to replace nn.MultiheadAttention in PyTorch