Beast code in Giters

asr-pub's starred repositories

GPT-SoVITS

Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)

Language: Python | License: MIT | 28986 186 942

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | 26047 176 4209

loguru

Python logging made (stupidly) simple

Language: Python | License: MIT | 18765 139 984

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | 14148 116 374

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook | License: MIT | 11092 96 336

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model). We hope the open-source community will contribute to this project.

Language: Python | License: Apache-2.0 | 10896 163 198

AnimateDiff

Official implementation of AnimateDiff.

Language: Python | License: Apache-2.0 | 9791 103 330

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | 7233 88 111

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | 5087 50 226

transformer-debugger

Language: Python | License: MIT | 3972 26 14

riffusion

Stable diffusion for real-time music generation

Language: Python | License: MIT | 3296 38 93

parler-tts

Inference and training library for high-quality TTS models.

Language: Python | License: Apache-2.0 | 2874 48 57

Resemblyzer

A Python package to analyze and compare voices with deep learning

Language: Python | License: Apache-2.0 | 2663 73 79

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python | License: NOASSERTION | 2335 41 100

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 1241 25 59

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: MIT | 1133 56 48

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | 1096 24 76

ER-NeRF

[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

Language: Python | License: MIT | 902 16 149

treelib

An efficient implementation of the tree data structure in Python 2/3.

Language: Python | License: NOASSERTION | 801 30 129

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | License: MIT | 623 18 89

whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Language: Python | License: BSD-2-Clause | 292 10 28

megatts2

Unofficial implementation of Megatts2

Language: Python | License: MIT | 242 22 20

ar-vits

Text-to-speech using an autoregressive transformer and VITS

Language: Python | License: MIT | 211 15 4

laughter-detection

Language: Python | License: MIT | 176 8 11

audioset-processing

Toolkit for downloading and processing Google's AudioSet dataset.

Language: Jupyter Notebook | License: MIT | 152 3 6

UniCATS-CTX-vec2wav

[AAAI 2024] Code for CTX-vec2wav in UniCATS

Language: Python | 109 10 9

admin

Admin console

Language: Go | License: MIT | 107 12 11

LoRA-Torch

PyTorch Reimplementation of LoRA

Language: Python | License: MIT | 35 2 5

MakeMultiHeadNaive

Use a naive MultiheadAttention implementation to replace nn.MultiheadAttention in PyTorch

Language: Python | 22 2 2

HAS_CNF

Language: Python | 200