Beast code in Giters

panpanpan's repositories

flash-attention

Fast and memory-efficient exact attention

BSD-3-Clause

CosyVoice

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Apache-2.0

detail_tts

All generative models in one for a better TTS model

NAST

Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037

MIT

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

MIT

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

voicefixer

General Speech Restoration

MIT

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

NOASSERTION

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Apache-2.0

OpenVoice

Instant voice cloning by MyShell.

MIT

Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi

MIT

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

MIT

demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

MIT

parler-tts

Inference and training library for high-quality TTS models.

Apache-2.0

UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

NOASSERTION

KazEmoTTS

An open-source Kazakh Emotional Text-to-Speech Dataset

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

MIT

grok-1

Grok open release

Apache-2.0

UniCATS-CTX-txt2vec

[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS

brouhaha-vad

Predicts the level of noise and reverberation in your audio files

MIT

AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction

Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)

AudioSR-Upsampling

AudioSR-Upsampling (any -> 48kHz)

NOASSERTION

VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

MIT

MahaTTS

Apache-2.0

PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

MIT

megatts2

Unofficial implementation of Megatts2

MIT

GPT-SoVITS

One minute of voice data is enough to train a good TTS model (few-shot voice cloning).

MIT

pan310