panpanpan (pan310)

panpanpan's repositories

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause | Stargazers: 0 | Issues: 0

CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

fish-speech

Brand new TTS solution

License: NOASSERTION | Stargazers: 0 | Issues: 0

detail_tts

All generative models in one for a better TTS model

Stargazers: 0 | Issues: 0

NAST

Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037

License: MIT | Stargazers: 0 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

License: MIT | Stargazers: 0 | Issues: 0

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Stargazers: 0 | Issues: 0

voicefixer

General Speech Restoration

License: MIT | Stargazers: 0 | Issues: 0

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

License: NOASSERTION | Stargazers: 0 | Issues: 0

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

License: Apache-2.0 | Stargazers: 0 | Issues: 0

OpenVoice

Instant voice cloning by MyShell.

License: MIT | Stargazers: 0 | Issues: 0

Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi

License: MIT | Stargazers: 0 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

License: MIT | Stargazers: 0 | Issues: 0

demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

License: MIT | Stargazers: 0 | Issues: 0

parler-tts

Inference and training library for high-quality TTS models.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

License: NOASSERTION | Stargazers: 0 | Issues: 0

KazEmoTTS

An open-source Kazakh Emotional Text-to-Speech Dataset

Stargazers: 0 | Issues: 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

License: MIT | Stargazers: 0 | Issues: 0

grok-1

Grok open release

License: Apache-2.0 | Stargazers: 0 | Issues: 0

UniCATS-CTX-txt2vec

[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS

Stargazers: 0 | Issues: 0

brouhaha-vad

Predicts the level of noise and reverberation on your audiofiles

License: MIT | Stargazers: 0 | Issues: 0

AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Stargazers: 0 | Issues: 0

Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction

Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)

Stargazers: 0 | Issues: 0

AudioSR-Upsampling

AudioSR-Upsampling (any -> 48kHz)

License: NOASSERTION | Stargazers: 0 | Issues: 0

VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io

License: MIT | Stargazers: 0 | Issues: 0
License: Apache-2.0 | Stargazers: 0 | Issues: 0

PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

License: MIT | Stargazers: 0 | Issues: 0

megatts2

Unofficial implementation of Megatts2

License: MIT | Stargazers: 0 | Issues: 0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

License: MIT | Stargazers: 0 | Issues: 0