xzm2004's repositories
awesome-music-informatics
A curated list of awesome articles, tutorials, libraries, webpages, etc.
DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
agc
Audiogen Codec
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audioFlux
A library for audio and music analysis and feature extraction.
audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector.
audiowmark
Audio Watermarking
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
DiJiang
The official implementation of "DiJiang: Efficient Large Language Models through Compact Kernelization"
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
megatts2
Unofficial implementation of Megatts2
metavoice-src
Foundational model for human-like, expressive TTS
muzic
Muzic: Music Understanding and Generation with Artificial Intelligence
open-musiclm
Implementation of MusicLM, a text-to-music model published by Google Research, with a few modifications.
parler-tts
Inference and training library for high-quality TTS models.
RTNeural
Real-time neural network inferencing
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
so-vits-svc-fork
so-vits-svc fork with realtime support, improved interface and more features.
sparse-vqvae
Experimental implementation for a sparse-dictionary based version of the VQ-VAE2 paper
Speech-Editing-Toolkit
A repository of implementations of neural speech editing algorithms.
StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
supervoice-gpt
GPT-style network for phonemization with durations of text
ttts
Train the next generation of TTS systems.
USLM
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
wavmark
AI-based Audio Watermarking Tool