AmorJNYH's repositories
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
Awesome-Talking-Head-Synthesis
💬 An extensive collection of exceptional resources dedicated to the captivating world of talking face synthesis! ⭐ If you find this repo useful, please give it a star! 🤩
clone-voice
A voice cloning tool with a web interface; record audio using your own voice or any other voice
cutword
A simple, fast tool for word segmentation and named entity recognition
deepvoice3_pytorch
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
emotion2vec
Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
HierSpeechpp
The official implementation of HierSpeech++
MP-SENet
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
OpenVoice
Instant voice cloning by MyShell
pesto
Self-supervised learning for fast pitch estimation
pflowtts_pytorch
Unofficial implementation of NVIDIA P-Flow TTS paper
pheme
VALL-E style models
PitchSqueezer
A robust pitch tracker using synchro-squeezed FFT and frequency-domain autocorrelation
pretty-midi
Utility functions for handling MIDI data in a nice/intuitive way.
SECap
Audio emotion annotation
tts-frontend-dataset
TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization
UTMOS
UT-Sarulab MOS prediction system using SSL models
vallex
Code beautification
vid2densepose
Convert your videos to densepose and use it on MagicAnimate
zhihu-tfm-llm-gpt
:books: Zhihu Q&A on large language models, ChatGPT, and Transformers