splinter21's repositories
arxiv-translate-fix
arXiv translation fixer!
breath-removal
Detect and remove or lower the volume of breathing in speech recordings.
CELSDS
A Chinese Expressive Long-dialogue Speech Dataset with Scripts
Cosmos-Tokenizer
A suite of image and video neural tokenizers
echomimic_v2
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
FastVideo
FastVideo is an open-source framework for accelerating large video diffusion models.
fsspec_disk
Universal hard disk!
g2pW-Cantonese
Cantonese Grapheme-to-Phoneme Converter based on GitYCC/g2pW
genmoai-models
The best OSS video generation models
GetQzonehistory
Retrieve the history of posts published on QQ Zone (Qzone)
GIMM-VFI
[NeurIPS 2024] Generalizable Implicit Motion Modeling for Video Frame Interpolation
hertz-dev
The first base model for full-duplex conversational audio
LTX-Video
Official repository for LTX-Video
Neural-Codec-and-Speech-Language-Models
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
py2many
Transpiler of Python to many other languages
REAL-Video-Enhancer
Interpolate and Upscale easily on Linux/Windows.
SimVQ
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
TTSAudioNormalizer
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
vec2wav2.0
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
WavChat
A Survey of Spoken Dialogue Models (60 pages)
WaveFM
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching