splinter21's repositories
audiocodecs
A collection of audio codecs with a standardized API
audiocomplib
A Python library for high-quality, fast, and customizable dynamic audio compression and peak limiting.
BigVGAN-32k-sr-free
16 kHz, 24 kHz, and 32 kHz to 32 kHz decoding from mel spectrograms
ConsisID
[CVPR 2025🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
DiffSynth-Studio
Enjoy the magic of Diffusion models!
diffusion-pipe
A pipeline parallel training script for diffusion models.
FlowDec
A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.
focalcodec
A low-bitrate single-codebook 16 kHz speech codec based on focal modulation
HunyuanVideo-I2V
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
kokoro
https://hf.co/hexgrad/Kokoro-82M
LLaSE-G1
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
NotaGen
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
PixelDatasetAutoArb
Pixel art dataset preprocessing workflow
PodAgent
PodAgent: A Comprehensive Framework for Podcast Generation
R3MOE
[RecurrentNN × Regression × Regularized]-based Mouth Opening Estimation via SSL (Semi-supervised Learning).
SkyReels-V1
SkyReels V1: The first and most advanced open-source human-centric video foundation model
Spark-TTS
Spark-TTS Inference Code
tidy-tunes
Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open source models while minimizing dependencies.
TIGER
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
UniCodec
UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound
VVQuest
Intelligent search for Zhang Weiwei memes
waifu-age
Waifu age detector!
Wan2GP
Wan 2.1 for the GPU Poor
xAR
This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation"