splinter21

User data from GitHub: https://github.com/splinter21


splinter21's repositories

audiocodecs

A collection of audio codecs with a standardized API

License: Apache-2.0

audiocomplib

A Python library for high-quality, fast, and customizable dynamic audio compression and peak limiting.

License: MIT

BigVGAN-32k-sr-free

16 kHz, 24 kHz, and 32 kHz to 32 kHz decoding from mel spectrograms

License: MIT

ConsisID

[CVPR 2025🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition

License: Apache-2.0

DiffRhythm


DiffSynth-Studio

Enjoy the magic of Diffusion models!

License: Apache-2.0

diffusion-pipe

A pipeline parallel training script for diffusion models.

License: MIT

FlowDec

A neural full-band audio codec for general audio sampled at 48 kHz, operating at 7.5 kbps or 4.5 kbps.

License: NOASSERTION

focalcodec

A low-bitrate single-codebook 16 kHz speech codec based on focal modulation

License: Apache-2.0

HunyuanVideo-I2V

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

License: NOASSERTION

kokoro

https://hf.co/hexgrad/Kokoro-82M

License: Apache-2.0

LLaSE-G1

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement


MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

License: MIT

Moonlight

License: MIT

NotaGen

NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms


PixelDatasetAutoArb

Pixel art dataset preprocessing workflow


PodAgent

PodAgent: A Comprehensive Framework for Podcast Generation

License: Apache-2.0

prosody_gan

License: MIT

R3MOE

[RecurrentNN × Regression × Regularized]-based Mouth Opening Estimation via SSL (semi-supervised learning).

License: GPL-3.0

SkyReels-V1

SkyReels V1: The first and most advanced open-source human-centric video foundation model

License: NOASSERTION

Spark-TTS

Spark-TTS Inference Code

Language: Python; License: Apache-2.0

Step-Audio-tts


Step-Video-T2V

Language: Python; License: MIT

tidy-tunes

Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open source models while minimizing dependencies.

License: MIT

TIGER

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation


UniCodec

UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound


VVQuest

Smart search for Zhang Weiwei memes

License: MIT

waifu-age

Waifu age detector!


Wan2GP

Wan 2.1 for the GPU Poor

License: NOASSERTION

xAR

This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation"
