splinter21's repositories

audiocodecs

A collection of audio codecs with a standardized API

License: Apache-2.0 · Stargazers: 0 · Issues: 0

audiocomplib

A Python library for high-quality, fast, and customizable dynamic audio compression and peak limiting.

License: MIT · Stargazers: 0 · Issues: 0

BigVGAN-32k-sr-free

16 kHz, 24 kHz, and 32 kHz to 32 kHz decoding from mel spectrograms

License: MIT · Stargazers: 0 · Issues: 0

ConsisID

[CVPR 2025🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition

License: Apache-2.0 · Stargazers: 0 · Issues: 0

DiffSynth-Studio

Enjoy the magic of Diffusion models!

License: Apache-2.0 · Stargazers: 0 · Issues: 0

diffusion-pipe

A pipeline parallel training script for diffusion models.

License: MIT · Stargazers: 0 · Issues: 0

FlowDec

A neural full-band audio codec for general audio sampled at 48 kHz, at 7.5 kbps or 4.5 kbps.

License: NOASSERTION · Stargazers: 0 · Issues: 0

focalcodec

A low-bitrate single-codebook 16 kHz speech codec based on focal modulation

License: Apache-2.0 · Stargazers: 0 · Issues: 0

HunyuanVideo-I2V

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

License: NOASSERTION · Stargazers: 0 · Issues: 0

kokoro

https://hf.co/hexgrad/Kokoro-82M

License: Apache-2.0 · Stargazers: 0 · Issues: 0

LLaSE-G1

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

Stargazers: 0 · Issues: 0

MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

License: MIT · Stargazers: 0 · Issues: 0

NotaGen

NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms

Stargazers: 0 · Issues: 0

PixelDatasetAutoArb

Pixel-art dataset preprocessing workflow

Stargazers: 0 · Issues: 0

PodAgent

PodAgent: A Comprehensive Framework for Podcast Generation

License: Apache-2.0 · Stargazers: 0 · Issues: 0

R3MOE

[RecurrentNN × Regression × Regularized]-based Mouth Opening Estimation via SSL (Semi-Supervised Learning).

License: GPL-3.0 · Stargazers: 0 · Issues: 0

SkyReels-V1

SkyReels V1: The first and most advanced open-source human-centric video foundation model

License: NOASSERTION · Stargazers: 0 · Issues: 0

Spark-TTS

Spark-TTS Inference Code

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

tidy-tunes

Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open-source models while minimizing dependencies.

License: MIT · Stargazers: 0 · Issues: 0

TIGER

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Stargazers: 0 · Issues: 0

UniCodec

UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound

Stargazers: 0 · Issues: 0

VVQuest

Smart search for Zhang Weiwei meme stickers

License: MIT · Stargazers: 0 · Issues: 0

waifu-age

waifu age detector!

Stargazers: 0 · Issues: 0

Wan2GP

Wan 2.1 for the GPU Poor

License: NOASSERTION · Stargazers: 0 · Issues: 0

xAR

This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation"

Stargazers: 0 · Issues: 0