seastar105

HAESUNG JEON (chad.plus)'s starred repositories

supabase

The open source Firebase alternative.

Language: TypeScript | Apache-2.0 | 67455 501 3464

mlx

MLX: An array framework for Apple silicon

Language: C++ | MIT | 15082 137 435

marker

Convert PDF to markdown quickly with high accuracy

Language: Python | GPL-3.0 | 11565 48 124

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Language: Python | Apache-2.0 | 5883 70 219

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Language: Python | Apache-2.0 | 5721 211 307

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | NOASSERTION | 5411 46 73

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python | BSD-3-Clause | 5263 61 87

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | MIT | 4065 54 116

FreeU

FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)

MIT | 1529 41 28

stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Language: Python | MIT | 1364 34 241

Qwen-Audio

The official repo of the Qwen-Audio (通义千问-Audio) chat and pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | NOASSERTION | 1148 25 54

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | MIT | 1110 57 45

Real-Time-Latent-Consistency-Model

App showcasing multiple real-time diffusion model pipelines with Diffusers

Language: Python | Apache-2.0 | 835 19 36

NeumAI

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

Language: Python | Apache-2.0 | 793 9 14

speech-denoising-wavenet

A neural network for end-to-end speech denoising

Language: Python | MIT | 667 18 42

normalizing-flows

PyTorch implementation of normalizing flow models

Language: Python | MIT | 627 13 39

soft-dtw

Python implementation of soft-DTW.

Language: Python | BSD-2-Clause | 522 28 26

ZeroSpeech

VQ-VAE for Acoustic Unit Discovery and Voice Conversion

Language: Python | 311 9 18

stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Language: Python | MIT | 239 21 40

paura

Python AUdio Recording and Analysis (paura)

Language: Python | MIT | 217 15 7

ai-audio-datasets-list

This is a list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications. It is mainly used for speech recognition, speech synthesis, singing voice synthesis, music information retrieval, music generation, etc.

MIT | 182 8 1

pflowtts_pytorch

awesome-voice-conversion

HPMDubbing

character-factory

Algorithms

APNet2

LaDiffCodec

EDMSound

VISinger