HAESUNG JEON (chad.plus) (seastar105)

Company: @kakaobrain

Location: Seoul, Korea

HAESUNG JEON (chad.plus)'s starred repositories

supabase

The open source Firebase alternative.

Language: TypeScript | License: Apache-2.0 | Stargazers: 67455 | Issues: 501 | Issues: 3464

mlx

MLX: An array framework for Apple silicon

marker

Convert PDF to markdown quickly with high accuracy

Language: Python | License: GPL-3.0 | Stargazers: 11565 | Issues: 48 | Issues: 124

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Language: Python | License: Apache-2.0 | Stargazers: 5883 | Issues: 70 | Issues: 219

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Language: Python | License: Apache-2.0 | Stargazers: 5721 | Issues: 211 | Issues: 307

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5411 | Issues: 46 | Issues: 73

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python | License: BSD-3-Clause | Stargazers: 5263 | Issues: 61 | Issues: 87

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4065 | Issues: 54 | Issues: 116

FreeU

FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)

stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Language: Python | License: MIT | Stargazers: 1364 | Issues: 34 | Issues: 241

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1148 | Issues: 25 | Issues: 54

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: MIT | Stargazers: 1110 | Issues: 57 | Issues: 45

Real-Time-Latent-Consistency-Model

App showcasing multiple real-time diffusion models pipelines with Diffusers

Language: Python | License: Apache-2.0 | Stargazers: 835 | Issues: 19 | Issues: 36

NeumAI

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

Language: Python | License: Apache-2.0 | Stargazers: 793 | Issues: 9 | Issues: 14

speech-denoising-wavenet

A neural network for end-to-end speech denoising

Language: Python | License: MIT | Stargazers: 667 | Issues: 18 | Issues: 42

normalizing-flows

PyTorch implementation of normalizing flow models

Language: Python | License: MIT | Stargazers: 627 | Issues: 13 | Issues: 39

soft-dtw

Python implementation of soft-DTW.

Language: Python | License: BSD-2-Clause | Stargazers: 522 | Issues: 28 | Issues: 26

ZeroSpeech

VQ-VAE for Acoustic Unit Discovery and Voice Conversion

stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Language: Python | License: MIT | Stargazers: 239 | Issues: 21 | Issues: 40

paura

Python AUdio Recording and Analysis (paura)

Language: Python | License: MIT | Stargazers: 217 | Issues: 15 | Issues: 7

ai-audio-datasets-list

This is a list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications. It is mainly used for speech recognition, speech synthesis, singing voice synthesis, music information retrieval, music generation, etc.

pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper

Language: Python | License: MIT | Stargazers: 181 | Issues: 15 | Issues: 40

awesome-voice-conversion

A curated list of awesome voice conversion projects and communities.

HPMDubbing

[CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models.

Language: Python | License: MIT | Stargazers: 92 | Issues: 9 | Issues: 9

character-factory

Generate characters for SillyTavern, TavernAI, TextGenerationWebUI using LLM and Stable Diffusion

Language: Python | License: AGPL-3.0 | Stargazers: 77 | Issues: 0 | Issues: 0

APNet2

Source code of APNet2, a vocoder

Language: Python | License: MIT | Stargazers: 45 | Issues: 2 | Issues: 2

EDMSound

Codebase and project page for EDMSound

Language: Python | License: MIT | Stargazers: 27 | Issues: 0 | Issues: 0

VISinger

Uses VITS and Opencpop to develop singing voice synthesis; different from VISinger.

Language: Python | License: Apache-2.0 | Stargazers: 26 | Issues: 0 | Issues: 2