Beast code in Giters

Young Han Lee's starred repositories

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | 256400

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 1827500

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | 2026200

sp1ny

Language: Python | 700

lvc-vc

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions

Language: Python | License: MIT | 8100

korean-romanizer

A Python library for Korean romanization

Language: Python | License: NOASSERTION | 9300

sherpa

Speech-to-text server framework with next-gen Kaldi

Language: C++ | License: Apache-2.0 | 48400

QuantTrading

License: NOASSERTION | 4000

PyConKR2023-ModelServing-BentoML

PyCon KR 2023 presentation

Language: HTML | License: MIT | 1300

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

License: MIT | 124300

SVCC23_FastSVC

Singing Voice Conversion Challenge 2023 Starter Kit: FastSVC Reimplementation

Language: Python | 10800

s3prl-vc

S3PRL-VC: A Voice Conversion Toolkit based on S3PRL

Language: Python | License: MIT | 8900

vdm

Language: Jupyter Notebook | License: Apache-2.0 | 28400

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | 126000

photometric_optimization

Photometric optimization code for creating the FLAME texture space and other applications

Language: Python | License: MIT | 50400

DualCycleGAN

Official implementation of DualCycleGAN for nonparallel audio super resolution

Language: Python | License: Apache-2.0 | 4700

MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

Language: Python | License: Apache-2.0 | 40500

SiFiGAN

Official implementation of the source-filter HiFiGAN vocoder

Language: Python | License: MIT | 23400

nnsvs

Neural network-based singing voice synthesis library for research

Language: Python | License: MIT | 67000

Awesome-Gaze-Estimation

Awesome Curated List of Eye Gaze Estimation Papers

44100

gpu-burn

Multi-GPU CUDA stress test

Language: C++ | License: BSD-2-Clause | 127600

FACEGOOD-Audio2Face

http://www.facegood.cc

Language: Python | License: MIT | 177600

Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, try Sync Labs.

Language: Python | 984100

Speech-Backbones

This is the main repository of open-source speech technology by Huawei Noah's Ark Lab.

Language: Jupyter Notebook | 54700

BentoML

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

Language: Python | License: Apache-2.0 | 682100

YOLOX_AUDIO

Audio event detection model based on YOLOX

Language: Python | License: Apache-2.0 | 8200

ort

Accelerate PyTorch models with ONNX Runtime

Language: Python | License: MIT | 35000

torchgpipe

A GPipe implementation in PyTorch

Language: Python | License: BSD-3-Clause | 79000

code-server

VS Code in the browser

Language: TypeScript | License: MIT | 6669700

VARA-TTS

Demo audio of VARA-TTS model

2000

beckgom