Wooseok Han's starred repositories
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
faster-whisper
Faster Whisper transcription with CTranslate2
NoiseTorch
Real-time microphone noise suppression on Linux.
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
promptbase
All things prompt engineering
whisper-jax
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
text-generation-webui-colab
A colab gradio web UI for running Large Language Models
WhisperLive
A nearly-live implementation of OpenAI's Whisper.
FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
resource-stream
CUDA-related news and material links
train-CLIP
A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
voicebox-pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
Awesome-Korean-Speech-Recognition
A list of Korean speech recognition (STT) APIs, with performance benchmarks for each.
Whispering-LLaMA
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
UnitSpeech
An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"
ConfidenceIntervals
Confidence interval computation for evaluation in machine learning using the bootstrapping approach
whisper-onnx-tensorrt
ONNX and TensorRT implementation of Whisper