Xubo Liu's starred repositories
log-wmse-audio-quality
logWMSE, an audio quality metric with support for digital silence targets. Useful for evaluating audio source separation systems, even when there are many audio tracks or stems.
Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
resemble-enhance
AI-powered speech denoising and enhancement
Qwen-Audio
The official repository of Qwen-Audio (通义千问-Audio, Tongyi Qianwen-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, and within 1% word error rate of the original.
StylerDALLE
Code for ICCV 2023 paper ✨ "StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model".
LLM-groundedVideoDiffusion
[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
WavJourney
WavJourney: Compositional Audio Creation with LLMs
Speech-Prompts-Adapters
This repository surveys papers focusing on prompting and adapters for speech processing.
co-separation
Co-Separating Sounds of Visual Objects (ICCV 2019)
distributed-system
Creative and educational project for distributed systems