liuxubo717

CVSSP, University of Surrey

https://liuxubo717.github.io/

Xubo Liu's starred repositories

log-wmse-audio-quality

logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even when there are many audio tracks or stems.

Language: Python | License: Apache-2.0 | 3000

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python | License: Apache-2.0 | 13100

agc

Audiogen Codec

Language: Python | License: MIT | 10300

mamba

Mamba SSM architecture

Language: Python | License: Apache-2.0 | 1159000

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | 420100

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | 101700

APT

License: BSD-3-Clause | 700

resemble-enhance

AI powered speech denoising and enhancement

Language: Python | License: MIT | 109800

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 124000

musicfm

Language: Python | License: NOASSERTION | 14500

distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Language: Python | License: MIT | 334400

StylerDALLE

Code for ICCV 2023 paper ✨ "StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model".

Language: Python | 1700

LLM-groundedVideoDiffusion

[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper

Language: Python | 10400

U-FFIA

The audio-visual fusion method for FFIA

Language: Python | 500

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | 1053200

lam4fsl

An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"

Language: Python | License: MIT | 2500

AudioSep

Official implementation of "Separate Anything You Describe"

Language: Python | License: MIT | 150100

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | 215000

WavJourney

WavJourney: Compositional Audio Creation with LLMs

Language: Python | License: NOASSERTION | 51000

visprog

Official code for VisProg (CVPR 2023 Best Paper!)

Language: Python | License: Apache-2.0 | 66900

Speech-Prompts-Adapters

This repository surveys papers on prompting and adapters for speech processing.

9900

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Language: Python | 33400

CLIPSep

Language: Python | License: MIT | 3100

co-separation

Co-Separating Sounds of Visual Objects (ICCV 2019)

Language: Python | License: CC-BY-4.0 | 9100

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Language: Python | License: NOASSERTION | 989400

ReAtt

Retrieval as Attention

Language: Python | 7800

VALOR

Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python | License: MIT | 24800

WavCaps

This repository contains metadata for the WavCaps dataset and code for downstream tasks.

Language: Python | 18700

distributed-system

A creative and educational project on distributed systems

Language: Go | License: MIT | 1900

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python | License: NOASSERTION | 233400