Xubo Liu (liuxubo717)

Company: CVSSP, University of Surrey

Home Page: https://liuxubo717.github.io/

Xubo Liu's starred repositories

log-wmse-audio-quality

logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even when there are many audio tracks or stems.

Language: Python | License: Apache-2.0 | Stargazers: 30 | Issues: 0

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python | License: Apache-2.0 | Stargazers: 131 | Issues: 0

agc

Audiogen Codec

Language: Python | License: MIT | Stargazers: 103 | Issues: 0

mamba

Mamba SSM architecture

Language: Python | License: Apache-2.0 | Stargazers: 11590 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4201 | Issues: 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | Stargazers: 1017 | Issues: 0

License: BSD-3-Clause | Stargazers: 7 | Issues: 0

resemble-enhance

AI powered speech denoising and enhancement

Language: Python | License: MIT | Stargazers: 1098 | Issues: 0

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1240 | Issues: 0

Language: Python | License: NOASSERTION | Stargazers: 145 | Issues: 0

distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Language: Python | License: MIT | Stargazers: 3344 | Issues: 0

StylerDALLE

Code for ICCV 2023 paper ✨ "StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model".

Language: Python | Stargazers: 17 | Issues: 0

LLM-groundedVideoDiffusion

[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper

Language: Python | Stargazers: 104 | Issues: 0

U-FFIA

The audio-visual fusion method for FFIA

Language: Python | Stargazers: 5 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 10532 | Issues: 0

lam4fsl

An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"

Language: Python | License: MIT | Stargazers: 25 | Issues: 0

AudioSep

Official implementation of "Separate Anything You Describe"

Language: Python | License: MIT | Stargazers: 1501 | Issues: 0

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | Stargazers: 2150 | Issues: 0

WavJourney

WavJourney: Compositional Audio Creation with LLMs

Language: Python | License: NOASSERTION | Stargazers: 510 | Issues: 0

visprog

Official code for VisProg (CVPR 2023 Best Paper!)

Language: Python | License: Apache-2.0 | Stargazers: 669 | Issues: 0

Speech-Prompts-Adapters

This repository surveys papers on prompting and adapters for speech processing.

Stargazers: 99 | Issues: 0

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Language: Python | Stargazers: 334 | Issues: 0

Language: Python | License: MIT | Stargazers: 31 | Issues: 0

co-separation

Co-Separating Sounds of Visual Objects (ICCV 2019)

Language: Python | License: CC-BY-4.0 | Stargazers: 91 | Issues: 0

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Language: Python | License: NOASSERTION | Stargazers: 9894 | Issues: 0

ReAtt

Retrieval as Attention

Language: Python | Stargazers: 78 | Issues: 0

VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python | License: MIT | Stargazers: 248 | Issues: 0

WavCaps

This repository contains metadata for the WavCaps dataset and code for downstream tasks.

Language: Python | Stargazers: 187 | Issues: 0

distributed-system

Creative and educational project for distributed system

Language: Go | License: MIT | Stargazers: 19 | Issues: 0

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python | License: NOASSERTION | Stargazers: 2334 | Issues: 0