Beast code in Giters

Yin Xinlei's starred repositories

audiocaps

🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps

Language: Python | License: MIT | 13900

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | 369000

UTMOS22

UT-Sarulab MOS prediction system using SSL models

Language: Python | License: MIT | 17300

sigsep-mus-db

Python parser and tools for MUSDB18 Music Separation Dataset

Language: Python | License: MIT | 16100

WritingAIPaper

Writing AI Conference Papers: A Handbook for Beginners

111200

tango

A family of diffusion models for text-to-audio generation.

Language: Python | License: NOASSERTION | 100000

llm-tse

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)

Language: JavaScript | 3200

audio-retrieval-benchmark

Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".

Language: Python | 4500

versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Language: Python | License: MIT | 111500

audio-flamingo

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Language: Python | License: MIT | 17600

LLM101n

LLM101n: Let's build a Storyteller

2927900

AudioEditingCode

Language: Python | 13200

GESS

Code for GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts

Language: Python | License: MIT | 1300

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Language: Python | 20500

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python | 57500

mtg-jamendo-dataset

Metadata, scripts and baselines for the MTG-Jamendo dataset

Language: Python | License: Apache-2.0 | 26700

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | 611000

AudioSep

Official implementation of "Separate Anything You Describe"

Language: Python | License: MIT | 159600

EnCLAP

Official Implementation of EnCLAP (ICASSP 2024)

Language: Python | License: MIT | 8800

VGGSound

VGGSound: A Large-scale Audio-Visual Dataset

Language: Python | License: NOASSERTION | 28700

Zero_Shot_Audio_Source_Separation

The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022

Language: Python | License: MIT | 18600

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | 2548700

ustcthesis

LaTeX template for USTC thesis

Language: TeX | License: LPPL-1.3c | 161400

ACT

Source code for the paper 'Audio Captioning Transformer'

Language: Jupyter Notebook | 4800

AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.

Language: Python | License: MIT | 19700

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | 1978400

melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

Language: Python | License: MIT | 96400

WavCraft

Official repo for WavCraft, an AI agent for audio creation and editing

Language: Python | License: NOASSERTION | 65000

visqol

Perceptual Quality Estimator for speech and audio

Language: C++ | License: Apache-2.0 | 68600

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | 226300