YuanGongND

MIT

Cambridge, MA

yuangongnd.github.io

Yuan Gong's repositories

ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Language: Jupyter Notebook | License: BSD-3-Clause | 1077 18 131

ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".

Language: Python | License: BSD-3-Clause | 358 7 34

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Language: Python | 340 14 45

whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Language: Python | License: BSD-2-Clause | 304 10 30

cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Language: Python | License: BSD-2-Clause | 219 5 28

gopt

Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".

Language: Python | License: BSD-3-Clause | 138 5 35

psla

Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".

Language: Python | License: BSD-3-Clause | 131 1 12

vocalsound

Dataset and baseline code for the VocalSound dataset (ICASSP 2022).

Language: Jupyter Notebook | 96 2 6

uavm

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

Language: Python | License: BSD-2-Clause | 54 2 4

python-compute-eer

Simple Python script to compute equal error rate (EER) for machine learning model evaluation.

Language: Python | 36 1 2

ReMASC

ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems

Language: Python | 36 3 1

realtime-adversarial-attack

Code for IJCAI 2019 paper "Real-time Adversarial Attack".

Language: Python | 20 5 3

llm_speech_emotion_challenge

Language: Jupyter Notebook | License: BSD-2-Clause | 1200

multichannel-antispoof

Code for SPL paper "Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method"

Language: Python | License: BSD-3-Clause | 5 2 1

awesome-whisper

🔊 Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI

License: CC0-1.0 | 400

efficient-voice-antispoof

Language: Jupyter Notebook | 4 30

Awesome-Multimodal-Large-Language-Models

Latest Papers and Datasets on Multimodal Large Language Models

3 10

ESC-50

ESC-50: Dataset for Environmental Sound Classification

Language: Python | License: NOASSERTION | 200

kaldi-abbr

kaldi name convention note

2 10

yuangongnd.github.io

Language: HTML | 2 10

SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Language: Python | License: MIT | 100

audioset_tagging_cnn

Language: Python | License: MIT | 000

Autoregressive-Predictive-Coding

Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning

Language: Python | 000

docs

TensorFlow documentation

Language: Jupyter Notebook | License: Apache-2.0 | 000

espnet

End-to-End Speech Processing Toolkit

Language: Python | License: Apache-2.0 | 000

kaldi

This is the official location of the Kaldi project.

Language: Shell | License: NOASSERTION | 000

kaldi-io-for-python

Python functions for reading kaldi data formats. Useful for rapid prototyping with python.

Language: Python | License: Apache-2.0 | 000

pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Language: Python | License: MIT | 010

skynet-ddp-slurm-example

Example of using PyTorch DistributedDataParallel and SLURM on skynet

Language: Python | 000

tutorials

PyTorch tutorials.

Language: Jupyter Notebook | License: BSD-3-Clause | 000