SaoYear

Nian's starred repositories

mamba

Mamba SSM architecture

Language: Python · Apache-2.0 · 1190200

RealMAN

A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

Language: Python · 4800

ChatTTS

A generative speech model for daily dialogue.

Language: Python · AGPL-3.0 · 2824200

dcase2023_task4b_baseline

Baseline code for DCASE 2023 task 4 B

Language: Python · 1200

SAR-SSL

A Python implementation of “Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer”

Language: Python · MIT · 1500

MetaAF

Control adaptive filters with neural networks.

Language: Python · 21500

pytorch_misc

Code snippets created for the PyTorch discussion board

Language: Python · 54000

LibMTL

A PyTorch Library for Multi-Task Learning

Language: Python · MIT · 188700

byol-a

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

Language: Python · NOASSERTION · 20300

SaProt

[ICLR'24 spotlight] Saprot: Protein Language Model with Structural Alphabet

Language: Python · MIT · 28900

SED_SoftLabel

Sound Event Classification With Soft Label

Language: Python · 300

pb_sed

Paderborn Sound Event Detection

Language: Python · MIT · 6800

OI-wiki

:star2: Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)

Language: TypeScript · 1982800

uoe_speech_processing_course

Language: Jupyter Notebook · MIT · 2700

HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Language: Python · MIT · 33500

FS-EEND

The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors". [ICASSP 2024]

Language: Python · MIT · 7200

RVAE-EM

Official PyTorch implementation of "RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function" [ICASSP2024]

Language: Python · MIT · 3500

ATST-RCT

ATST-RCT model for DCASE 2022 task4.

Language: Python · 200

RCT

This repo gives the code for the official implementation of RCT.

Language: Python · 1200

ATST-SED

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Language: Jupyter Notebook · MIT · 7000

UMA-ASR

This repository is the official implementation of "Unimodal Aggregation for CTC-based Speech Recognition".

Language: Shell · 1200

ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Language: Jupyter Notebook · BSD-3-Clause · 107700

sed_scores_eval

Language: Python · MIT · 2500

FDY-SED

Language: Python · MIT · 7500

FN-SSL

The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization

Language: Python · 6800

ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".

Language: Python · BSD-3-Clause · 35800

ontology

The Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events.

63700

AudioSetOntologyTree

Tree visualization of the AudioSet Ontology - https://github.com/audioset/ontology

Language: HTML · 1600

download_audioset

📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).

Language: Python · NOASSERTION · 9700

McNet

The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023

Language: Python · 9500