shengzhang0222's starred repositories

SC-Wind-Noise-Generator

Generate synthetic wind noise signals based on a wind speed profile.

Language: Python, License: MIT, 1800

Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Language: Jupyter Notebook, License: MIT, 53700

FQSE

Fully Quantized Neural Networks For Speech Enhancement

Language: Python, License: Apache-2.0, 4900

G2Net

The implementation of G2Net, the extension of GaGNet, which is in submission to T-ASLP.

Language: Python, License: MIT, 1900

SDDNet

Coarse implementation of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling". On the DNS-2020 dataset, the DNSMOS of the first stage is 3.42 and that of the second stage is 3.47.

Language: Python, 5700

SEMamba

This is the official implementation of the SEMamba paper.

Language: Python, 10000

CMGAN

Conformer-based Metric GAN for speech enhancement

Language: Python, License: MIT, 29300

ChatTTS

A generative speech model for daily dialogue.

Language: Python, License: AGPL-3.0, 2885700

Sixty-years-of-frequency-domain-monaural-speech-enhancement

Language: Python, 10400

DPCRN_DNS3

Implementation of the paper "DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement"

Language: Python, 17400

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python, License: Apache-2.0, 98600

ExARN-Target-Speaker-Extraction-with-Attentive-Recurrent-Networks

Language: Python, 600

ASL

Official PyTorch implementation of the paper "Asymmetric Loss For Multi-Label Classification" (ICCV, 2021)

Language: Python, License: MIT, 71400

MP-SENet

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

Language: Python, License: MIT, 26900

gtcrn

The official implementation of GTCRN, an ultra-lite speech enhancement model.

Language: Python, License: MIT, 14900

query-bandit

Banquet: A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

Language: Jupyter Notebook, License: MIT, 2000

FN-SSL

The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization

Language: Python, 7000

perception_scale

Human ear perception scales and features (mel, bark, ERB, gammatone)

Language: C, 2400

MossFormer2

This is the audio sample repository for the speech separation model "MossFormer2".

Language: Python, License: MIT, 7000

parler-tts

Inference and training library for high-quality TTS models.

Language: Python, License: Apache-2.0, 294200

RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods

Language: Python, License: BSD-2-Clause, 212600

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

Language: Python, 77200

denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.

Language: Python, License: NOASSERTION, 162200

Bert-VITS2

vits2 backbone with multilingual-bert

Language: Python, License: AGPL-3.0, 762800