BaekMS / SepReformer

Official repository of SepReformer for speech separation

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SepReformer

Official implementation from the following paper:

“Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation” Paper Link(Arxiv)

PWC PWC PWC

Untitled

We propose SepReformer, a novel approach to speech separation using an asymmetric encoder-decoder network named SepReformer.

Demo Pages: Sample Results of speech separation by SepReformer

Requirement

  • python 3.10
  • torch 2.1.2
  • torchaudio 2.1.2
  • pyyaml 6.0.1
  • ptflops
  • wandb
  • mir_eval

Features

  • You can log the training process by wandb as well as tensorboard.
  • Support dynamic mixing (DM) in training

Data Preparation

  • For training or evaluation, you need dataset and scp file
    1. Prepare dataset for speech separation (eg. WSJ0-2mix)
    2. create scp file using data/crate_scp/*.py

Training

  • If you want to train the network, you can simply trying by
    • set the scp file in ‘models/SepReformer_Base_WSJ0/configs.yaml’

    • run training as

      python run.py --model SepReformer_Base_WSJ0 --engine-mode train

Inference

  • Simply evaluating a model without saving output as audio files

    python run.py --model SepReformer_Base_WSJ0 --engine-mode test
  • Evaluating with output wav files saved

    python run.py --model SepReformer_Base_WSJ0 --engine-mode test_wav --out_wav_dir '/your/save/directoy[optional]'

Training Curve

  • For SepReformer-B with WSJ-2MIX, the training and validation curve is as follows: Untitled


Untitled

Untitled

Citation

If you find this repository helpful, please consider citing:

@misc{shin2024separate,
      title={Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation}, 
      author={Ui-Hyeop Shin and Sangyoun Lee and Taehan Kim and Hyung-Min Park},
      year={2024},
      eprint={2406.05983},
      archivePrefix={arXiv},
}

TODO

  • To add the pretrained model.

About

Official repository of SepReformer for speech separation


Languages

Language:Python 83.2%Language:MATLAB 16.8%