kjw11 / CSEnet-ASR

Cross-Speaker Encoding Network for Multi-talker Speech Recognition

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Paper Poster | PDF Paper | HTML Paper | Citation

This repository contains the core implementation of the Cross-Speaker Encoding (CSE) and CSE-SOT networks.

Requirements

  • ESPnet and its required dependencies
  • Additional packages used for scoring can be found in ./scoring/requirements.txt

Usage

To use this code, please:

  1. Replace the corresponding original ESPnet code with the code under the ./espnet2-patch directory
  2. Run an ESPnet ASR recipe (see the LibriSpeech recipe for reference) using the configurations under the ./config directory
  3. After ESPnet scoring, run the additional permutation-invariant scoring with ./scoring/run_pi_scoring.sh
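
As a rough illustration of what permutation-invariant scoring means for multi-talker output, the sketch below computes a word error rate that is minimized over all assignments of hypotheses to reference speakers. It is not the code in ./scoring/run_pi_scoring.sh; the function names `edit_distance` and `pi_wer` are hypothetical, and the brute-force search over permutations is an assumption that is only practical for a small number of speakers.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pi_wer(refs, hyps):
    """Minimum WER over all speaker assignments (hypothetical helper).

    refs and hyps are lists of transcript strings, one per speaker.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of speakers")
    ref_tokens = [r.split() for r in refs]
    hyp_tokens = [h.split() for h in hyps]
    total_ref_words = sum(len(r) for r in ref_tokens)
    if total_ref_words == 0:
        raise ValueError("references contain no words")
    # Try every ordering of the hypotheses and keep the fewest total errors.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(ref_tokens, perm))
        for perm in permutations(hyp_tokens)
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    refs = ["hello world", "good morning everyone"]
    hyps = ["good morning everyone", "hello word"]  # speaker order swapped
    print(pi_wer(refs, hyps))
```

Because the score is taken over the best assignment, a system is not penalized for emitting the speakers in a different order than the references.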

See ./run.sh for a runnable demo with more details. Please note that this code was developed against ESPnet version 202209 and may be incompatible with later versions.
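
Since compatibility depends on the ESPnet version, a quick check before patching may save some debugging. The snippet below is a minimal sketch that assumes ESPnet was installed as a Python distribution named `espnet` and that its reported version string is exactly `202209`; the helper name `check_espnet_version` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

TESTED_ESPNET_VERSION = "202209"  # version stated in this README


def check_espnet_version(installed=None):
    """Return True if the ESPnet version matches the tested one.

    If `installed` is None, look up the installed `espnet` distribution
    (assumption: ESPnet was installed under that distribution name).
    """
    if installed is None:
        try:
            installed = version("espnet")
        except PackageNotFoundError:
            return False
    return installed == TESTED_ESPNET_VERSION


if __name__ == "__main__":
    if not check_espnet_version():
        print("Warning: ESPnet", TESTED_ESPNET_VERSION,
              "not found; the patch may not apply cleanly.")
```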

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@article{kang2024cross,
  title={Cross-Speaker Encoding Network for Multi-Talker Speech Recognition},
  author={Kang, Jiawen and Meng, Lingwei and Cui, Mingyu and Guo, Haohan and Wu, Xixin and Liu, Xunying and Meng, Helen},
  journal={arXiv preprint arXiv:2401.04152},
  year={2024}
}

Contact

Feel free to contact me if you have any questions.

Acknowledgements

This repository is based on the ESPnet speech processing toolkit, version 202209.

License: MIT License