International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
This repository contains the core implementation of the Cross-Speaker Encoding (CSE) and CSE-SOT networks.
- ESPnet and its required dependencies
- Additional packages used for scoring can be found in ./scoring/requirements.txt
To use this code, please:
- Replace the original ESPnet code with the code under the ./espnet2-patch directory
- Run the ESPnet ASR recipe (refer to the Librispeech recipe) using the configurations under the ./config directory
- After ESPnet scoring, additionally perform permutation-invariant scoring with ./scoring/run_pi_scoring.sh

./run.sh provides a runnable demo with further details. Please note that this code was developed under ESPnet version 202209 and may be incompatible with later versions.
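
For readers unfamiliar with permutation-invariant scoring, the idea can be illustrated with a minimal Python sketch. This is not the code behind ./scoring/run_pi_scoring.sh; it assumes one common formulation, in which each multi-talker utterance is scored by the speaker assignment that yields the fewest word errors. The function names `edit_distance` and `pi_wer` are hypothetical and exist only for this illustration.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pi_wer(refs, hyps):
    """Lowest WER over all assignments of hypotheses to references.

    Assumption: refs and hyps are lists of transcripts (one per speaker)
    for a single utterance, and both lists have the same length.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of speakers")
    ref_tokens = [r.split() for r in refs]
    hyp_tokens = [h.split() for h in hyps]
    total_ref_words = sum(len(r) for r in ref_tokens)
    if total_ref_words == 0:
        raise ValueError("references contain no words")
    # Try every speaker ordering of the hypotheses and keep the best one.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(ref_tokens, perm))
        for perm in permutations(hyp_tokens)
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    refs = ["hello world", "good morning everyone"]
    hyps = ["good morning everyone", "hello word"]  # speakers in swapped order
    print(pi_wer(refs, hyps))
```

The exhaustive search over permutations grows factorially with the number of speakers, which is acceptable for the two- or three-speaker mixtures typical of this task but would need a smarter assignment method for larger counts.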
If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.
@article{kang2024cross,
title={Cross-Speaker Encoding Network for Multi-Talker Speech Recognition},
author={Kang, Jiawen and Meng, Lingwei and Cui, Mingyu and Guo, Haohan and Wu, Xixin and Liu, Xunying and Meng, Helen},
journal={arXiv preprint arXiv:2401.04152},
year={2024}
}
Feel free to contact me if you have any questions.
This repository is based on ESPnet speech processing toolkit, version 202209.