Information Quantizer

This repository contains the code for the paper ``Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition'' (more features available soon).

@inproceedings{wang-etal-2022-iq,
  author={Liming Wang and Siyuan Feng and Mark Hasegawa-Johnson and Chang D. Yoo},
  title={Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2022}
}

Dependencies

ZeroSpeech 2021 baseline system
UnsupSeg
BEER
Other dependencies can be found in requirements.txt

How to run it?

Simply run bash run.sh for the small datasets we provided. To reproduce the results in the paper, please download the whole datasets and convert them in a similar format as the small datasets by the following steps:

Prepare datasets. Download the LibriSpeech dataset, manually cut out spoken word segments using information provided in resources/librispeech_word/librispeech_word.json. Also download the TIMIT dataset, convert the audio files to .wav and create the meta data files as done in resources/TIMIT/test_subset.
Modify the paths and variables in run.sh and configs/librispeech_word.conf.
Run bash run.sh.

lwang114 / InformationQuantizer

Information Quantizer

Dependencies

How to run it?

About

Languages