SPEAR-tools

This repository contains the code to compute the baseline and run the metrics used in the SPEAR challenge. To download the data, go here.

Terminal commands to download the data are also provided below. Use the flag --continue-at if a download aborted too early; a resume example is shown after the download commands.

cd <your-spear-directory>

# All core Train data. Sizes of 14, 20, 20 and 24 GB compressed for D1, D2, D3 and D4 respectively.
curl https://spear2022data.blob.core.windows.net/spear-data/CoreTrainDataset1_v1.1.tar -o CoreTrainDataset1_v1.1.tar
curl https://spear2022data.blob.core.windows.net/spear-data/CoreTrainDataset2_v1.1.tar -o CoreTrainDataset2_v1.1.tar
curl https://spear2022data.blob.core.windows.net/spear-data/CoreTrainDataset3_v1.1.tar -o CoreTrainDataset3_v1.1.tar
curl https://spear2022data.blob.core.windows.net/spear-data/CoreTrainDataset4_v1.1.tar -o CoreTrainDataset4_v1.1.tar

# ATFs
curl https://spear2022data.blob.core.windows.net/spear-data/Device_ATFs.h5 --create-dirs -o Miscellaneous/Array_Transfer_Functions/Device_ATFs.h5

# All core Dev data (24 GB compressed, 35 GB uncompressed)
curl https://spear2022data.blob.core.windows.net/spear-data/CoreDev_1.1.tar -o CoreDev_v1.1.tar
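
If a transfer was interrupted, curl can pick up where it left off. A minimal sketch using the first archive above ('-' tells curl to read the resume offset from the partially downloaded file):

# Resume an aborted download, then unpack the archive once it is complete
curl --continue-at - https://spear2022data.blob.core.windows.net/spear-data/CoreTrainDataset1_v1.1.tar -o CoreTrainDataset1_v1.1.tar
tar -xf CoreTrainDataset1_v1.1.tar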

Note that the ATFs should be downloaded into a directory called Miscellaneous/Array_Transfer_Functions at the same level as the extracted Main and Extra folders for the example bash script to run. The file is already included in CoreDev but not in the CoreTrain archives.
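
For reference, a layout satisfying that constraint could look as follows (Main, Extra and the ATF path are taken from the note above; the exact contents of each folder depend on which archives you extracted):

<your-spear-directory>/
├── Main/                          # extracted from the Core*.tar archives
├── Extra/
└── Miscellaneous/
    └── Array_Transfer_Functions/
        └── Device_ATFs.h5         # placed here by the --create-dirs curl command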

Analysis tools

The provided framework consists mainly of three Python scripts (more details can be found in the PowerPoint presentation in assets):

  • spear_enhance.py computes the baseline enhancement (an MVDR beamformer) and outputs the processed audio files.
  • spear_evaluate.py computes the metrics on the processed audio files and outputs a CSV file with all metrics for all chunks.
  • spear_visualise.py plots the results from the computed metrics CSV file and saves the figures.

The resulting metrics CSV files for the passthrough and baseline enhancements are already generated and provided in the metrics folder. To compute those files yourself, a bash script run_example.sh is provided; instructions are found below.

Installation

We assume that conda is installed. If not, follow the instructions here.

# First clone the repo
git clone https://github.com/ImperialCollegeLondon/spear-tools
cd spear-tools

# Second create & activate environment with conda
conda env create -n <your-env-name> -f env-spear-metrics.yml
conda activate <your-env-name>
cd analysis

For Windows users, the Microsoft C++ Build Tools are needed when building the environment.
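
As a quick sanity check that the environment built correctly, an import test can be run inside it (the module name pystoi is an assumption based on the PYSTOI package listed in the metrics table below):

# Hypothetical smoke test: verify that one of the metric dependencies imports
python -c "import pystoi; print('environment looks OK')"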

How to Use

To run an example, use the command bash run_example.sh in the analysis folder. The following parameters can be modified:

# Setup soft links to make paths easier
ln -sf <outputs-folder-full-path> my_results  # Output path for the processed audio, metrics csv and plots
ln -sf <dataset-folder-full-path> spear_data  # Root folder of the SPEAR datasets containing the folders Main and Extra

# Define variables
SET='Dev'               # 'Train', 'Dev' or 'Eval'.
DATASET=2               # 1 to 4.
SESSION='10'            # 1 to 9 for Train, 10 to 12 for Dev, 13 to 15 for Eval. Use '' for all sessions of the current set.
MINUTE='00'             # 00 to 30 (depends on the current session). Use '' for all minutes of the current session.
PROCESSING='baseline'   # Name of desired processing. 'baseline' by default
REFERENCE='passthrough' # Name of desired reference enhancement. 'passthrough' by default.

The rest of the script computes the enhancement, the metric evaluation, and the plotting for both the passthrough reference and the chosen PROCESSING method.

python spear_enhance.py   <input-root> <audio-dir> --method_name <method-name> --list_cases <list-cases>
python spear_evaluate.py  <input-root> <audio-dir> <segments-csv> <metrics-csv> --list_cases <list-cases>
python spear_visualise.py <output-root> <metrics-csv-ref> <metrics-proc> --reference_name <reference-name> --method_name <method-name>
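
For example, with the soft links and variables defined above, a baseline-vs-passthrough run could be sketched as follows. The paths are illustrative substitutions, and <segments-csv> and <list-cases> stay as placeholders because their exact values are set in run_example.sh:

# Illustrative sketch only: file names under my_results and in the metrics folder are hypothetical
python spear_enhance.py   spear_data my_results/audio --method_name baseline --list_cases <list-cases>
python spear_evaluate.py  spear_data my_results/audio <segments-csv> my_results/metrics_baseline.csv --list_cases <list-cases>
python spear_visualise.py my_results metrics/metrics_passthrough.csv my_results/metrics_baseline.csv --reference_name passthrough --method_name baseline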

List of metrics

| Category | Metric                                   | Metric Abbreviation | Reference                | Python Package |
|----------|------------------------------------------|---------------------|--------------------------|----------------|
| SNR      | Signal to Noise Ratio                    | SNR                 | [-]                      | PySEPM         |
| SNR      | Frequency-weighted Segmental SNR         | fwSegSNR            | Hu et al. 2008 [1]       | PySEPM         |
| SI       | Short-Time Objective Intelligibility     | STOI                | Taal et al. 2011 [2]     | PYSTOI         |
| SI       | Extended STOI                            | ESTOI               | Jensen et al. 2016 [3]   | PYSTOI         |
| SI       | Modified Binaural STOI                   | MBSTOI              | Andersen et al. 2018 [4] | CLARITY        |
| SI       | Speech to Artifacts Ratio                | SAR                 | Vincent et al. 2006 [5]  | Speech Metric  |
| SI       | Image to Spatial Ratio                   | ISR                 | Vincent et al. 2006 [5]  | Speech Metric  |
| SI       | Speech to Distortion Ratio               | SDR                 | Vincent et al. 2006 [5]  | Speech Metric  |
| SI       | Scale-Invariant SDR                      | SI-SDR              | Roux et al. 2019 [6]     | Speech Metric  |
| SI       | Hearing Aid Speech Perception Index      | HASPI               | Kates et al. 2014 [7][8] | CLARITY        |
| SQ       | Perceptual Evaluation of Speech Quality  | PESQ                | Rix et al. 2001 [9]      | Speech Metric  |
| SQ       | PESQ Narrow Band                         | PESQ-NB             | Rix et al. 2001 [9]      | Speech Metric  |
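
Each chunk's scores form one row of the metrics CSV written by spear_evaluate.py. To see which metric columns a provided file actually contains, a quick header peek works (the file name is a placeholder for one of the CSVs in the metrics folder):

# Print the header row of a metrics CSV, one column name per line
head -n 1 metrics/<metrics-csv> | tr ',' '\n'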

References

  1. Y. Hu and P. C. Loizou, “Evaluation of objective quality measures for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229–238, 2008.

  2. C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An algorithm for intelligibility prediction of time–frequency weighted noisy speech,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125–2136, 2011.

  3. J. Jensen and C. H. Taal, “An algorithm for predicting the intelligibility of speech masked by modulated noise maskers,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 11, pp. 2009–2022, 2016.

  4. A. H. Andersen, J. M. de Haan, Z.-H. Tan, and J. Jensen, “Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions,” Speech Communication, vol. 102, pp. 1–13, 2018.

  5. E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.

  6. J. Barker, S. Graetzer, and T. Cox, “Software to support the 1st Clarity Enhancement Challenge [software and data collection],” https://doi.org/10.5281/zenodo.4593856, 2021.

  7. J. M. Kates and K. H. Arehart, “The hearing-aid speech perception index (HASPI),” Speech Communication, vol. 65, pp. 75–93, 2014.

  8. J. M. Kates and K. H. Arehart, “The hearing-aid speech perception index (HASPI) version 2,” Speech Communication, vol. 131, pp. 35–46, 2021.

  9. A. Rix, J. Beerends, M. Hollier, and A. Hekstra, “Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs,” in Proc. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. 749–752, 2001.
