plzcallstella

This Python script scrapes the GMU Speech Accent Archive and saves the files by participant ID, along with an Excel manifest.

e.g.

recordings/
    0001.mp3
transcripts/
    0001.gif
json/
    0001.json
manifest.xlsx
info.json
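
As a rough sketch of how this layout could be consumed afterwards, the snippet below groups the files of one run by participant ID. The function name `collect_participants` and the keys `recording`, `transcript` and `metadata` are hypothetical and are not part of scrape.py; the folder names and extensions are taken from the example tree above.

```python
from pathlib import Path

# Folder name and file extension for each kind of file, as shown in the
# example tree. The dictionary keys are illustrative names, not from scrape.py.
LAYOUT = {
    "recording": ("recordings", ".mp3"),
    "transcript": ("transcripts", ".gif"),
    "metadata": ("json", ".json"),
}


def collect_participants(run_dir):
    """Return {participant_id: {kind: Path}} for one output directory."""
    run_dir = Path(run_dir)
    participants = {}
    for kind, (folder, ext) in LAYOUT.items():
        # The file stem (e.g. "0001") is treated as the participant ID.
        for path in sorted((run_dir / folder).glob(f"*{ext}")):
            participants.setdefault(path.stem, {})[kind] = path
    return participants
```

Calling `collect_participants("output/<TIMESTAMP>")` on a finished run should give one entry per participant ID. An entry that lacks one of the three keys points to a file that was not downloaded.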

Usage

REQ:

# install deps
poetry install --no-dev

# run script
poetry run python scrape.py

Results are written to output/<TIMESTAMP>/
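
For illustration only, a per-run directory of this kind could be created along the following lines. The timestamp format (`%Y%m%dT%H%M%S`) and the helper name `make_output_dir` are assumptions; the format scrape.py actually uses is not stated here.

```python
from datetime import datetime
from pathlib import Path


def make_output_dir(root="output", now=None):
    """Create <root>/<TIMESTAMP>/ with the three subfolders and return its path."""
    now = now or datetime.now()
    # Assumed timestamp format; the real script may format it differently.
    stamp = now.strftime("%Y%m%dT%H%M%S")
    run_dir = Path(root) / stamp
    for sub in ("recordings", "transcripts", "json"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Giving each run its own timestamped folder means a repeated scrape does not overwrite an earlier one.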

This was written to provide data for linguistics students (i.e. for phonetic transcription exercises).

This repackaging is intended for educational purposes. Full credit goes to Steven H. Weinberger, George Mason University, and others.

P.S. The Speech Accent Archive is proof that you do not need 20MB of frontend JavaScript to make something truly valuable.
