Support the Common Voice dataset
dpriskorn opened this issue · comments
How should big downloads be handled?
Do the sentences have a unique, persistent ID?
Is there any API for looking up a sentence by its ID?
Emailed Mozilla 2 days ago to ask.
No answer from Mozilla yet. Maybe the best way forward is to count the lines in their sentences file?
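
One possible reading of that idea is to use each sentence's line number as a stand-in ID. Below is a rough sketch only, assuming the sentences come as a plain-text file with one sentence per line. The function names are made up for illustration and nothing here is a Common Voice API. Reading line by line keeps memory use flat even for a big file. Note that line numbers are only stable as long as Mozilla never reorders or removes lines, which is still unconfirmed.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_sentences_with_line_ids(handle: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, sentence) pairs, reading one line at a time.

    The 1-based line number acts as a stand-in ID (an assumption, not an
    official identifier). Blank lines are skipped but still counted, so
    the numbers match what a text editor would show.
    """
    for line_number, line in enumerate(handle, start=1):
        sentence = line.rstrip("\r\n")
        if sentence:
            yield line_number, sentence


def count_lines(handle: TextIO) -> int:
    """Count lines without loading the whole file into memory."""
    return sum(1 for _ in handle)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real sentences file.
    sample = io.StringIO("First sentence.\nSecond sentence.\n")
    for line_id, sentence in iter_sentences_with_line_ids(sample):
        print(line_id, sentence)
```

With a real file this would be called as `with open(path, encoding="utf-8") as f: ...`, where `path` points at wherever the downloaded sentences file ends up.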