weimeng23 / audio-speech-datasets

:scroll: A list of various Audio/Speech datasets for Speech Recognition, Speech Synthesis, Noise, Audio Tagging/Sound Event Detection, Speaker Diarization, Speaker Recognition, (Inverse) Text Normalization, Speech Translation, Multilingual, etc. (continuously updated)

Home Page: https://github.com/weimeng23/audio-speech-datasets


Audio/Speech Datasets



Overview

  • Task
    • ASR
    • TTS
    • Noise
    • Audio/Sound
    • SD
    • SR
    • TN/ITN
    • ST
  • Language
    • chinese
    • english
    • other

Task

Speech Recognition

chinese

| Name | Duration (hours) | Links | Comments |
| --- | --- | --- | --- |
| THCHS-30 | 30 | [SLR18] | train: 30 speakers, 10893 utterances; test: 10 speakers, 2496 utterances |
| Aishell | 179 | [SLR33] | 400 speakers |
| Aishell2 | 1000 | [Website] | if available, 1991 speakers |
| Free ST Chinese Mandarin (ST-CMDS) | 110 | [SLR38] | 855 speakers, 102600 utterances |
| Primewords Chinese Corpus Set 1 | 99 | [SLR47] | 296 native Chinese speakers |
| aidatatang_200zh | 200 | [SLR62] | 600 speakers |
| aidatatang_1505zh | 1505 | [Github] | if available |
| MAGICDATA Mandarin Read | 755 | [SLR68] | 1080 speakers |
| MAGICDATA Mandarin Conversational (RAMC) | 180 | [SLR123] | 663 speakers |
| AliMeeting (M2MeT) | 118.75 (train/dev/test 104.75/4/10) | [SLR119] | ASR, SD |
| WenetSpeech | 10000+ | [SLR121] [Github] [Website] | |
| TAL-ASR | 100 | [Website] | 80+ speakers |
| TAL-CSASR | 587 | [Website] | code-switching, 200+ speakers |
| didispeech | | | if available |
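
The `[SLRnn]` tags in the Links column are OpenSLR resource numbers. As a minimal sketch, assuming the usual OpenSLR page layout of `https://www.openslr.org/<number>/` (the base URL and the helper name `slr_url` are illustrative assumptions, not part of this list), a tag can be turned into a page URL like this:

```python
import re

# Assumed base URL for OpenSLR resource pages; not stated in this list.
OPENSLR_BASE = "https://www.openslr.org"


def slr_url(tag: str) -> str:
    """Turn a tag such as 'SLR18' or '[SLR18]' into an OpenSLR page URL.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"\[?SLR(\d+)\]?", tag.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not an OpenSLR tag: {tag!r}")
    return f"{OPENSLR_BASE}/{int(match.group(1))}/"


if __name__ == "__main__":
    # A few entries taken from the table above.
    for name, tag in [("THCHS-30", "[SLR18]"), ("Aishell", "[SLR33]")]:
        print(name, slr_url(tag))
```

Tags such as `[Github]` or `[Website]` are not OpenSLR numbers, so the helper rejects them with a `ValueError` instead of guessing a URL.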

english

| Name | Duration (hours) | Links | Comments |
| --- | --- | --- | --- |
| LibriSpeech | 1000 | [SLR12] [LM] | |
| GigaSpeech | 33,000+ for unsupervised; 10,000 for supervised | [Github] | |
| Multilingual LibriSpeech (MLS) | | [SLR94] | Multilingual |
| libri-light | 60,000 unlabelled speech | [Github] | pretraining, unsupervised, semi-supervised |
| libriheavy | 50,000 | [Github] | casing, punctuation, context |
| Spgispeech | | | |
| People's Speech | | | |

Speech Synthesis

chinese

| Name | Duration (hours) | Links | Comments |
| --- | --- | --- | --- |
| AISHELL-3 | 85 | [Website] | 44.1 kHz, 218 native Chinese speakers, 88035 utterances |
| LibriTTS | | | |

Noise

| Name | Duration (hours) | Links | Comments |
| --- | --- | --- | --- |
| MUSAN | | [SLR17] | |
| Aachen Impulse Response database (AIR) | | [SLR20] | |
| Simulated Room Impulse Response Database | | [SLR26] | |
| Room Impulse Response and Noise Database | | [SLR28] | |

Audio Tagging/Sound Event Detection

Speaker Diarization

| Name | Duration (hours) | Links | Comments |
| --- | --- | --- | --- |
| AliMeeting (M2MeT) | 118.75 (train/dev/test 104.75/4/10) | [SLR119] | ASR, SD |

Speaker Recognition

(Inverse) Text Normalization

Speech Translation

GigaST

GigaS2S

Reference


License: Creative Commons Attribution Share Alike 4.0 International