A list of open Korean speech corpora for speech technology research and development.
This list prefers corpora that are free (i.e. no $ cost) and truly open (e.g. released under a Creative Commons license or a Community Data License Agreement). Not every corpus listed meets both criteria, but all of them are accessible and can be used for research, for commercial purposes, or for both.
Feel free to propose additions to the list!
This list is strongly inspired by open-speech-corpora.
Last updated on Feb. 24, 2021.
CORPUS | # HOURS OF TRAINING DATA | # SPEAKERS | DOWNLOAD | LICENSE | TEST SET | NOTES
---|---|---|---|---|---|---
KSS dataset | 12 | 1 (female, professional voice actress) | https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset | Non-commercial use only | | 44.1 kHz
Zeroth_korean | 52.8 | 115 | http://openslr.org/40/ | CC BY 4.0 | Yes |
Pansori-TEDxKR | 3 | 41 | http://openslr.org/58/ | CC BY-NC-ND 4.0 | | TEDxKR audio from YouTube
Deeply Korean read speech corpus | 3 | | http://openslr.org/97/ | CC BY-NC-ND 4.0 | |
Deeply parent-child vocal interaction dataset | 16 | | http://openslr.org/98/ | CC BY-NC-ND 4.0 | |
MINDsLab-ETRI VOTE400 | 300 (dialogue), 100 (reading) | 52 | https://ai4robot.github.io/mindslab-etri-vote400/# | Available with ETRI approval | | Elderly speech data
Clova call | 130 | | https://github.com/clovaai/ClovaCall | Available with NAVER approval | Yes | Telephone speech data
AIHUB | 1000 | | https://aihub.or.kr/aidata/105/download | Available with AIHUB approval | Yes | Conversational speech
AIHUB2 | 2000 | | https://aihub.or.kr/aidata/7968 | Available with AIHUB approval | | Broadcast recordings; not yet released
AISTARTHON | | | https://aihub.or.kr/open_data/ai_starthon_x_naver/download | Available with AIHUB approval | |
Continuous speech data in a smartphone environment for noise processing and voice detection (잡음처리 및 음성검출을 위한 스마트폰 환경 연속어 음성 데이터) | | | https://aiopen.etri.re.kr/service_dataset.php?category=voice | Available with ETRI approval | |
Children's speech data for voice interface development (음성인터페이스 개발을 위한 어린이음성데이터) | | | https://aiopen.etri.re.kr/service_dataset.php?category=voice | Available with ETRI approval | |
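
Since the license column decides whether a corpus fits a given project, it can help to filter the list programmatically. The snippet below is a minimal sketch, not part of the original list: it copies the four Creative Commons rows from the table and applies a simple heuristic (a CC license without the NC element is treated as permitting commercial use). The names `Corpus`, `license_elements`, and `permits_commercial_use` are hypothetical helpers introduced for illustration, and the heuristic is no substitute for reading the actual license text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    name: str
    license: str


# Rows copied from the table above (Creative Commons entries only).
CORPORA = [
    Corpus("Zeroth_korean", "CC BY 4.0"),
    Corpus("Pansori-TEDxKR", "CC BY-NC-ND 4.0"),
    Corpus("Deeply Korean read speech corpus", "CC BY-NC-ND 4.0"),
    Corpus("Deeply parent-child vocal interaction dataset", "CC BY-NC-ND 4.0"),
]


def license_elements(license_name: str) -> set:
    """Split an identifier such as 'CC BY-NC-ND 4.0' into {'BY', 'NC', 'ND'}."""
    parts = license_name.split()
    if len(parts) < 2 or parts[0] != "CC":
        return set()  # not a Creative Commons identifier this sketch can parse
    return set(parts[1].split("-"))


def permits_commercial_use(corpus: Corpus) -> bool:
    """Assumed heuristic: a CC license without the NC (NonCommercial) element."""
    elements = license_elements(corpus.license)
    return bool(elements) and "NC" not in elements


commercial = [c.name for c in CORPORA if permits_commercial_use(c)]
print(commercial)  # ['Zeroth_korean']
```

Corpora that require approval from ETRI, NAVER, or AIHUB are deliberately left out of this sketch, because their terms cannot be inferred from the license string alone.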