indra622 / Korean-open-speech-corpora

A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Korean-open-speech-corpora

A list of Korean open speech corpora for Speech Technology research and development.

This list favors free (i.e., no monetary cost) and truly open corpora (e.g., released under a Creative Commons license or a Community Data License Agreement). Not every corpus listed meets those criteria, but all of them are accessible and usable for research and/or commercial purposes.

Feel free to propose additions to the list!

This list is strongly inspired by open-speech-corpora.

Last updated on Feb. 24, 2021.

| Corpus | # Hours of training data | # Speakers | Download | License | Test set | Notes |
|---|---|---|---|---|---|---|
| KSS dataset | 12 | 1 (female, professional voice actress) | https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset | Non-commercial | | 44.1 kHz |
| Zeroth_korean | 52.8 | 115 | http://openslr.org/40/ | CC BY 4.0 | Yes | |
| Pansori-TEDxKR | 3 | 41 | http://openslr.org/58/ | CC BY-NC-ND 4.0 | | TEDxKR audio from YouTube |
| Deeply Korean read speech corpus | 3 | | http://openslr.org/97/ | CC BY-NC-ND 4.0 | | |
| Deeply parent-child vocal interaction dataset | 16 | | http://openslr.org/98/ | CC BY-NC-ND 4.0 | | |
| MINDsLab-ETRI VOTE400 | 300 (dialogue), 100 (reading) | 52 | https://ai4robot.github.io/mindslab-etri-vote400/# | Available after ETRI approval | | Elderly speech data |
| Clova call | 130 | | https://github.com/clovaai/ClovaCall | Available after NAVER approval | Yes | Telephone data |
| AIHUB | 1000 | | https://aihub.or.kr/aidata/105/download | Available after AIHUB approval | Yes | Dialogue |
| AIHUB2 | 2000 | | https://aihub.or.kr/aidata/7968 | Available after AIHUB approval | | Broadcast recordings / not yet released |
| AISTARTHON | | | https://aihub.or.kr/open_data/ai_starthon_x_naver/download | Available after AIHUB approval | | |
| Smartphone-environment continuous speech data for noise processing and speech detection (잡음처리 및 음성검출을 위한 스마트폰 환경 연속어 음성 데이터) | | | https://aiopen.etri.re.kr/service_dataset.php?category=voice | Available after ETRI approval | | |
| Children's speech data for voice interface development (음성인터페이스 개발을 위한 어린이음성데이터) | | | https://aiopen.etri.re.kr/service_dataset.php?category=voice | Available after ETRI approval | | |
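
Since the list mixes several license types, it can help to screen entries by license before downloading anything. The snippet below is a minimal sketch of that idea, not part of the original list: it hand-copies four rows from the table above, and the `CORPORA` structure and the `allows_commercial_use` helper are hypothetical names introduced only for illustration. It treats a Creative Commons license as commercially usable only when the NC (NonCommercial) element is absent, and conservatively rejects everything else, so approval-based corpora still need to be checked with the provider.

```python
# Illustrative sketch: a few rows copied by hand from the table above.
# `CORPORA` and `allows_commercial_use` are hypothetical names, not an existing API.
CORPORA = [
    {"name": "Zeroth_korean", "license": "CC BY 4.0"},
    {"name": "Pansori-TEDxKR", "license": "CC BY-NC-ND 4.0"},
    {"name": "Deeply Korean read speech corpus", "license": "CC BY-NC-ND 4.0"},
    {"name": "Deeply parent-child vocal interaction dataset", "license": "CC BY-NC-ND 4.0"},
]


def allows_commercial_use(license_name: str) -> bool:
    """Return True only for Creative Commons licenses without the NC element."""
    if not license_name.startswith("CC "):
        # Custom or approval-based terms: assume "no" and check with the provider.
        return False
    # "CC BY-NC-ND 4.0" -> "BY-NC-ND" -> ["BY", "NC", "ND"]
    elements = license_name.split()[1].split("-")
    return "NC" not in elements


commercial = [c["name"] for c in CORPORA if allows_commercial_use(c["license"])]
print(commercial)
```

Of the four rows encoded here, only Zeroth_korean passes the filter, because the other three carry the NC element.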

License: MIT License