Create dataset loader for VLSP2020 MT
SamuelCahyawijaya opened this issue
Samuel Cahyawijaya commented
Dataloader name: vlsp2020_mt_envi/vlsp2020_mt_envi.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?vlsp2020_mt_envi
Dataset | vlsp2020_mt_envi |
---|---|
Description | Parallel and monolingual data for training machine translation systems that translate English text into Vietnamese, with a focus on the news domain. The data was crawled from high-quality bilingual or multilingual news websites and from one-speaker educational talks on various topics, mostly technology, entertainment, and design (hereafter referred to as TED-like talks). The dataset also includes noisy movie subtitles from the OpenSubtitle dataset. |
Subsets | - |
Languages | vie |
Tasks | Machine Translation |
License | Unknown (unknown) |
Homepage | https://github.com/thanhleha-kit/EnViCorpora |
HF URL | - |
Paper URL | - |
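
As a starting point for the dataloader, here is a minimal sketch of how line-aligned English and Vietnamese text could be turned into translation records. It assumes the corpus is distributed as parallel files with one sentence per line, which this issue does not confirm. The function name `pair_parallel_lines` and the record keys (`text_1`, `text_2`, and so on) are illustrative assumptions, not the project's confirmed schema.

```python
from typing import Dict, Iterable, Iterator


def pair_parallel_lines(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    src_lang: str = "eng",
    tgt_lang: str = "vie",
) -> Iterator[Dict[str, str]]:
    """Yield one translation record per aligned (source, target) line pair.

    Hypothetical helper: assumes line i of the source corresponds to
    line i of the target.
    """
    for idx, (src, tgt) in enumerate(zip(src_lines, tgt_lines)):
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty after stripping whitespace.
        if not src or not tgt:
            continue
        yield {
            "id": str(idx),  # index of the pair in the original files
            "text_1": src,
            "text_2": tgt,
            "text_1_name": src_lang,
            "text_2_name": tgt_lang,
        }


# Small in-memory example standing in for the real corpus files.
examples = list(
    pair_parallel_lines(
        ["Hello.\n", "\n", "Thank you.\n"],
        ["Xin chào.\n", "\n", "Cảm ơn.\n"],
    )
)
```

Because the helper accepts any iterable of strings, open file handles can be passed in directly once the actual file layout of the corpus is known.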
Patrick Amadeus Irawan commented
#self-assign