Create dataset loader for VLSP2020 MT

SamuelCahyawijaya opened this issue 3 months ago

Dataset	vlsp2020_mt_envi
Description	Parallel and monolingual data for training machine translation systems that translate English text into Vietnamese, with a focus on the news domain. The data was crawled from high-quality bilingual or multilingual news websites and from one-speaker educational talks on various topics, mostly technology, entertainment, and design (hereafter referred to as TED-like talks). The dataset also includes noisy movie subtitles from the OpenSubtitles dataset.
Subsets	-
Languages	vie
Tasks	Machine Translation
License	Unknown (unknown)
Homepage	https://github.com/thanhleha-kit/EnViCorpora
HF URL	-
Paper URL	-
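
As a rough starting point for the loader, here is a minimal sketch of how aligned English and Vietnamese lines could be turned into translation records. It assumes the parallel data comes as two line-aligned sequences (one sentence per line), which this issue does not confirm. The function name `pair_parallel_lines` and the record keys `id`, `en`, and `vi` are hypothetical, not part of any existing schema.

```python
from itertools import zip_longest
from typing import Dict, Iterable, Iterator

_MISSING = object()  # sentinel used to detect sides of unequal length


def pair_parallel_lines(
    en_lines: Iterable[str], vi_lines: Iterable[str]
) -> Iterator[Dict[str, str]]:
    """Yield one record per aligned (English, Vietnamese) line pair.

    Assumption: line i on the English side translates to line i on the
    Vietnamese side. Pairs where either side is blank are skipped.
    """
    pairs = zip_longest(en_lines, vi_lines, fillvalue=_MISSING)
    for idx, (en, vi) in enumerate(pairs):
        if en is _MISSING or vi is _MISSING:
            # One side ran out early, so the alignment cannot be trusted.
            raise ValueError(f"Sides differ in length at line {idx}")
        en, vi = en.strip(), vi.strip()
        if not en or not vi:
            continue
        # Hypothetical record layout; adapt the keys to the target schema.
        yield {"id": str(idx), "en": en, "vi": vi}


if __name__ == "__main__":
    english = ["Hello.\n", "\n", "Thank you.\n"]
    vietnamese = ["Xin chào.\n", "\n", "Cảm ơn.\n"]
    for record in pair_parallel_lines(english, vietnamese):
        print(record)
```

The `id` keeps the original line index rather than a running count, so a skipped blank pair can still be traced back to its position in the source files.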

Patrick Amadeus Irawan · Answer 1 · Tue Apr 09 2024 02:18:00 GMT+0800 (China Standard Time)

#self-assign