Nexdata-AI / 1990000-Groups-Chinese-Czech-Parallel-Corpus-Data

1990000-Groups-Chinese-Czech-Parallel-Corpus-Data

Home Page: https://www.nexdata.ai/datasets/nlu/1336?source=Github

Description

A Chinese-Czech parallel translation corpus of 1,990,000 pairs, stored as TXT documents. The data has been cleaned, desensitized, and quality-inspected, and can serve as a base corpus for text data analysis and for fields such as machine translation. For more details, see: https://www.nexdata.ai/datasets/nlu/1336?source=Github

Storage format

TXT
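
The exact layout of the TXT files is not documented here. As a minimal sketch, assuming each line holds one pair with the Chinese and Czech text separated by a tab (a hypothetical layout; check the delivered files before relying on it), the pairs could be read in Python roughly like this. The function name `read_pairs` and the sample strings are illustrative only and are not taken from the corpus.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_pairs(handle: TextIO, sep: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (chinese, czech) pairs from a line-oriented text stream.

    Assumption: one pair per line, the two sides separated by `sep`.
    The real corpus layout may differ.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(sep, 1)
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        zh, cs = (p.strip() for p in parts)
        if zh and cs:
            yield zh, cs


# Illustrative sample only, not taken from the corpus.
sample = io.StringIO("你好\tDobrý den\n\n谢谢\tDěkuji\nmalformed line\n")
pairs = list(read_pairs(sample))
print(pairs)
```

For a file on disk, the same function could be given a handle from `open(path, encoding="utf-8")`; the UTF-8 encoding is likewise an assumption.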

Data content

Chinese-Czech parallel corpus data. The content has been preliminarily categorized and covers the fields of technology, healthcare, tourism, spoken language, news, and military.

Data size

1.99 million Chinese-Czech parallel pairs.

Language

Chinese, Czech

Application scenario

Machine translation

Licensing Information

Commercial License