jaaack-wang / Chinese-fixed-phrases-idioms

A large corpus of Chinese fixed phrases and idioms (30310 entries) scraped from a reputable educational website.

Home Page: http://cy.5156edu.com

Description

A large corpus of Chinese fixed phrases and idioms scraped from an online idiom dictionary hosted on a reputable educational website. The corpus currently contains 30310 entries and is stored as a single dictionary in JSON format in 成语及俗语词典.json.

The file name 成语及俗语词典 literally means "Fixed Phrases and Idioms Dictionary". Within this dictionary, each fixed phrase or idiom is itself a dictionary with the following keys, where available: Pinyin (拼音), abbreviated Pinyin (简拼), synonyms (近义词), antonyms (反义词), usage (用法), interpretation (解释), origin (出处), example (例子), two-part allegorical saying (歇后语, with the fixed phrase or idiom being the first part), riddle (谜语), background story (成语故事), and the link to the original webpage from which the information was scraped (链接).

成语及俗语词典.ipynb is the original notebook used to scrape this information and compile the corpus.
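The nested-dictionary layout described above can be explored with a few lines of Python. This is a minimal sketch, not the author's code: it assumes the top-level JSON keys are the phrases themselves, and the helper names (`load_corpus`, `get_field`, `field_coverage`) and the two sample entries are hypothetical stand-ins written by hand, not data taken from the corpus. The type of each field value (string or list) is not stated in the description, so the helpers do not rely on it.

```python
import json

# Field names listed in the description; an entry may lack any of them.
FIELDS = ["拼音", "简拼", "近义词", "反义词", "用法", "解释",
          "出处", "例子", "歇后语", "谜语", "成语故事", "链接"]


def load_corpus(path="成语及俗语词典.json"):
    """Load the corpus file into a dict (assumed shape: phrase -> entry dict)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_field(corpus, phrase, field, default=None):
    """Return one field of one entry, or `default` if the phrase or field is absent."""
    return corpus.get(phrase, {}).get(field, default)


def field_coverage(corpus):
    """Count how many entries carry each of the documented fields."""
    return {field: sum(1 for entry in corpus.values() if field in entry)
            for field in FIELDS}


# Hand-written stand-in with the assumed shape, so the sketch runs without the file.
sample = {
    "画蛇添足": {"拼音": "huà shé tiān zú", "简拼": "hstz"},
    "一石二鸟": {"拼音": "yī shí èr niǎo"},
}

print(get_field(sample, "画蛇添足", "简拼"))
print(field_coverage(sample)["拼音"])
```

With the real file in the working directory, `corpus = load_corpus()` followed by `len(corpus)` would be expected to report the 30310 entries mentioned above, provided the top-level structure matches the assumption.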


Languages

Language: Jupyter Notebook 100.0%