Mr-Pepe / pengyou-data-generator

This repository parses several data sources and merges them into a single SQLite3 database for use by the pengyou app.

To build the database from all sources, run run_all.py.
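
As a rough illustration of the merge step, the sketch below writes entries from two already-parsed sources into one SQLite3 database. The `build_database` helper, the table names, and the column names are hypothetical and are not taken from this repository; the actual schema may differ.

```python
import sqlite3


def build_database(path, dictionary_entries, frequencies):
    """Write dictionary entries and character frequencies into one SQLite3 database.

    Hypothetical schema, for illustration only.
    dictionary_entries: iterable of (simplified, traditional, pinyin, definitions)
    frequencies: iterable of (character, count)
    """
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "simplified TEXT, traditional TEXT, pinyin TEXT, definitions TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS frequencies ("
            "character TEXT PRIMARY KEY, count INTEGER)"
        )
        conn.executemany(
            "INSERT INTO entries VALUES (?, ?, ?, ?)", dictionary_entries
        )
        conn.executemany(
            "INSERT OR REPLACE INTO frequencies VALUES (?, ?)", frequencies
        )
    return conn
```

Passing `":memory:"` as `path` builds a throwaway database, which can be handy for trying out a parser before writing to disk.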

Sources are obtained from the following links:

CC-CEDICT Open Source Dictionary

The dictionary is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
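
CC-CEDICT entries are plain text lines of the form `Traditional Simplified [pin1 yin1] /definition 1/definition 2/`, with comment lines starting with `#`. A minimal parsing sketch follows; the function name `parse_cedict_line` and the returned dictionary keys are assumptions for illustration and may not match what this repository does.

```python
import re

# Layout: Traditional Simplified [pin1 yin1] /definition 1/definition 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_cedict_line(line):
    """Parse one CC-CEDICT line; return None for comments, blanks, or malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, definitions = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        # Definitions are separated by slashes inside the trailing /.../ group.
        "definitions": definitions.split("/"),
    }
```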

Stroke Order Data

Download all.json from here.

This data comes from the Make Me A Hanzi project, which extracted the data from fonts by Arphic Technology, a Taiwanese font forge that released their work under a permissive license in 1999. You can redistribute and/or modify this data under the terms of the Arphic Public License as published by Arphic Technology Co., Ltd. A copy of this license can be found here.

Character Decomposition Data

The data is shipped with the app but unused at the moment. Licensed under several licenses, e.g., Creative Commons BY-SA 3.0.

Character Frequency Data

Character frequency data calculated from the Chinese Wikipedia.
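
The frequency data is obtained precomputed, but the kind of count it represents can be sketched as follows. The `count_characters` helper is hypothetical, and restricting the count to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is a simplifying assumption.

```python
from collections import Counter


def count_characters(texts):
    """Count how often each CJK character occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Keep only characters in the basic CJK Unified Ideographs block.
        counts.update(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return counts
```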

Traditional to Simplified Transformation

Licensed under an Apache License 2.0.

Unihan

Used when CEDICT doesn't have a definition.
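
The fallback described above might look like the sketch below. It assumes both sources have already been loaded into dictionaries mapping a character to a list of definitions; the `lookup_definition` name and that in-memory layout are hypothetical.

```python
def lookup_definition(character, cedict, unihan):
    """Return CEDICT definitions if present, otherwise fall back to Unihan.

    cedict, unihan: dicts mapping a character to a list of definition strings
    (an assumed layout for this sketch).
    """
    definitions = cedict.get(character)
    if definitions:
        return definitions
    # CEDICT has no entry (or an empty one), so use Unihan instead.
    return unihan.get(character, [])
```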
