sph-mn / hanyu

chinese language data and dictionary

dictionary

a dictionary that sorts results by word frequency and character count. it is a single file, html/hanyu-dictionary.html, and must be served over http for the javascript to run in the browser; opening it directly from the file system is not enough. also hosted here.
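any static file server should work for serving the file locally. as a minimal sketch, assuming python 3.7 or later is installed, the standard library can serve the html/ directory. the function name `make_server`, the default port 8000, and binding to 127.0.0.1 are illustrative choices and not part of this project.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory="html", port=8000):
    """Create an http server for the files under `directory`.

    `directory` and `port` are illustrative defaults, not project settings.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # bind to localhost only; port 0 lets the operating system pick a free port
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# to serve until interrupted, run from the repository root:
#   make_server().serve_forever()
# then open http://127.0.0.1:8000/hanyu-dictionary.html in a browser
```

the same effect can usually be had without a script by running `python3 -m http.server` inside the html/ directory.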

data files

see under data/

  • frequency-pinyin-translation.csv: words with pinyin and translation sorted by frequency
  • cedict.csv: filtered csv version of cedict with one translation per line
  • character-strokes-composition.csv: characters with stroke count and composition
  • table-of-general-standard-chinese-characters.csv: the official character list including pronunciations
  • characters-by-pinyin.csv
  • words-by-type/: separated by verb, noun, adjective, and so on
  • hsk.csv and hsk-pinyin-translations.csv
  • character-learning.csv: characters sorted by frequency, with readings and number of words with this reading, false pronunciations for guessing and syllable commonness among all characters, compositions, character meaning, and example words
  • character-overlap.csv: each character with the characters that share the most components with it
  • characters-repeated-components.csv: characters that consist of a repetition of another character
  • ... and more
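the csv files can be read with any csv library. the following is a rough sketch only: it assumes that the first field of each row is the word and that fields are comma-separated, neither of which is stated here, so check the actual file and adjust `delimiter` and the field index. the helper names `load_rows` and `lookup` and the sample rows are invented for illustration.

```python
import csv
import io


def load_rows(lines, delimiter=","):
    """Parse csv lines into lists of fields, skipping empty lines."""
    return [row for row in csv.reader(lines, delimiter=delimiter) if row]


def lookup(rows, word):
    """Return all rows whose first field equals `word`, in file order."""
    return [row for row in rows if row[0] == word]


# invented sample rows for illustration, not copied from the data files
sample = io.StringIO("是,shi4,to be\n发烧,fa1shao1,to have a fever\n")
rows = load_rows(sample)
print(lookup(rows, "发烧"))
# prints: [['发烧', 'fa1shao1', 'to have a fever']]
```

for a real file, pass an open file object instead of the sample, for example `open("data/frequency-pinyin-translation.csv", newline="", encoding="utf-8")`. because that file is described as sorted by frequency, rows returned by `lookup` would keep that order.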

data sources

license

creative commons share-alike

development

  • run ./exe/update-dictionary to build html/hanyu-dictionary.html from html/hanyu-dictionary-template.html
  • the main code file is js/main.coffee

hanzi-convert

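a command-line utility that converts chinese text read from standard input: between pinyin notations, from traditional to simplified characters, and between hanzi and pinyin.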

convert tone marks to tone numbers:

echo fāshāo shì yīn | ./exe/hanzi-convert --numbers
fa1shao1 shi4 yin1
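to illustrate what this conversion involves, here is a minimal sketch in python. it is not the implementation used by hanzi-convert, and the name `marks_to_numbers` is made up. it assumes the input is pinyin only, decomposes each marked vowel into a base letter plus a combining mark, and moves the tone number to the end of the syllable. syllables in the neutral tone, which carry no mark, get no number, and ambiguous boundaries such as "ng" followed by a vowel are resolved by a simple rule.

```python
import re
import unicodedata

# combining macron, acute, caron, grave -> tone numbers 1 to 4
TONES = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}

# a tone mark, then the rest of the syllable: remaining vowels (and the
# diaeresis of ü), plus a final n, ng or r unless a vowel follows it
SYLLABLE_REST = re.compile(
    "([\u0304\u0301\u030c\u0300])([aeiou\u0308]*(?:ng|n|r)?)(?![aeiou])",
    re.IGNORECASE,
)


def marks_to_numbers(text):
    """Replace tone marks with a tone number at the end of each syllable."""
    # NFD splits "ā" into "a" followed by a combining macron
    decomposed = unicodedata.normalize("NFD", text)
    moved = SYLLABLE_REST.sub(
        lambda m: m.group(2) + TONES[m.group(1)], decomposed
    )
    # NFC recombines "u" and the diaeresis into "ü"
    return unicodedata.normalize("NFC", moved)


print(marks_to_numbers("fāshāo shì yīn"))
# prints: fa1shao1 shi4 yin1
```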

convert traditional to simplified:

echo 發燒試音 | ./exe/hanzi-convert --simplify
发烧试音

convert from hanzi to pinyin:

echo 发烧试音 | ./exe/hanzi-convert --pinyin
fa1shao1 shi4 yin1

convert from pinyin to hanzi:

echo fa1shao1 shi4 yin1 | ./exe/hanzi-convert --hanzi
发烧 是/事/试/市/式/室/世/仕/侍/势/嗜/噬/士/奭/弑/忕/恃/戺/拭/揓/柿/栻/氏/澨/示/筮/舐/莳/螫/视/誓/谥/贳/轼/逝/适/释/铈/饰/𬤊 因/阴/喑/垔/堙/姻/愔/慇/殷/氤/洇/瘖/禋/筃/茵/裀/铟/音/骃/𬘡/𬮱

where a syllable or word has several candidates, the alternatives are separated by slashes and, like almost every output of this project, sorted by frequency.
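the lookup with frequency-sorted alternatives can be sketched as follows. this is an illustration under assumptions, not the project's code: the names `build_index` and `pinyin_to_hanzi` are hypothetical, the input is assumed to be (word, pinyin) pairs already in frequency order, and the sample entries are invented.

```python
def build_index(entries):
    """Map each pinyin string to its words, keeping the input order.

    `entries` is assumed to be (word, pinyin) pairs already sorted by
    frequency, most frequent first.
    """
    index = {}
    for word, pinyin in entries:
        index.setdefault(pinyin, []).append(word)
    return index


def pinyin_to_hanzi(text, index):
    """Replace each space-separated pinyin token with its candidates."""
    # unknown tokens are passed through unchanged
    return " ".join(
        "/".join(index.get(token, [token])) for token in text.split()
    )


# invented sample in frequency order; the real data is much larger
entries = [("是", "shi4"), ("事", "shi4"), ("发烧", "fa1shao1"),
           ("试", "shi4"), ("因", "yin1"), ("音", "yin1")]
print(pinyin_to_hanzi("fa1shao1 shi4 yin1", build_index(entries)))
# prints: 发烧 是/事/试 因/音
```

because a python dict keeps insertion order and each list is appended to in input order, the alternatives come out in the same frequency order as the input without a separate sort.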
