sotch-pr35mac / syng

A free, open source, cross-platform, Chinese-To-English dictionary for desktops.

Home Page: https://getsyng.com

Integrate Chinese Character Frequency Counter

baimafeima opened this issue · comments

It would be great to be able to paste arbitrary Chinese text into a field/box in Syng and get a Chinese character frequency count at the click of a button. This would let any college student quickly identify the most important characters to learn from a particular Chinese text and prepare for exams efficiently.

https://czielinski.github.io/hanzifreq/hanzifreq/output/frequencies.html
See: https://github.com/czielinski/hanzifreq

These scripts allow the analysis of character frequencies in Chinese text corpora. This might be helpful for Chinese language learners to prioritize common characters when learning how to write.

That sounds like it could be a pretty helpful tool! So the feature would be to paste in some arbitrary block of Chinese text and get frequency data back from it about which characters are most frequently used?

Yes, exactly. I think Syng would be a great choice for that, especially since Hanzifreq is a terminal-based program without a suitable frontend for it.
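To make the request concrete, here is a minimal sketch of the character count discussed above. It is not hanzifreq's code or anything from Syng; the function names `is_hanzi` and `char_frequencies` are made up for illustration, and it assumes "Chinese character" means a code point in the CJK Unified Ideographs block or Extension A, so punctuation, Latin letters, and rarer extension blocks are ignored.

```python
from collections import Counter


def is_hanzi(ch: str) -> bool:
    """Return True for CJK Unified Ideographs and Extension A (an assumed definition)."""
    code = ord(ch)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def char_frequencies(text: str) -> list[tuple[str, int]]:
    """Count each Chinese character in text, most frequent first."""
    counts = Counter(ch for ch in text if is_hanzi(ch))
    return counts.most_common()


if __name__ == "__main__":
    pasted = "我爱你，你爱我吗？abc"
    for ch, n in char_frequencies(pasted):
        print(ch, n)
```

The pasted text would come from the input box, and the sorted list is what a frontend would render as the frequency table.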

I wouldn’t be able to include the actual hanzifreq script but I would definitely be able to build a tool that does something similar. My question is: would we want just character frequency or word frequency?


I think character frequency would be the feature I would most often use. How would you approach word frequency?

First the text would be tokenized into words, and then the frequency of each resulting word would be counted.
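As a rough illustration of that tokenize-then-count approach, the sketch below uses forward maximum matching against a word list. This is an assumption, not the tokenizer Syng would necessarily use: the `lexicon` here is a tiny made-up set, `segment` and `word_frequencies` are hypothetical names, and a real implementation would draw on a full dictionary or a dedicated segmentation library.

```python
from collections import Counter


def segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: take the longest lexicon word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def word_frequencies(text: str, lexicon: set[str]) -> list[tuple[str, int]]:
    """Tokenize text, keep tokens made only of Chinese characters, and count them."""
    tokens = segment(text, lexicon)
    words = [t for t in tokens if all("\u4e00" <= ch <= "\u9fff" for ch in t)]
    return Counter(words).most_common()


if __name__ == "__main__":
    toy_lexicon = {"中文", "词典", "学习"}  # hypothetical word list
    for word, n in word_frequencies("我学习中文，中文词典", toy_lexicon):
        print(word, n)
```

Greedy matching can split ambiguous text incorrectly, so the quality of the word counts depends heavily on the dictionary and the segmentation method chosen.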