A homework assignment at https://ksi.ff.cuni.cz, subject Linguistic Analysis of Chinese.
Basic quantitative analysis of Chinese characters and their aspects in a given text:
type/token ratio, frequencies of radicals, frequencies of characters with multiple pronunciations, etc. Most of the information I use comes from the Unicode Unihan database. Resulting tables look like this.
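To make two of these measures concrete, here is a minimal Python sketch. It is not the code from this repository. The function names, the choice of Unicode blocks that count as Han characters, and the in-memory sample lines are all assumptions made for illustration. The only thing taken from Unihan is the tab-separated line format (code point, field name, value) and the `kRSUnicode` field, whose value has the form `radical.additional_strokes`.

```python
from collections import Counter

# Assumption: only these CJK Unified Ideographs blocks count as Han characters.
HAN_RANGES = [(0x3400, 0x4DBF), (0x4E00, 0x9FFF), (0x20000, 0x2A6DF)]


def is_han(ch):
    """True if the character falls into one of the assumed Han blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def han_tokens(text):
    """Every Han character occurrence in the text is one token."""
    return [ch for ch in text if is_han(ch)]


def type_token_ratio(tokens):
    """Number of distinct characters divided by number of occurrences."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def parse_radicals(unihan_lines):
    """Map character -> radical number, read from kRSUnicode lines."""
    radicals = {}
    for line in unihan_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        code, field, value = line.rstrip("\n").split("\t", 2)
        if field != "kRSUnicode":
            continue
        first = value.split()[0]  # a character may list several values
        radical = first.split(".")[0].rstrip("'")  # drop variant-radical marks
        radicals[chr(int(code[2:], 16))] = int(radical)
    return radicals


def radical_frequencies(tokens, radicals):
    """Count tokens per radical number; characters without data are skipped."""
    return Counter(radicals[ch] for ch in tokens if ch in radicals)


# Hypothetical sample input; real use would read the Unihan data files.
sample_unihan = [
    "U+4E2D\tkRSUnicode\t2.3",
    "U+4EBA\tkRSUnicode\t9.0",
    "U+56FD\tkRSUnicode\t31.5",
    "U+597D\tkRSUnicode\t38.3",
]
tokens = han_tokens("中国人好，好人。")
print(type_token_ratio(tokens))
print(radical_frequencies(tokens, parse_radicals(sample_unihan)))
```

The sample text has six Han tokens and four distinct characters, so punctuation does not affect the ratio. Counting by radical number rather than by radical glyph keeps the sketch independent of any further lookup table.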
All my code here is under the MIT license.
Straňák, P. Hanzi Stats [Computer software]. https://github.com/stranak/hanzi-stats
@software{Stranak_Hanzi_Stats,
author = {Straňák, Pavel},
license = {MIT},
title = {{Hanzi Stats}},
url = {https://github.com/stranak/hanzi-stats}
}