Ancastal / HSK-Character-Profiler

HSK Character Profiler is a Python tool that analyzes Chinese character proficiency and text readability based on HSK lists, with customizable settings. Developed as part of a Master's thesis in Computational Linguistics.



HSK Character Profiler

24/12/2023 Edit: This repo will soon be merged with a more up-to-date repository.

The HSK Character Profiler is a Python command-line tool developed as part of a Master's thesis in Computational Linguistics titled "Evaluating the Effectiveness of Machine Translation for Literary Works: A Comparative Study of English and Chinese Corpora."

The tool analyzes a text file containing Chinese characters and estimates the level of Chinese proficiency needed to read it, based on the HSK (Hanyu Shuiping Kaoshi) system. It identifies the HSK level of each character in the file and generates a report with the number of characters found at each HSK level, along with the average HSK level of the text.
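The per-level counts and the average level can be illustrated with a minimal Python sketch. It is not the tool's actual code: `profile_text` and `TOY_LEVELS` are hypothetical names, the level assignments in `TOY_LEVELS` are toy values rather than real HSK data, and the average is assumed to be taken over every character occurrence, skipping characters that appear in none of the lists.

```python
from __future__ import annotations

from collections import Counter


def profile_text(text: str, char_levels: dict[str, int]) -> tuple[Counter, float | None]:
    """Count characters per HSK level and compute their average level.

    Characters absent from char_levels (punctuation, Latin letters,
    characters outside the lists) are skipped. The average is taken over
    character occurrences, so a repeated character counts each time.
    """
    counts: Counter = Counter()
    for ch in text:
        level = char_levels.get(ch)
        if level is not None:
            counts[level] += 1
    total = sum(counts.values())
    if total == 0:
        return counts, None  # no known characters: the average is undefined
    average = sum(level * n for level, n in counts.items()) / total
    return counts, average


# Hypothetical toy mapping; the values are illustrative, not real HSK data.
TOY_LEVELS = {"你": 1, "好": 1, "我": 1, "学": 1, "习": 1, "中": 1, "文": 2, "难": 3}

counts, avg = profile_text("你好，我学习中文。", TOY_LEVELS)
print(dict(sorted(counts.items())))  # {1: 6, 2: 1}
print(round(avg, 2))                 # 1.14
```

Averaging over occurrences weights frequent characters more heavily; averaging over unique characters instead would be an equally plausible design, and the README does not say which one the tool uses.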

The HSK Character Profiler is flexible and customizable: users can change the input file and the HSK character sets used for the analysis. It uses NLP libraries such as NLTK and Jieba (a Chinese word segmentation library) for text segmentation and analysis.
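Custom HSK character sets could be loaded along these lines. This is a sketch under stated assumptions, not the tool's loader: `load_hsk_levels` is a hypothetical helper, each level is assumed to come from its own UTF-8 file with one word per line, and a character that appears in several lists keeps its lowest level. It works on individual characters using only the standard library, so it does not reproduce the tool's NLTK or Jieba processing.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def load_hsk_levels(paths_by_level: dict[int, Path]) -> dict[str, int]:
    """Build a character -> HSK level map from one word-list file per level.

    Assumed (hypothetical) format: UTF-8 text, one word or character per line.
    Levels are read from lowest to highest, so a character found in several
    lists keeps the lowest level it appears in.
    """
    char_levels: dict[str, int] = {}
    for level in sorted(paths_by_level):
        text = paths_by_level[level].read_text(encoding="utf-8")
        for line in text.splitlines():
            for ch in line.strip():
                # Keep only the basic CJK Unified Ideographs block (U+4E00..U+9FFF).
                if "\u4e00" <= ch <= "\u9fff":
                    char_levels.setdefault(ch, level)
    return char_levels


# Usage with two small throwaway word lists (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    hsk1 = Path(tmp, "hsk1.txt")
    hsk1.write_text("你好\n学习\n", encoding="utf-8")
    hsk2 = Path(tmp, "hsk2.txt")
    hsk2.write_text("学生\n", encoding="utf-8")
    levels = load_hsk_levels({1: hsk1, 2: hsk2})

print(levels)  # {'你': 1, '好': 1, '学': 1, '习': 1, '生': 2}
```

Keeping the lowest level reflects the assumption that a character counts as known from the first HSK level that introduces it.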

This tool can be particularly useful for Chinese language learners, teachers, and researchers who need to assess the difficulty level of a text or determine the appropriate HSK level for a specific vocabulary list or learning material.


Languages

Language: Python 100.0%