Analysis on the Story of a Stone

Author: Yu Lou

Written for Python 3.

preprocess.py

Preprocess text.

Input

hlm.txt: original text

Output

preprocessing.txt: preprocessed text.

preprocess_chapters.py

Split text into chapters and preprocess them.

Input

hlm.txt: original text

Output

chapters(folder): preprocessed text. One file for each chapter, numbered from "1.text".

dict_creat.py

Creat dictionary.

Input

preprocessing.txt: preprocessed text.

Output

dict.csv: dictionary.

word_split.py

Split words apart.

Input

preprocessing.txt: preprocessed text.
dict.csv: dictionary.

Output

word_split.text: splitted text.

word_split_chapters.py

Split words apart in all chapters.

Input

preprocessing.txt: preprocessed text.
dict.csv: dictionary.
chapter(folder): preprocessed text for all chapters.

Output

chapter_split(folder): splitted text. One file for each chapter, numbered from "1.text".

word_count.py

Count words.

Input

word_split.text: splitted text.

Output

word_count.csv: counting result, sorted by number of occurence.

word_count_chapters.py

Count words in each chapters.

Input

chapter_split(folder): splitted text for each chapter.

Output

word_count_chapters.csv: counting result. One line per word and one chapter per column.

analysis.py

Do PCA analysis. Show result on screen.

Prerequisite

"sklearn", "numpy" and "matplotlib" is needed to run this program.

Input

word_count_chapters.csv: word counting result for each chapters.

Output

components.csv: weights for each components.

suffix_tree.py

Libary for suffix tree.

correctness_calculate.py

Calculate the correctness of word splitting algorithm.

Input

*_answer.txt: answer.
*_result.txt: result of the program.

("*" is file prefix)

About

Analysis on the novel "the Story of a Stone"

Languages

Language:Python 100.0%