cyzLoveDream / CPTools

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

A series of Chinese processing tools

There are some features in the package. If you need to use it can be installed directly

extract_chinese(self, text, symbols= "", remove_letter=False)

Extract chinese from raw text. You can choose symbols to replace non-Chinese characters in the corpus

filter_words_in_corups(self, sensitive_list, raw_text_list)

Use AC automata to filter sensitive words

is_chinese(self, uchar) Determine if a character is Chinese

get_json_data(self, path)

Read json data from a file, where each line is a Dict

save_data_to_npy(self, data, path)

read_data_from_npy(self, path)

About

License:MIT License


Languages

Language:Python 100.0%