This code processes a Chinese .txt document: it strips out unneeded content using regular expressions and applies some simple formatting. I give two versions here: one splits the data by sentence, the other by paragraph.
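As a rough illustration of the two approaches, here is a minimal Python sketch. It is not the original code: the function names (`clean_text`, `split_by_sentence`, `split_by_paragraph`) are hypothetical, and the set of characters kept during cleaning is an assumption that should be adjusted to the actual corpus.

```python
import re

# Assumption: keep CJK ideographs, ASCII letters and digits, common Chinese
# punctuation, and newlines; everything else counts as "unneeded content".
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9，。！？；：、\n]")

# A sentence is a run of non-terminators followed by an optional terminator.
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def clean_text(text: str) -> str:
    """Normalize line endings and drop characters outside the allowed set."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return _DISALLOWED.sub("", text)


def split_by_sentence(text: str) -> list[str]:
    """Version 1: one item per sentence, ignoring the original line breaks."""
    flat = clean_text(text).replace("\n", "")
    return _SENTENCE.findall(flat)


def split_by_paragraph(text: str) -> list[str]:
    """Version 2: one item per non-empty line (treated here as a paragraph)."""
    return [line for line in clean_text(text).split("\n") if line]


sample = "  你好，世界。\r\n\r\n今天 天气 很好！是吗？***\n"
sentences = split_by_sentence(sample)
paragraphs = split_by_paragraph(sample)
```

For a real file, the text would typically be read with `open(path, encoding="utf-8").read()` before being passed to either function. Note that this sketch treats each non-empty line as a paragraph, which may not hold for documents that wrap lines within a paragraph.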