Tokenize method
mapingshuo opened this issue
mapingshuo commented
How did you tokenize the raw Douban corpus? I'm trying to test my model on new data, so I need to tokenize it first.
Yu Wu (吴俣) commented
Hi, sorry for the late reply. I used an internal Microsoft tokenizer to preprocess the data. Due to Microsoft policy, I cannot share the tool with you. Do you have any good ideas for how to address this?
mapingshuo commented
Thanks for your reply.