MarkWuNLP / MultiTurnResponseSelection

This repo contains the data and source code for our ACL 2017 paper.

Tokenize method

mapingshuo opened this issue

How did you tokenize the raw Douban corpus? I'm trying to test my model on new data, so I need to tokenize it first.

Hi, sorry for the late reply. I used an internal Microsoft tokenizer to preprocess the data. Because of Microsoft policy, I cannot share the tool with you. Do you have any good ideas for addressing this?

Thanks for your reply.
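
Since the internal tokenizer is not available, one possible workaround is a simple dictionary-based segmenter. The sketch below uses forward maximum matching, which is not the method used to build the released data; `forward_max_match`, `toy_vocab`, and `max_len` are hypothetical names chosen for illustration. Any substitute tokenizer may split some words differently from the released corpus, so tokens it produces may not all appear in the released vocabulary.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match-first segmentation.

    Characters that match no vocabulary entry are emitted as
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary; in practice it could be built from the
# tokens that appear in the released, already-tokenized data.
toy_vocab = {"我", "喜欢", "这部", "电影"}

tokens = forward_max_match("我喜欢这部电影", toy_vocab)
print(" ".join(tokens))
```

Building the vocabulary from the released data would keep the new tokens as close as possible to the ones the model was trained on, although ambiguous spans can still be segmented differently.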