baidu / DuReader

Baseline Systems of DuReader Dataset

Home page: http://ai.baidu.com/broad/subordinate?dataset=dureader

Problem running the paddle baseline on the demo data

urcllr opened this issue

The paddle infer step needs the data under preprocessed/testset.

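However, the GitHub repository does not include preprocessed data for the demo. When I ran the commands from the "Preprocess the Data" section to generate the preprocessed testset, it failed: trainset and devset were processed successfully, and only testset failed. On inspection, search.test.json indeed has no segmented_answers key.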
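
A minimal sketch for checking which samples in a data file lack the segmented fields before running preprocessing, assuming the file holds one JSON object per line. The `count_missing_keys` helper and the `REQUIRED_KEYS` tuple are hypothetical; only the absence of `segmented_answers` in search.test.json is confirmed by the report above.

```python
import json
import sys
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical list of keys to check; adjust to what the preprocess script reads.
REQUIRED_KEYS = ("segmented_question", "segmented_answers")


def count_missing_keys(lines: Iterable[str], required: Sequence[str] = REQUIRED_KEYS) -> Counter:
    """Count, per key, how many JSON-lines samples do not contain that key."""
    missing: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        sample = json.loads(line)
        for key in required:
            if key not in sample:
                missing[key] += 1
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python check_keys.py search.test.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(dict(count_missing_keys(f)))
```

A non-empty result for a key means the preprocess step will hit samples without it, which matches the testset failure described here.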

As a result, the paddle infer step cannot be run on the demo data: after it finishes, the infer directory under models is empty.

For convenience, please use the preprocessed version of our dataset, or segment the questions, documents and references yourself and provide the "segmented_*" fields for the preprocess script.
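
A minimal sketch of the second option in the reply above: adding "segmented_*" fields to a JSON-lines data file yourself. The field names (`segmented_question`, `segmented_title`, `segmented_paragraphs`, `segmented_answers`) are assumed to match the preprocessed release, and `naive_segment` is a hypothetical character-level stand-in, not the segmenter used to build DuReader; swap in a real Chinese word segmenter. Writing an empty `segmented_answers` list for test samples without references is also an assumption about what the preprocess script tolerates.

```python
import json
from typing import Callable, Dict, List


def naive_segment(text: str) -> List[str]:
    """Hypothetical stand-in segmenter: splits on whitespace, then makes each
    CJK character its own token while keeping non-CJK runs together."""
    tokens: List[str] = []
    for chunk in text.split():
        buf = ""
        for ch in chunk:
            if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
                if buf:
                    tokens.append(buf)
                    buf = ""
                tokens.append(ch)
            else:
                buf += ch
        if buf:
            tokens.append(buf)
    return tokens


def add_segmented_fields(sample: Dict, segment: Callable[[str], List[str]] = naive_segment) -> Dict:
    """Add the assumed segmented_* fields to one sample, in place."""
    sample["segmented_question"] = segment(sample["question"])
    for doc in sample.get("documents", []):
        doc["segmented_title"] = segment(doc.get("title", ""))
        doc["segmented_paragraphs"] = [segment(p) for p in doc.get("paragraphs", [])]
    # Test samples have no reference answers, so this becomes an empty list.
    sample["segmented_answers"] = [segment(a) for a in sample.get("answers", [])]
    return sample


def segment_jsonl(in_path: str, out_path: str,
                  segment: Callable[[str], List[str]] = naive_segment) -> None:
    """Read one JSON object per line, add segmented fields, write them back out."""
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            sample = add_segmented_fields(json.loads(line), segment)
            fout.write(json.dumps(sample, ensure_ascii=False) + "\n")
```

For example, `segment_jsonl("search.test.json", "search.test.seg.json")` would write a copy whose samples all carry the segmented fields, which could then be fed to the preprocess commands.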

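Sorry, I hadn't fully understood the preprocess section yesterday and missed that I first need to segment the text myself and put it into the corresponding segmented fields.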