baidu / DuReader

Baseline Systems of DuReader Dataset

Home Page:http://ai.baidu.com/broad/subordinate?dataset=dureader

About converting the MARCO dataset to DuReader style

AmosHua opened this issue

When I use the script marcov1_to_dureader.py to convert MARCOv1 to DuReader format, it fails with ValueError: Trailing data

@AmosHua Could you please provide more information? For example, which version of the MSMARCO dataset are you using? More details will help us reproduce the issue. Thanks.

I am using MSMARCO v2.

I have the same problem. The traceback is as follows:
Traceback (most recent call last):
  File "marcov1_to_dureader.py", line 33, in <module>
    df = pd.read_json(sys.argv[1])
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 467, in read
    obj = self._get_object_parser(self.data)
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 576, in parse
    self._parse_no_numpy()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 793, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Trailing data
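
A possible way to narrow this down: pandas typically raises "ValueError: Trailing data" when the parser finds more content after the first complete JSON document, which is what happens when the input is in JSON Lines form (one object per line) but is read as a single document. Whether that is the cause here is an assumption, not something confirmed in this thread. The sketch below uses only the standard library to check which layout a file has; the helper name detect_json_layout is hypothetical and not part of the DuReader scripts.

```python
import json


def detect_json_layout(path):
    """Return 'json', 'jsonl', or 'invalid' for the file at `path`.

    Hypothetical helper for diagnosis only; it reads the whole file
    into memory, so it suits a quick check rather than production use.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Case 1: the whole file parses as one JSON document.
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        pass

    # Case 2: every non-empty line is its own JSON document (JSON Lines).
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "invalid"
    try:
        for line in lines:
            json.loads(line)
    except json.JSONDecodeError:
        return "invalid"
    return "jsonl"
```

If the check returns 'jsonl', passing lines=True to pd.read_json (for example pd.read_json(path, lines=True)) tells pandas to read one record per line, which may avoid the error. If it returns 'json', the file is a single document and the cause is probably something else.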