About converting the MARCO dataset to DuReader style

Question

AmosHua opened this issue 6 years ago · comments

When I use the script marcov1_to_dureader.py to convert MARCOv1 to DuReader format, it fails with ValueError: Trailing data.

Jing Liu · Answer 1 · Fri Oct 12 2018 20:06:52 GMT+0800 (China Standard Time)

@AmosHua Could you please provide more information? For example, which version of the MSMARCO dataset are you using? More information would help us reproduce the issue. Thanks.

AmosHua · Answer 2 · Mon Oct 22 2018 17:13:55 GMT+0800 (China Standard Time)

I am using MSMARCO v2.

Wei-Peng · Answer 3 · Thu Apr 09 2020 11:14:56 GMT+0800 (China Standard Time)

I have the same problem. The traceback is as follows:
Traceback (most recent call last):
  File "marcov1_to_dureader.py", line 33, in <module>
    df = pd.read_json(sys.argv[1])
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 467, in read
    obj = self._get_object_parser(self.data)
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 576, in parse
    self._parse_no_numpy()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 793, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Trailing data
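
A note on the error itself: pandas raises "ValueError: Trailing data" when the parser finds more content after the first complete JSON value. That is typical of a JSON Lines file (one JSON object per line), but whether this particular MSMARCO file is laid out that way is an assumption that has not been checked here. A minimal sketch for testing that hypothesis with only the standard library follows; load_json_records is a hypothetical helper, not part of the DuReader repository.

```python
import json


def load_json_records(path):
    """Load a file that is either a single JSON document or JSON Lines.

    Hypothetical helper for diagnosing the input file; not part of
    marcov1_to_dureader.py.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        # Case 1: the whole file is one JSON value.
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2 (assumption): one JSON value per non-empty line.
        # If a line is not valid JSON either, the error propagates.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

If the second branch is the one that succeeds, the file is line-delimited, and passing lines=True to pd.read_json in the script may be enough. That is a guess based on the error message, not a confirmed fix for this issue.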