More details in error output of .load_json()

Question

mrwunderbar666 opened this issue a year ago

Hi,

thanks for providing this package! It is really a blessing when working with large quantities of data!

I want to suggest providing more details when parsing a JSON file fails. For example, I get the following error when I parse over 1000 files:

Error in .load_json(json = input, query = query, empty_array = empty_array): 
NO_SUCH_FIELD: The JSON field referenced does not exist in this object.

In my case, I had a single file that was malformed and I think it would be useful to know the filename where the query failed.

Desired message:

Error in .load_json(json = input, query = query, empty_array = empty_array): 
NO_SUCH_FIELD: The JSON field referenced does not exist in /path/to/file.json

Dirk Eddelbuettel · Answer 1 · Mon May 22 2023 20:01:28 GMT+0800 (China Standard Time)

Sure, that's a good idea in principle, but in practice this is much harder as there are two steps:

1. creating JSON input from a file
2. parsing the JSON

So an error in step two no longer has an association with the input, in your case a file. Here the error comes from .load_json(), which is an identifier provided by the load() function in src/deserialize.cpp. It takes a JSON string argument and does not know the file.

We do offer a quicker 'is it valid?' test function so in your case maybe one needs a local loop over possible files. Sorry.

mrwunderbar666 · Answer 2 · Mon May 22 2023 20:43:19 GMT+0800 (China Standard Time)

Thanks, fair enough.

For such edge cases, I will then just implement a slower loop that checks each file separately.
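
A per-file loop of the kind described here could look roughly like the sketch below. It is only an illustration, not something from the package: load_each() is a hypothetical helper name, and it assumes the files are read with RcppSimdJson::fload() with the query passed through. Each file is parsed inside tryCatch() so that an error such as NO_SUCH_FIELD is recorded against the path that caused it.

```r
# Hypothetical helper (not part of RcppSimdJson): parse files one at a time
# so that any error can be tied back to the file that triggered it.
# `parser` is assumed to accept a path and a `query` argument, as fload() does.
load_each <- function(files, query = NULL, parser = RcppSimdJson::fload) {
  results <- vector("list", length(files))
  names(results) <- files
  failed <- character(0)  # named by file path, values are error messages

  for (i in seq_along(files)) {
    f <- files[[i]]
    parsed <- tryCatch(
      parser(f, query = query),
      error = function(e) {
        # Record the message under the offending path and carry on.
        failed[[f]] <<- conditionMessage(e)
        NULL
      }
    )
    # Assign via list() so a NULL result keeps its slot in `results`.
    results[i] <- list(parsed)
  }

  list(results = results, failed = failed)
}

# Possible usage (the directory and query are placeholders):
# files <- list.files("data", pattern = "\\.json$", full.names = TRUE)
# out <- load_each(files, query = "/some/field")
# out$failed
```

This gives up the speed of handing all files to the parser in one call, so it may be most useful as a second pass that runs only after the batch call has failed.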

Dirk Eddelbuettel · Answer 3 · Mon May 22 2023 20:45:45 GMT+0800 (China Standard Time)

Really appreciate your understanding -- the 'better error messages are good' suggestion is totally valid. However, the very performance-oriented design of simdjson makes it tricky.