More details in error output of .load_json()
mrwunderbar666 opened this issue
Hi,
thanks for providing this package! It is really a blessing when working with large quantities of data!
I want to suggest providing more details when parsing a JSON file fails. For example, I get the following error when I parse over 1000 files:
Error in .load_json(json = input, query = query, empty_array = empty_array):
NO_SUCH_FIELD: The JSON field referenced does not exist in this object.
In my case, I had a single file that was malformed and I think it would be useful to know the filename where the query failed.
Desired message:
Error in .load_json(json = input, query = query, empty_array = empty_array):
NO_SUCH_FIELD: The JSON field referenced does not exist in /path/to/file.json
Sure, that's a good idea in principle, but in practice this is much harder because there are two steps:
- creating json input from a file
- parsing json
So an error in step two no longer has an association with the input, in your case a file. Here the error comes from .load_json(), which is an identifier provided by the load() function in src/deserialize.cpp. It takes a JSON string argument and does not know the file.
We do offer a quicker 'is it valid?' test function so in your case maybe one needs a local loop over possible files. Sorry.
Thanks, fair enough.
For such edge cases, I will just implement a slower loop that checks each file separately.
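A minimal sketch of such a per-file loop, for illustration only. The helper name load_each() and its arguments are hypothetical and not part of the package. It assumes the loader takes a file path and a query argument, as RcppSimdJson::fload() appears to, given the call shown in the error message above. Each file is parsed on its own, so any error can be paired with the path that caused it:

```r
# Hypothetical helper: parse files one at a time and remember which ones fail.
# `loader` defaults to RcppSimdJson::fload (assumed to accept a path and `query`);
# the default is only evaluated if no other loader is supplied.
load_each <- function(paths, query = NULL, loader = RcppSimdJson::fload) {
  results <- vector("list", length(paths))
  names(results) <- paths
  errors <- character(0)  # named vector: file path -> error message

  for (i in seq_along(paths)) {
    # Return the condition object instead of aborting the whole run.
    out <- tryCatch(loader(paths[[i]], query = query), error = function(e) e)
    if (inherits(out, "error")) {
      # Re-attach the file name that the low-level parser cannot know about.
      errors[paths[[i]]] <- conditionMessage(out)
    } else {
      results[i] <- list(out)  # list() keeps a legitimate NULL result in place
    }
  }
  list(results = results, errors = errors)
}
```

Calling names(load_each(files, query = query)$errors) would then list the files where the query failed. This is slower than passing all files in a single vectorized call, so it is probably best reserved for tracking down a failure after the fast path has already errored.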
Really appreciate your understanding -- the 'better error messages are good' suggestion is totally valid. However, the very performance-oriented design of simdjson makes it tricky.