eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library

More details in error output of .load_json()

mrwunderbar666 opened this issue

Hi,

thanks for providing this package! It is really a blessing when working with large quantities of data!

I want to suggest providing more details when parsing a JSON file fails. For example, I get the following error when I parse over 1000 files:

Error in .load_json(json = input, query = query, empty_array = empty_array): 
NO_SUCH_FIELD: The JSON field referenced does not exist in this object.

In my case, a single file was malformed, and I think it would be useful to know the name of the file where the query failed.

Desired message:

Error in .load_json(json = input, query = query, empty_array = empty_array): 
NO_SUCH_FIELD: The JSON field referenced does not exist in /path/to/file.json 

Sure, that's a good idea in principle, but in practice it is much harder because there are two steps:

  • creating json input from a file
  • parsing json

So an error in step two is no longer associated with the original input, which in your case is a file. Here the error comes from .load_json(), which is an identifier provided by the load() function in src/deserialize.cpp. It takes a JSON string argument and does not know the file.

We do offer a quicker 'is it valid?' test function, so in your case you may need a local loop over the candidate files. Sorry.

Thanks, fair enough.

For such edge cases, I will then just implement a slower loop that checks each file separately.
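
A minimal sketch of such a per-file loop, in base R. The helper name load_each and its parse_fn argument are hypothetical and not part of RcppSimdJson; the idea is only to call the parser once per file so that any error can be re-raised with the path attached.

```r
# Hypothetical helper (not part of RcppSimdJson): call `parse_fn` on each
# file separately so that a failure can be reported with the offending path.
load_each <- function(files, parse_fn, ...) {
  out <- vector("list", length(files))
  names(out) <- files
  for (i in seq_along(files)) {
    # Wrapping the result in list() keeps NULL results from dropping elements.
    out[i] <- list(tryCatch(
      parse_fn(files[[i]], ...),
      error = function(e) {
        # Re-raise the original message with the file name appended.
        stop(sprintf("%s (while loading %s)", conditionMessage(e), files[[i]]),
             call. = FALSE)
      }
    ))
  }
  out
}
```

Assuming the package's fload() accepts a single file path plus a query, a call might look like load_each(files, RcppSimdJson::fload, query = query). This gives up the speed of loading all files in one call, so it is probably best kept as a fallback that runs only after the bulk load has failed.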

Really appreciate your understanding -- the 'better error messages are good' suggestion is totally valid. However, the very performance-oriented design of simdjson makes it tricky.