eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library

Consider upgrading to simdjson 0.3.1

lemire opened this issue

Version 0.3 of simdjson is now available

Highlights

  • Multi-Document Parsing: Read a bundle of JSON documents (ndjson) 2-4x faster than doing it individually. API docs / Design Details
  • Simplified API: The API has been completely revamped for ease of use, including a new JSON navigation API and fluent support for error code and exception styles of error handling with a single API. Docs
  • Exact Float Parsing: Now simdjson parses floats flawlessly without any performance loss (simdjson/simdjson#558). Blog Post
  • Even Faster: The fastest parser got faster! With a shiny new UTF-8 validator and meticulously refactored SIMD core, simdjson 0.3 is 15% faster than before, running at 2.5 GB/s (where 0.2 ran at 2.2 GB/s).

Minor Highlights

  • Fallback implementation: simdjson now has a non-SIMD fallback implementation, and can run even on very old 64-bit machines.
  • Automatic allocation: as part of API simplification, the parser no longer has to be preallocated; it will adjust automatically when it encounters larger files.
  • Runtime selection API: We've exposed simdjson's runtime CPU detection and implementation selection as an API, so you can tell what implementation we detected and test with other implementations.
  • Error handling your way: Whether you use exceptions or check error codes, simdjson lets you handle errors in your style. APIs that can fail return simdjson_result<T>, letting you check the error code before using the result. But if you are more comfortable with exceptions, skip the error code and cast straight to T, and exceptions will be thrown automatically if an error happens. Use the same API either way!
  • Error chaining: We also worked to keep non-exception error-handling short and sweet. Instead of having to check the error code after every single operation, now you can chain JSON navigation calls like looking up an object field or array element, or casting to a string, so that you only have to check the error code once at the very end.
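
To make the two error-handling styles and the chaining idea concrete, here is a minimal, self-contained sketch. It deliberately does not use simdjson: `node`, `result`, `error_code`, `lookup_error`, `make_number`, `make_object` and `root` are all hypothetical stand-ins invented for this illustration, so treat it as a picture of the pattern rather than of the library's actual API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT simdjson's types.
enum class error_code { success, no_such_field, incorrect_type };

struct lookup_error : std::runtime_error {
  explicit lookup_error(error_code c)
      : std::runtime_error("lookup failed"), code(c) {}
  error_code code;
};

// A toy document node: either a number or an object with named children.
struct node {
  std::vector<std::string> keys;
  std::vector<node> children;  // parallel to keys
  long number = 0;
  bool is_number = false;
};

inline node make_number(long v) {
  node n;
  n.number = v;
  n.is_number = true;
  return n;
}

inline node make_object(std::string key, node child) {
  node n;
  n.keys.push_back(std::move(key));
  n.children.push_back(std::move(child));
  return n;
}

// Carries either a pointer to a node or the first error seen along the chain.
class result {
 public:
  result(const node* p, error_code e) : ptr_(p), error_(e) {}

  // Field lookup. An earlier error is passed through untouched, which is
  // what lets the caller check only once at the end of a chain.
  result operator[](const char* key) const {
    if (error_ != error_code::success) return *this;
    for (std::size_t i = 0; i < ptr_->keys.size(); ++i) {
      if (ptr_->keys[i] == key) {
        return result(&ptr_->children[i], error_code::success);
      }
    }
    return result(nullptr, error_code::no_such_field);
  }

  // Error-code style: writes to `out` only on success.
  error_code get(long& out) const {
    if (error_ != error_code::success) return error_;
    if (!ptr_->is_number) return error_code::incorrect_type;
    out = ptr_->number;
    return error_code::success;
  }

  // Exception style: casting to long throws if anything went wrong.
  explicit operator long() const {
    long out = 0;
    const error_code e = get(out);
    if (e != error_code::success) throw lookup_error(e);
    return out;
  }

 private:
  const node* ptr_;
  error_code error_;
};

// Entry point for a chain; the document must outlive the returned result.
inline result root(const node& doc) {
  return result(&doc, error_code::success);
}
```

With a document built as `make_object("car", make_object("year", make_number(2018)))`, the error-code style reads `root(doc)["car"]["year"].get(year)` with a single check at the end, while the exception style reads `static_cast<long>(root(doc)["car"]["year"])` and throws `lookup_error` on a missing field.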

Saw the tweet, of course. Will try to get to it in the next few days...

Congrats to the team. Really impressive work.

The new API received a get_type method (v0.3.1).

It so happens that I upgraded a few minutes ago (to your upstream) but singleheader/ files were unchanged. Now they are updated. Bad luck on my part :) Will repeat. (Hadn't merged so no loss...)

BTW do I need to change anything on my end to benefit from / invoke the fallback method?

BTW do I need to change anything on my end to benefit from / invoke the fallback method?

You do not per se, but you should compile without any architecture flag. If you compile with -msse4.2 or something similar (or the equivalent under Windows), the compiler is free to use those instructions anywhere in the binary, and this will break the fallback method. I quickly scanned your code and scripts, and I do not see anything of the sort, so I would guess that you will be fine.
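
One way to check whether a build picked up such a flag is to look at the macros the compiler predefines. The sketch below assumes GCC or Clang, where -msse4.2 defines `__SSE4_2__` and -mavx2 defines `__AVX2__`; the helper name `assumed_simd_baseline` is made up for this example and is not part of simdjson or RcppSimdJson.

```cpp
#include <string>

// Lists the x86 SIMD extensions the compiler was allowed to assume for this
// translation unit, based on macros predefined by GCC and Clang.
// Hypothetical helper for illustration; not part of any library.
inline std::string assumed_simd_baseline() {
  std::string out;
#if defined(__SSE4_2__)
  out += "sse4.2";
#endif
#if defined(__AVX2__)
  if (!out.empty()) out += " ";
  out += "avx2";
#endif
  if (out.empty()) out = "none";
  return out;
}
```

If this returns "none" in a default build, no architecture flag raised the baseline, and choosing between the SIMD and fallback implementations is left to the runtime detection described above.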

cc @jkeiser

It so happens that I upgraded a few minutes ago (to your upstream) but singleheader/ files were unchanged. Now they are updated. Bad luck on my part :)

If your code compiled with 0.3.0, it should compile with 0.3.1.

We do not systematically update the single-header files, but they do get updated on a release.

For a two-line example file I had quite a bit of work to do to catch up to the deprecations :)

All good. I compile with -pedantic so it once again whined about semicolons:

edd@rob:~/git/rcppsimdjson(master)$ diff -u ../simdjson/singleheader/simdjson.h inst/include/simdjson.h 
--- ../simdjson/singleheader/simdjson.h 2020-04-02 18:47:06.413612008 -0500
+++ inst/include/simdjson.h     2020-04-02 19:20:55.871628417 -0500
@@ -1810,7 +1810,7 @@
  * @param value The value to print.
  * @throw if there is an error with the underlying output stream. simdjson itself will not throw.
  */
-inline std::ostream& operator<<(std::ostream& out, const element &value) { return out << minify(value); };
+inline std::ostream& operator<<(std::ostream& out, const element &value) { return out << minify(value); }
 /**
  * Print JSON to an output stream.
  *
@@ -3492,7 +3492,7 @@
 // object inline implementation
 //
 really_inline object::object() noexcept : internal::tape_ref() {}
-really_inline object::object(const document *_doc, size_t _json_index) noexcept : internal::tape_ref(_doc, _json_index) { };
+really_inline object::object(const document *_doc, size_t _json_index) noexcept : internal::tape_ref(_doc, _json_index) { }
 inline object::iterator object::begin() const noexcept {
   return iterator(doc, json_index + 1);
 }
@@ -4394,13 +4394,13 @@
 really_inline T& simdjson_result_base<T>::value() noexcept(false) {
   if (error()) { throw simdjson_error(error()); }
   return this->first;
-};
+}
 
 template<typename T>
 really_inline T&& simdjson_result_base<T>::take_value() && noexcept(false) {
   if (error()) { throw simdjson_error(error()); }
   return std::forward<T>(this->first);
-};
+}
 
 template<typename T>
 really_inline simdjson_result_base<T>::operator T&&() && noexcept(false) {
edd@rob:~/git/rcppsimdjson(master)$ 
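
For readers wondering why those semicolons matter: a class definition must end with one, whereas a function body is already complete at its closing brace, so a trailing `;` there is an extra empty declaration that a pedantic build may warn about. A minimal sketch, with `point2d` and `coordinate_sum` as made-up names:

```cpp
// A class definition is a declaration, so its closing brace needs a ';'.
struct point2d {
  int x;
  int y;
};

// A function definition ends at its closing brace; no ';' follows it.
inline int coordinate_sum(const point2d& p) { return p.x + p.y; }

// The flagged pattern is the same definition followed by a stray ';':
//   inline int coordinate_sum(const point2d& p) { return p.x + p.y; };
```

Whether the stray semicolon is actually diagnosed depends on the compiler, its version, and the language standard selected, which is why it can pass upstream CI and still show up in a -pedantic build.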

Happy to send you guys a PR but was just called for dinner so give me a few...

Happy to send you guys a PR but was just called for dinner so give me a few...

Don't worry. Easy to fix.

Yep, I sent you #673 but feel free to either merge or ignore. The CI should be busy running now.

This can be closed -- version 0.0.4 is now at CRAN.

Thanks for all the amazing work, and of course the heads-up here too.