travisstaloch / simdjzon

simdjson port to zig

address new ci failure

travisstaloch opened this issue · comments

I created a minimal reproduction and filed an issue here: ziglang/zig#17996

This fix made it into the tarballs today: ziglang/zig#18729.

I was able to run `zig build test` again locally 🚀

Here is a perf data point. This is with new versions of simdjson.cpp/h from https://github.com/simdjson/simdjson/blob/master/singleheader/

~/.../zig/simdjzon $ zig version
0.12.0-dev.2540+776cd673f
~/.../zig/simdjzon $ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
~/.../zig/simdjzon $ zig build -Doptimize=ReleaseFast
~/.../zig/simdjzon $ cat main.cpp
#include <cstdlib>   // exit
#include <iostream>  // std::cout, std::cerr
#include "simdjson.h"
using namespace simdjson;
int main(int argc, char** argv) {
    if(argc != 2) {
        std::cout << "USAGE: ./simdjson <file.json>" << std::endl;
        exit(1);
    }
    dom::parser parser;
    try {
        const dom::element doc = parser.load(argv[1]);
        (void)doc; // parse only; we just measure load + parse time
    } catch(const std::exception& e) {
        std::cerr << e.what() << '\n';
        return 1;
    }
    return 0;
}

~/.../zig/simdjzon $ g++ main.cpp simdjson.cpp -o simdjson -O3 -march=native
~/.../zig/simdjzon $ poop "./simdjson test/twitter.json" "zig-out/bin/simdjzon test/twitter.json"
Benchmark 1 (2408 runs): ./simdjson test/twitter.json
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.03ms ±  117us    1.78ms … 2.90ms         77 ( 3%)        0%
  peak_rss           4.98MB ± 2.67KB    4.85MB … 4.98MB          1 ( 0%)        0%
  cpu_cycles         2.16M  ±  102K     2.07M  … 3.32M         225 ( 9%)        0%
  instructions       4.87M  ± 77.8      4.87M  … 4.87M         599 (25%)        0%
  cache_references    125K  ± 2.46K      113K  …  167K         164 ( 7%)        0%
  cache_misses       19.7K  ±  885      17.5K  … 26.3K          80 ( 3%)        0%
  branch_misses      19.2K  ±  255      17.8K  … 20.3K          47 ( 2%)        0%
Benchmark 2 (3103 runs): zig-out/bin/simdjzon test/twitter.json
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.58ms ± 94.6us    1.33ms … 2.09ms         50 ( 2%)        ⚡- 22.2% ±  0.3%
  peak_rss           1.70MB ± 3.33KB    1.57MB … 1.70MB          2 ( 0%)        ⚡- 65.8% ±  0.0%
  cpu_cycles         2.05M  ± 45.6K     1.93M  … 2.77M         225 ( 7%)        ⚡-  5.1% ±  0.2%
  instructions       4.49M  ± 0.55      4.49M  … 4.49M           9 ( 0%)        ⚡-  7.8% ±  0.0%
  cache_references   73.3K  ± 2.44K     68.1K  …  125K         102 ( 3%)        ⚡- 41.3% ±  0.1%
  cache_misses       4.91K  ± 1.36K     1.43K  … 12.4K          92 ( 3%)        ⚡- 75.0% ±  0.3%
  branch_misses      4.83K  ±  677      3.28K  … 7.40K         146 ( 5%)        ⚡- 74.8% ±  0.1%

cc @Validark

I'm surprised at how big the difference is! Is simdjson doing a lot more work? Have you made more improvements?

No, I haven't done anything. I'm surprised too. It makes me think something might be amiss with my benchmark, or that we're skipping some work.

The only significant changes I've made recently were adding some initExisting() methods to make dom memory re-use possible. But that shouldn't affect main(). 🤔

@Validark if you want to run the benchmark yourself, I've added bench/twitter in a new commit just now. You should be able to at least build the benchmark binaries by running bench/twitter/run.sh, and then there is a poop command which you may have to run manually (it fails to find poop for me).
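The manual invocation should look roughly like the one used earlier in this thread (binary locations are a guess; adjust to wherever run.sh puts them, and make sure poop is on your PATH):

```shell
# Hypothetical manual run, assuming both binaries were built by run.sh;
# the paths below mirror the earlier invocation in this thread.
poop "./simdjson test/twitter.json" "zig-out/bin/simdjzon test/twitter.json"
```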

> No, I haven't done anything. I'm surprised too. It makes me think something might be amiss with my benchmark, or that we're skipping some work.
>
> The only significant changes I've made recently were adding some initExisting() methods to make dom memory re-use possible. But that shouldn't affect main(). 🤔

Were the results this extreme before those changes? Is the memory reduction just from your new strategy or was it there before?

No, I don't remember the results being this different. This is the first time I've noticed such a large difference. It's been quite a while since I ran any benchmarks, so I'm not sure when it might have changed.

I don't think the memory reuse changes should affect this at all since those should only help in situations when the parser is being re-used to parse different documents.