wasmi-labs / wasmi

WebAssembly (Wasm) interpreter.

Home Page: https://wasmi-labs.github.io/wasmi/

Add `Store::call_hook` API

Robbepop opened this issue · comments

This is about adding a `Store::call_hook` API similar to Wasmtime's `Store::call_hook` in order to improve Wasmi's mirroring of the Wasmtime API.

This should not regress the performance of host function calls or of the Wasmi executor in general.
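For context, a minimal sketch of the API shape being mirrored. The `CallHook` variants, the `Error` placeholder, and the exact `call_hook` signature below are assumptions modeled on Wasmtime's API, not a finalized Wasmi design:

```rust
/// Placeholder error type, standing in for Wasmi's real error type.
pub struct Error;

/// Indicates which side of a host-call boundary is being crossed.
/// Modeled on Wasmtime's `CallHook` enum; the variants Wasmi would
/// expose are an open design question.
pub enum CallHook {
    /// Wasm is about to call a host function.
    CallingHost,
    /// A host function is returning back to Wasm.
    ReturningFromHost,
}

/// Simplified stand-in for Wasmi's `Store<T>`, for illustration only.
pub struct Store<T> {
    data: T,
    call_hook: Option<Box<dyn FnMut(&mut T, CallHook) -> Result<(), Error>>>,
}

impl<T> Store<T> {
    /// Registers a hook that fires on every host-call boundary
    /// crossing, mirroring the shape of Wasmtime's `Store::call_hook`.
    pub fn call_hook(
        &mut self,
        hook: impl FnMut(&mut T, CallHook) -> Result<(), Error> + 'static,
    ) {
        self.call_hook = Some(Box::new(hook));
    }
}
```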

I have a use case for `call_hook`, so I figured I could take a shot at this. I now have a working version of `call_hook`, but had one question.

Should `call_hook` be behind a feature flag? It seems a change is coming to Wasmtime that puts it behind a feature flag, which should make it easier to ensure that this has no or minimal performance impact where it is not needed. I have tried benchmarking my changes locally and could see some regressions, but I also saw 20% speedups that I can't explain, so I don't know how much I can trust the benchmark numbers on my computer.
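For illustration, such gating could follow the usual `cfg` pattern. This continues the sketch above and assumes a hypothetical Cargo feature named `call-hook` (declared as `call-hook = []` and excluded from `default`); none of these names are settled:

```rust
// Continuing the sketch above. With the hypothetical `call-hook`
// feature disabled, both the hook field and the boundary check are
// compiled out entirely, leaving the default build untouched.

pub struct Store<T> {
    data: T,
    #[cfg(feature = "call-hook")]
    call_hook: Option<Box<dyn FnMut(&mut T, CallHook) -> Result<(), Error>>>,
}

impl<T> Store<T> {
    /// No-op stub used by the executor when the feature is disabled;
    /// the optimizer removes the call entirely.
    #[cfg(not(feature = "call-hook"))]
    #[inline(always)]
    fn invoke_call_hook(&mut self, _kind: CallHook) -> Result<(), Error> {
        Ok(())
    }
}
```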

Hi @emiltayl,

thank you for working on this feature.

Can you please share a link to the discussion about putting this behind a feature flag? I would like to know more about the reasoning.
From what I can see in the linked PR, the feature flag has not yet been implemented. Is that correct?
What performance regressions are you already seeing?
I imagined that this feature would not cause significant performance regressions for Wasmi: being an interpreter, Wasmi would be slowed down, but not as harshly as a JIT runtime. Maybe I am wrong about this.

The pull request creating the feature flag is bytecodealliance/wasmtime#8795, and bytecodealliance/wasmtime#8808 disables it by default. If I understand correctly, it has been merged into main but is not yet in a release.

I agree that, in theory, Wasmi's performance should not be impacted to a large degree by this change. I don't have the code in front of me right now, but I will be back with the regressions from the benchmarks tomorrow.

I've included the benchmarks that reported regressions, sorted by % change. Judging by the changes I've made, I would have expected the most change in execute/call/host/*, since there is an extra function call on the host->wasm and wasm->host boundaries, but none of those benchmarks reported regressions.

In the instantiate benchmarks an extra `None` is added when creating the `Store`, but I don't think that should cause a 32% increase in run time. I have not touched linking or module parsing (I'm fairly sure of this, at least).

Edit: I tried running the benchmarks again on main, comparing against the first run, and got results comparable to my call_hook branch, with some benchmarks unexpectedly performing much worse or slightly better. It seems to be a problem with running the benchmarks locally on my machine. The overhead, especially when no call hook is set, should be quite low; the sketch below illustrates the no-hook fast path.
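To make that concrete, here is the feature-enabled counterpart to the stub above, continuing the same hypothetical sketch: with the hook stored as an `Option`, a boundary crossing without a registered hook costs one extra call plus a single branch on `None`:

```rust
impl<T> Store<T> {
    /// Invoked by the executor before and after every host function
    /// call when the hypothetical `call-hook` feature is enabled.
    #[cfg(feature = "call-hook")]
    #[inline]
    fn invoke_call_hook(&mut self, kind: CallHook) -> Result<(), Error> {
        match self.call_hook.as_mut() {
            // Fast path: no hook registered, a well-predicted branch.
            None => Ok(()),
            // Slow path: run the user hook with mutable access to `T`.
            Some(hook) => hook(&mut self.data, kind),
        }
    }
}
```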

instantiate/reverse_complement
                        time:   [14.146 µs 14.256 µs 14.354 µs]
                        change: [+30.381% +32.448% +35.163%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

linker/build/construct/same/50
                        time:   [10.326 µs 10.380 µs 10.443 µs]
                        change: [+23.023% +24.755% +26.901%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

instantiate/tiny_keccak time:   [10.863 µs 10.965 µs 11.062 µs]
                        change: [+21.084% +23.059% +25.380%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

linker/setup/same/50    time:   [9.4257 µs 9.4622 µs 9.4998 µs]
                        change: [+12.403% +13.816% +15.746%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

translate/case/best     time:   [82.389 ms 82.601 ms 82.813 ms]
                        change: [+13.051% +13.671% +14.316%] (p = 0.00 < 0.05)
                        Performance has regressed.

execute/br_table        time:   [1.1662 ms 1.1701 ms 1.1777 ms]
                        change: [+8.5328% +10.312% +11.784%] (p = 0.00 < 0.05)
                        Performance has regressed.

translate/reverse_complement/checked/lazy/default
                        time:   [23.113 µs 23.249 µs 23.448 µs]
                        change: [+3.7300% +5.0512% +6.2639%] (p = 0.00 < 0.05)
                        Performance has regressed.

linker/build/finish/unique/50
                        time:   [109.96 ns 110.48 ns 111.05 ns]
                        change: [+3.9326% +5.0184% +6.2214%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

execute/tiny_keccak     time:   [380.57 µs 388.58 µs 395.36 µs]
                        change: [+2.5787% +4.2768% +5.8044%] (p = 0.00 < 0.05)
                        Performance has regressed.

translate/bz2/checked/lazy/default
                        time:   [47.556 µs 47.612 µs 47.658 µs]
                        change: [+2.8848% +3.5006% +4.1674%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

translate/case/worst/stackbomb/10000
                        time:   [115.92 ms 116.61 ms 117.22 ms]
                        change: [+2.0026% +3.2688% +4.4893%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

translate/reverse_complement/checked/lazy-translation/default
                        time:   [156.43 µs 156.79 µs 157.09 µs]
                        change: [+1.7000% +2.7463% +3.6790%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

Hi @emiltayl and sorry for my late reply.

All in all the benchmarks indeed do not look too good, but fortunately most of the regressions are not in the Wasmi executor but in Wasm module instantiation.
The tiny_keccak Wasmi executor regression is a bit worrisome though, since it reflects real-world usage and is usually very stable.

I am not sure how we should ideally proceed with this. It would be best if you could open a PR so that we can discuss the technical details there, but at this point I cannot guarantee that the effort will be merged if we cannot resolve the regressions, or at least soften them.