tetratelabs / wazero

wazero: the zero dependency WebAssembly runtime for Go developers

Home Page: https://wazero.io

Representative benchmarks of wazero for Go compiler perf measurement

dr2chase opened this issue

Is your feature request related to a problem? Please describe.
I'd like to know which of the wazero benchmarks (specific Benchmark... functions) are most representative of wazero user code, if you know that. The problem I have is that, as a Go developer, I am trying to maintain a collection of relevant benchmarks for compiler, runtime, and/or GC, so that we can improve performance in user-relevant ways. But I don't know a lot about wazero or its user community.

Describe the solution you'd like
Someone will provide me with a list of benchmarks, between 2 and 10 I think, that decently model the performance of wazero user code.

Describe alternatives you've considered
I've considered:

  • randomly selecting benchmarks (excluding the really short ones, which are obviously not user code, and not repeating a benchmark that is just longer/shorter versions of the same benchmark).
  • profiling the benchmarks, and selecting the ones that seem to exercise the greatest diversity of wazero source code.
  • checking sensitivity of benchmarks to PGO/not, inlining/not, optimization/not, randomized linking/not, and sampling along those various axes.

Additional context
Really, any guidance you can offer about what you think would matter to your users, is good.
Thanks very much for any information you have.
The "grand plan" is to incorporate these benchmarks into https://github.com/golang/benchmarks/tree/master/cmd/bent (this is stale, I have a stack of CLs that need to land) which would eventually put the selected wazero benchmarks onto the Go perf dashboard.

This seems fine here, but if you want to open up a discussion with users of wazero, we have a couple of channels in Gophers Slack: wazero and wazero-dev.

@ncruces, I think you have a Wasm-compiled SQLite speedtest, right? I think that can be considered a representative use case/benchmark of wazero. Given the main interest here is the Go compiler's perf, running it in interpreter mode would be great, I guess?

I'd say that, if you want something representative of user-relevant Go code that we have, I'd look at a compilation benchmark: take a complex piece of Wasm (could be my SQLite binary, but a Go wasip1 binary could be even better) and benchmark wazero compiling that.
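
To make that concrete, here is a minimal sketch of such a compilation benchmark (the testdata path is an assumption; substitute any complex Wasm binary):

package bench_test

import (
    "context"
    "os"
    "testing"

    "github.com/tetratelabs/wazero"
)

func BenchmarkCompileModule(b *testing.B) {
    ctx := context.Background()
    // Hypothetical path: substitute a complex Wasm binary, e.g. SQLite.
    wasm, err := os.ReadFile("testdata/complex.wasm")
    if err != nil {
        b.Fatal(err)
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // A fresh runtime per iteration, so no compilation cache is reused
        // and each iteration measures a full compile.
        r := wazero.NewRuntime(ctx)
        if _, err := r.CompileModule(ctx, wasm); err != nil {
            b.Fatal(err)
        }
        _ = r.Close(ctx)
    }
}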

Improving the performance of the compiler is a big focus for us right now: we're trying to make the wazero compiler as fast as wasmtime's (written in Rust).

A benchmark like my SQLite speedtest, as @mathetake suggested, spends most of its time in compiled code, and you'd need to use the interpreter to stress Go code.
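
For reference, forcing the interpreter is a one-line configuration change with the public API:

// Use the interpreter instead of the native compiler, keeping the work in Go:
r := wazero.NewRuntimeWithConfig(ctx, wazero.NewRuntimeConfigInterpreter())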

To be more concrete, I was looking at the benchmarks in wazero/internal/integration_test/bench. Based on remarks above, I think that means I maybe want these?

BenchmarkInvocation/interpreter/string_manipulation_size_50
BenchmarkInvocation/interpreter/fib_for_20
BenchmarkInvocation/interpreter/random_mat_mul_size_20
BenchmarkCompilation/with_extern_cache
BenchmarkCompilation/without_extern_cache
BenchmarkCompilation/interpreter
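
For reference, these can be run from the wazero repo root with something along the lines of:

go test -bench 'BenchmarkInvocation/interpreter|BenchmarkCompilation' ./internal/integration_test/bench/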

Maybe some of those are not the best choice, and I'm not entirely sure what BenchmarkCompilation/interpreter means but I suspect it is "not really compiling".

I'd like to avoid being in the benchmark-writing business for other people's code.

Hi @dr2chase, I am perhaps not as well placed as the authors of this project to comment; however, as another user interested in wazero performance for my own hobby projects, I am having a hard time understanding what you are looking to benchmark.

Are you trying to benchmark:

  • the speed at which wazero compiles wasm binaries to machine code
  • the performance of the machine code wazero produces
  • the wazero wasm byte-code interpreter that allows wazero to execute wasm without compiling to machine code
  • something else?

It seems from what I read that you mean the performance of the machine code wazero produces, but that you are concerned about testing specifically Go code and not machine code... I find it very confusing. You want to benchmark arbitrary Go code as opposed to what?

Sorry if I've misunderstood something obvious.

@davidmdm I work on the Go compiler, sometimes the runtime, sometimes the libraries. When we change the compiler, we like to be sure that we are making it better for Go users (in general) not worse. It's not "how good is wazero", but rather "did we accidentally hurt wazero?". Or "hey, we helped wazero, this change might be worth keeping". But we can't run all the benchmarks for all the applications, so I'm looking for a subset that models wazero performance for its users, at least well enough.

Thanks @dr2chase for approaching us (and staying with us), and @davidmdm for asking the right question!

If you're looking for typical usages of wazero, I guess the interpreter is less interesting, because I suspect most users will be on platforms that support the compiler.

So both compilation time (for startup) and runtime are likely (more) important. OTOH, benchmarks that are entirely dominated by compiled ASM matter less, because they won't be sensitive to Go compiler improvements. So, something with a mix of Go calling Wasm and Wasm calling Go.
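
As an illustrative sketch of what that mix looks like with wazero's host-module API (the module name "env" and export "host_inc" are made up; r is assumed to be an existing wazero.Runtime and ctx a context.Context):

// Wasm -> Go: export a host function the guest imports as "env.host_inc".
// Each guest call into it crosses the boundary and runs Go code, which is
// exactly what a mixed benchmark needs to stress.
_, err := r.NewHostModuleBuilder("env").
    NewFunctionBuilder().
    WithFunc(func(ctx context.Context, v uint32) uint32 { return v + 1 }).
    Export("host_inc").
    Instantiate(ctx)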

I'll ask a (hopefully final) question: are you particularly interested in testing the quality of the Go-to-Wasm (wasip1) compiler? That is: Go source compiled to Wasm, then compiled by wazero to (e.g.) amd64, and run with wazero? This would need the (tip?) Go compiler in the loop, which makes infra harder, but it may be what you're looking for and worth the effort.

I'm mostly asking these clarifying questions because most of our benchmarks focus on things we can improve (e.g. quality of compiled code) not necessarily on stuff the Go team could do for us. So we may end up creating a new benchmark for this.

I think quality of wasip1 is a different benchmark; arguably I need to fiddle with the benchmark runner itself so that it can do that, or has an option to do that. (The benchmark runner tries to automate all the places where humans will make mistakes, forget to record settings, not use best practices, etc.) So if you can tell me the "best way" to run wasip1, that is helpful to that problem. I'm not sure all the benchmarks will work with Wasm; I guess we'll see.

I definitely have the tip compiler in the loop.

Otherwise, for this, it sounds like I want to benchmark compilation time for startup, Go <--> Wasm overhead, and maybe stuff on the system-interface side? And maybe a little bit of the interpreter, for diversity and just in case (do you use the interpreter in testing, for example? People habitually shortchange testing: tests are less documented, and when we changed the loop-iteration capture semantics, virtually all the bugs were in tests).

Thanks so much for your help, I suspect the future has a lot more wasm in it, so we need to pay attention.

Thanks for the help with perf @dr2chase.

For benchmarking wasip1, I think it's simply a matter of picking any benchmark (in this case, presumably one that's already in golang/go), compiling the test binary, and running it with the wazero CLI. I just tried with go-re2 and it seemed to work without issue:

GOOS=wasip1 GOARCH=wasm go test -c .
go run github.com/tetratelabs/wazero/cmd/wazero@v1.7.3 run go-re2.test -test.run=None -test.bench=Benchmark

As for representative benchmarks, sorry if I misunderstood, but it sounds like you are looking for more e2e-type benchmarks rather than microbenchmarks, since the latter would be mostly for wazero development. If that's true, then the ones you found under integration_test are, confusingly, more like microbenchmarks. We can see the test cases are quite small; notably, I don't think they will exercise much of the diversity of the compiler.

https://github.com/tetratelabs/wazero/blob/main/internal/integration_test/bench/testdata/case.go#L3

For more e2e-type compiler benchmarks, we have Compilation, which @mathetake runs any time the compiler backend changes.

#2265

For invocation, we have libsodium. Unfortunately, due to the size of the Wasm, we don't check it in, so it requires a build step - and perhaps it's a bit overboard even as an e2e benchmark.

Otherwise, I wonder if it's an option to pick a downstream library instead, as that is a real-world use case with wazero doing all the heavy lifting. I suspect go-sqlite3 would exercise the syscall layer a lot (go-re2 has few syscalls and would be almost entirely within compiled code).

A slight aside, but I wonder if it would make sense to add an environment variable or build tag to force-disable CompilerSupported:

func CompilerSupported() bool {

Then, when benchmarking the interpreter in a downstream project such as Go, it would be much easier to pick between the compiler and interpreter without special-casing the benchmark.
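
A minimal sketch of the environment-variable variant (the variable name is hypothetical, and the existing platform check is elided):

func CompilerSupported() bool {
    // Hypothetical override to force the interpreter, e.g. for benchmarking.
    if os.Getenv("WAZERO_FORCE_INTERPRETER") != "" {
        return false
    }
    return archSupported() // the existing per-GOOS/GOARCH check, elided here
}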