wasmi-labs / wasmi

WebAssembly (Wasm) interpreter.

Home Page:https://wasmi-labs.github.io/wasmi/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Optimization: Special handling for common shadow stack instruction sequences

Robbepop opened this issue · comments

Wasm producers such as LLVM often generate Wasm code with a so-called shadow stack in order to handle parameters and return values that did not fit the Wasm function parameter and result passing model.

Common sequences include:

global.get 0
i32.const $n
i32.sub
local.tee $v
global.set 0

and

global.get 0
i32.const -$n
i32.add
local.tee $v
global.set 0

as well as the counterpart

local.get $v
i32.const $n
i32.add
global.set 0

Where $n denotes a positive i32 integer literal and $v denotes a local variable index.
Currently Wasmi (register) translates this to roughly the following bytecode:

$r <- global.get 0
$v <- i32.sub_imm $r $n
global.set 0 $v

However, with special treatment in the Wasmi translator and introduction of new Wasmi bytecode instructions we could easily translate the entire sequences above to a single instruction.

$v <- global.0.i32.sub_assign_imm $n
  • We no longer state the global index (0) since Wasm producers implicitly always use the zero-th index and we can use this fact to further compress the special bytecode for these sequences.
  • Another optimization that we could apply is that the constant immediate value is always divisible by 4 (since this is about pointer arithmetic on 32-bit systems). So we could store the constants as divided by 4 to increase the effective value range.
  • Instead of supporting both global.0.i32.sub_assign_imm and global.0.i32.add_assign_imm we can use the fact that sequences with i32.add always have a negative constant immediate value and thus convert the i32.add into an i32.sub using a negated constant.

It is to be expected that this optimization yields good results because:

  • the most costly things in an efficient interpreter such as Wasmi is instruction dispatch and this technique results in fewer instructions for the same compute effect
  • past similar optimizations with op-code fusion yielded good results
  • when inspecting Wasm binaries such as spidermonkey.wat or westend.wat (Polkadot runtime) we observed thousands of occurrences of the above Wasm instruction sequences.

Having #924 implemented makes this optimization simpler to implement since we no longer have to differentiate between i32.add r c and i32.sub r c variants of the above instruction sequences.