Support vyper jumptables

Question

Support vyper jumptables

karmacoma-eth opened this issue 4 months ago · comments

Is your feature request related to a problem? Please describe.

Newer versions of vyper use jumptable-based dispatchers. We don't currently handle them very gracefully because of the mod operation.

For instance say a vyper contract implements 3 functions, the dispatcher will do something like this:

jump_table_offset = XXX   # (typically at the end of bytecode)
jump_table_index = input % 3
jump_target = codecopy(jump_table_offset + jump_table_index * 2)
JUMP(jump_target)

Because of the mod operation, halmos will fail with NotConcreteError(f"symbolic JUMP target: {dst}") or a symbolic CODECOPY error

Doing this from memory, so I'm not sure exactly what input is used, but if it is concrete then we should be able to resolve it. If it is symbolic, we may have to run a solver query to know which are the possible values, or identify the pattern and loop over the jumptable values

cc @zobront @charles-cooper

karmacoma · Answer 1 · Sat Jan 27 2024 07:41:43 GMT+0800 (China Standard Time)

Here is an annotated example of a jump table dispatcher, looks like an example with 27 functions. And the input is the usual first 4 bytes of calldata (4byte)

PUSH0 CALLDATALOAD PUSH1 0xE0 SHR    || stack: [4byte]
PUSH1 0x2 PUSH1 0x1B DUP3 MOD        || stack: [4byte % 27, 2, 4byte]
PUSH1 0x1 SHL                        || stack: [2 * (4byte % 27), 2, 4byte]
PUSH2 0x125 ADD                      || stack: [0x125 + 2 * (4byte % 27), 2, 4byte]
PUSH1 0x1E CODECOPY PUSH0 MLOAD JUMP || stack: [4byte] => jump to loc at `0x125 + 2 * (4byte % 27)`

karmacoma · Answer 2 · Sat Jan 27 2024 07:49:30 GMT+0800 (China Standard Time)

Not perfect, but here is a potential idea for how to solve this in CODECOPY:

    elif opcode == EVM.CODECOPY:
        loc: int = ex.st.mloc()
        offset = ex.st.pop()

        if is_concrete(offset):
            concrete_offset = int_of(offset)
            print(green(f"XXX CONCRETE CODECOPY {hexify(ex.this)}@{ex.pc} XXX"))

            size: int = int_of(ex.st.pop(), "symbolic CODECOPY size")
            wextend(ex.st.memory, loc, size)

            codeslice = ex.pgm[concrete_offset : concrete_offset + size]

            actual_size = byte_length(codeslice)
            if actual_size != size:
                raise HalmosException(
                    f"CODECOPY: expected {size} bytes but got {actual_size}"
                )

            ex.st.memory[loc : loc + size] = iter_bytes(codeslice)

        else:
            offsetSolver = SolverFor("QF_BV")
            codesize = con(len(ex.pgm))

            # TODO: we might want to also add a solution check for offset > codesize
            offsetSolver.add(ULE(offset, codesize))

            offsets = set([])
            result = offsetSolver.check()
            while result == sat:
                value = offsetSolver.model().evaluate(offset).as_long()
                offsetSolver.add(offset != value)
                offsets.add(value)
                result = offsetSolver.check()

            if self.options.get("debug"):
                print(f"Splitting execution for CODECOPY with possible offsets {offsets}")

            # say we found 2 possible offset values a and b, we need to
            # to create 2 new executions where we're going to reexecute
            # this CODECOPY instruction (same pc), but one for each offset
            for offset_value in offsets:
                new_ex = self.create_branch(ex, offset == offset_value, ex.pc)

                # replace the items we have consumed on the stack
                new_ex.st.push(con(offset_value))
                new_ex.st.push(con(loc))

                stack.append((new_ex, step_id))

            # we're done with this execution
            continue

note the use of a separate offsetSolver to enumerate possible values

some problems with this:

we may have to deal with uninterpreted functions like evm_bvurem -- might need to replace with the real SMT bvurem operation?
the solver may return unknown, for completeness we may want to keep iterating until result == unsat (meaning that we know there is no other possible solution)

Potentially, we could also fallback to another mechanism. Specifically, we could do pattern matching on the symbolic expression and see that it corresponds to something like offset + evm_bvurem(4byte, num_functions) * 2 and avoid calling a solver for this

Charles Cooper · Answer 3 · Thu May 02 2024 22:07:04 GMT+0800 (China Standard Time)

btw the layout of the dispatcher is in the initcode cbor metadata. using this you should be able to reliably tell what kind (-Ocodesize vs -Ogas) of jumptable it is. here's an example signature with -Ocodesize:

cbor2.loads(bytes.fromhex(initcode.removeprefix("0x"))[-22:])
[1264, [5, 84], 0, {'vyper': [0, 4, 0]}]

and with -Ogas:

(1364, [24], 0, {'vyper': (0, 4, 0)})

(you can ignore the tuple vs list distinction; they are equivalent in cbor).

the codesize optimized bytecode has 2 data sections; the gas-optimized has 1.