RFC: Add RISC-V support to ropr

mbbutler opened this issue 8 months ago · comments

mbbutler commented 8 months ago

RFC: Add RISC-V support to ropr

Author: Brady Butler

Organization: Assured Information Security

Introduction

At present, ropr is a Rust-based, multi-threaded ROP-gadget finder for x86_64 binaries using the iced-x86 disassembler. The goal of this RFC is to formalize a design for adding RISC-V support to ropr. This proposal will focus on two areas:

Which RISC-V disassembler to use for this effort
How to standardize the ropr API so that the code does not end up littered with match(isa){} statements.

Picking a RISC-V Disassembler

There appear to be three main disassembler options for adding RISC-V support to ropr:

Capstone: A multi-platform disassembler engine that supports both x86_64 and RISC-V, as well as many other ISAs (e.g. ARM and SPARC). Rust bindings for capstone are available via the capstone crate.
disc-v: The Oxide Computer Company's fork of a Rust port of a C-based RISC-V disassembler.
We (i.e. I) can roll our own disassembler. RISC-V is much simpler than x86_64 so this might not be too much work.

Capstone

Advantages:

Using capstone would essentially give ropr extra platform support for (almost) free. Once capstone is integrated for RISC-V, adding other platforms would be a matter of writing new rules based on each ISA's particular instruction mnemonics.
ropr would only need one disassembler for the entire project.

Disadvantages:

We would have little to no control over the disassembler itself. It's basically a black box once program bytes cross the FFI boundary.
capstone's performance is a concern and has not yet been benchmarked against the alternatives.

disc-v

Advantages:

It is written in Rust and we would have full control of it up to and including a fork if we need to make changes.
Since RISC-V is open-source, companies have been able to add their own custom instructions to chips that they make (see an issue about it over at the Ghidra repo). Having control over the disassembler means we could find a way to support these out-of-spec instructions (e.g. a macro that parses instructions written in a specified DSL to add disassembler instructions).

Disadvantages:

The disassembler hasn't been updated in almost 3 years. It would likely need some work to get it to an appropriate level.
We would need to have multiple disassemblers as part of the project

Writing our own Disassembler

Advantages:

Same advantages as disc-v.
Writing your own tools is fun.

Disadvantages:

Increases development time.
Getting a disassembler 100% correct is tough.
We would need to have multiple disassemblers as part of the project
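
To give a sense of why RISC-V decoding is comparatively simple, here is a minimal sketch of the first step any RISC-V disassembler has to take: working out an instruction's length from its low bits, then spotting the two common `ret` encodings. The helper names `insn_len` and `ret_offsets` are hypothetical and are not part of ropr, disc-v, or capstone. The sketch assumes little-endian code and ignores encodings longer than 32 bits.

```rust
/// Length in bytes of the instruction whose first halfword is `half`,
/// or `None` for the longer (48-bit and up) encodings this sketch ignores.
fn insn_len(half: u16) -> Option<usize> {
    if half & 0b11 != 0b11 {
        Some(2) // compressed (C extension) instruction
    } else if half & 0b1_1100 != 0b1_1100 {
        Some(4) // standard 32-bit instruction
    } else {
        None
    }
}

/// Hypothetical helper: every offset at which a `ret` encoding starts.
fn ret_offsets(bytes: &[u8]) -> Vec<usize> {
    let mut found = Vec::new();
    // Instructions are 2-byte aligned when the C extension is in use,
    // so every even offset is a candidate gadget tail.
    for off in (0..bytes.len()).step_by(2) {
        if off + 2 > bytes.len() {
            break;
        }
        let half = u16::from_le_bytes([bytes[off], bytes[off + 1]]);
        match insn_len(half) {
            // c.jr ra, the compressed form of ret
            Some(2) if half == 0x8082 => found.push(off),
            Some(4) if off + 4 <= bytes.len() => {
                let word = u32::from_le_bytes([
                    bytes[off],
                    bytes[off + 1],
                    bytes[off + 2],
                    bytes[off + 3],
                ]);
                // jalr x0, 0(ra), the uncompressed form of ret
                if word == 0x0000_8067 {
                    found.push(off);
                }
            }
            _ => {}
        }
    }
    found
}

fn main() {
    // nop, ret, c.jr ra
    let code = [0x13, 0x00, 0x00, 0x00, 0x67, 0x80, 0x00, 0x00, 0x82, 0x80];
    println!("{:?}", ret_offsets(&code));
}
```

A full disassembler still has to decode operands and handle extensions, which is where the correctness risk listed above comes in.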

How to Manage Multiple Disassemblers within the Project

Even if we decide to just use capstone, we still have the problem of how to abstract over a disassembler and instruction types that look very different from one architecture to the next, without scattering match(isa){} expressions throughout the code. We can start by looking at how ropr functions to see where points of abstraction might help.

ropr's main() function does the following:

1. Parse CLI args
2. Generate a Vec<(usize, usize)> of Address Ranges based on CLI args
3. Generate a Vec<Regex> of regexes based on CLI args
4. For each Address Range:
    a. Create a new Disassembly struct from the bytes of the Address Range
    b. For each position in the bytes range of the Disassembly:
        i. If the position is a Gadget tail, disassemble back max_instructions_per_gadget
           and add the Gadget to the resulting Vec
5. For each Gadget found in Step 4:
    a. Format the Gadget's instructions
    b. If the formatted instructions satisfy the regexes, add the Gadget to the Vec of
       final ROP Gadgets
6. Write the ROP Gadgets to stdout
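
Steps 4 to 6 above can be sketched in a few lines. This is a toy illustration only: `decode`, `find_gadgets`, and `filter_gadgets` are hypothetical names, the decoder recognises just a handful of one-byte x86_64 encodings rather than calling iced_x86, and a plain substring match stands in for the Vec<Regex> filter so the snippet needs only the standard library.

```rust
/// Toy decoder: only a few one-byte x86_64 encodings are recognised.
fn decode(byte: u8) -> Option<&'static str> {
    match byte {
        0x58 => Some("pop rax"),
        0x5e => Some("pop rsi"),
        0x5f => Some("pop rdi"),
        0x90 => Some("nop"),
        0xc3 => Some("ret"),
        _ => None,
    }
}

/// Step 4: for every tail position, walk backwards and record each gadget
/// of at most `max_instructions` instructions (the tail included).
fn find_gadgets(bytes: &[u8], max_instructions: usize) -> Vec<(usize, String)> {
    let mut gadgets = Vec::new();
    for (tail, &byte) in bytes.iter().enumerate() {
        if byte != 0xc3 {
            continue; // not a gadget tail
        }
        let mut text = String::from("ret;");
        gadgets.push((tail, text.clone()));
        for back in 1..max_instructions {
            if back > tail {
                break; // ran off the start of the section
            }
            match decode(bytes[tail - back]) {
                // Stop at undecodable bytes and at an earlier ret.
                Some(m) if m != "ret" => {
                    text = format!("{}; {}", m, text);
                    gadgets.push((tail - back, text.clone()));
                }
                _ => break,
            }
        }
    }
    gadgets
}

/// Step 5: keep the gadgets whose formatted text contains `needle`.
fn filter_gadgets(gadgets: &[(usize, String)], needle: &str) -> Vec<String> {
    gadgets
        .iter()
        .filter(|(_, text)| text.contains(needle))
        .map(|(addr, text)| format!("{:#010x}: {}", addr, text))
        .collect()
}

fn main() {
    let bytes = [0x5f, 0xc3, 0x00, 0x58, 0x5e, 0xc3];
    // Step 6: write the surviving gadgets to stdout.
    for line in filter_gadgets(&find_gadgets(&bytes, 3), "pop") {
        println!("{}", line);
    }
}
```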

The iced_x86-specific pieces of code within the above functionality are the following:

The Disassembly struct contains a Disassembler that is a wrapper around an iced_x86::Decoder. It also contains and operates on iced_x86::Instructions for most of its core functionality.
The Gadget struct contains Vec<iced_x86::Instruction> and uses iced_x86 formatters to output instruction strings.
The rule functions within rules.rs operate on iced_x86::Instructions.

I think the most effective level of abstraction would involve defining Disassembly and Instruction traits and converting Gadget/GadgetIterator to generic Gadget<T: Instruction>/GadgetIterator<T: Instruction>. Then each newly supported architecture would only need to create structs that implement Disassembly and Instruction for that specific architecture. All of the existing x86_64 code could easily be moved into these new traits since the current iced_x86-specific functions would map one-to-one with the new traits' functions.

With structs that implement these traits, main() could do a single architecture check and then run the same generic code path for whichever architecture was selected.
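
As a rough illustration of that single check, the sketch below uses a deliberately cut-down stand-in for the Disassembly trait. Everything in it is hypothetical: the `Isa` enum, the `from_bytes` and `count_tails` methods, and both structs exist only to show the dispatch shape and are much smaller than the traits proposed in this RFC.

```rust
#[derive(Debug, Clone, Copy)]
enum Isa {
    X86_64,
    RiscV,
}

/// Cut-down stand-in for the proposed Disassembly trait.
trait Disassembly {
    fn from_bytes(bytes: &[u8]) -> Self;
    /// Number of positions that could end a ROP gadget.
    fn count_tails(&self) -> usize;
}

struct X86Disassembly {
    bytes: Vec<u8>,
}

struct RiscVDisassembly {
    bytes: Vec<u8>,
}

impl Disassembly for X86Disassembly {
    fn from_bytes(bytes: &[u8]) -> Self {
        Self { bytes: bytes.to_vec() }
    }
    fn count_tails(&self) -> usize {
        // 0xc3 is the one-byte x86_64 `ret`.
        self.bytes.iter().filter(|&&b| b == 0xc3).count()
    }
}

impl Disassembly for RiscVDisassembly {
    fn from_bytes(bytes: &[u8]) -> Self {
        Self { bytes: bytes.to_vec() }
    }
    fn count_tails(&self) -> usize {
        // Simplification: only 4-byte-aligned, uncompressed `ret` words.
        self.bytes
            .chunks_exact(4)
            .filter(|w| u32::from_le_bytes([w[0], w[1], w[2], w[3]]) == 0x0000_8067)
            .count()
    }
}

/// Architecture-independent code path, written once against the trait.
fn run<D: Disassembly>(bytes: &[u8]) -> usize {
    D::from_bytes(bytes).count_tails()
}

/// The only place that inspects the ISA.
fn find_tails(isa: Isa, bytes: &[u8]) -> usize {
    match isa {
        Isa::X86_64 => run::<X86Disassembly>(bytes),
        Isa::RiscV => run::<RiscVDisassembly>(bytes),
    }
}

fn main() {
    println!("{}", find_tails(Isa::X86_64, &[0x5f, 0xc3, 0x90, 0xc3]));
    println!("{}", find_tails(Isa::RiscV, &[0x13, 0, 0, 0, 0x67, 0x80, 0, 0]));
}
```

Static dispatch through a generic function keeps the per-architecture code monomorphised, which matters for a tool that scans every byte offset. A boxed trait object would also work if a runtime-selected type is preferred.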

A rough draft would look like this:

pub struct Gadget<T: Instruction> {
	instructions: Vec<T>,
	unique_id: usize,
}

pub struct GadgetIterator<'d, T: Instruction> {
	section_start: usize,
	tail_instruction: T,
	predecessors: &'d [T],
	max_instructions: usize,
	noisy: bool,
	uniq: bool,
	start_index: usize,
	finished: bool,
}

pub trait Disassembly {
    /// The architecture-specific instruction type produced by this Disassembly
    /// (added so that the GadgetIterator return type below can name it)
    type Insn: Instruction;

    /// Return the bytes held by the Disassembler
    fn bytes(&self) -> &[u8];

    /// Return whether or not the index is the tail of a Gadget
    fn is_tail_at(&self,
        index: usize,
        rop: bool,
        sys: bool,
        jop: bool,
        noisy: bool,
    ) -> bool;

    /// Return a GadgetIterator from the given tail_index
    fn gadgets_from_tail(&self,
        tail_index: usize,
        max_instructions: usize,
        noisy: bool,
        uniq: bool,
    ) -> GadgetIterator<'_, Self::Insn>;
}

// The formatter output trait is borrowed from iced_x86 (see the note below)
use iced_x86::FormatterOutput;

pub trait Instruction {
    /// Format an instruction and add it to output
    fn format(&self, output: &mut impl FormatterOutput);

    /// Whether or not the Instruction is a Ret
    fn is_ret(&self) -> bool;

    /// Whether or not the Instruction is a Syscall
    fn is_sys(&self) -> bool;

    /// Whether or not the Instruction is a ROP Gadget head
    fn is_rop_gadget_head(&self, noisy: bool) -> bool;

    /// Whether or not the Instruction is a stack pivot head
    fn is_stack_pivot_head(&self) -> bool;

    /// Whether or not the Instruction is a stack pivot tail
    fn is_stack_pivot_tail(&self) -> bool;

    /// Whether or not the Instruction is a base pivot head
    fn is_base_pivot_head(&self) -> bool;

    /// Whether or not the Instruction is a JOP Gadget
    fn is_jop(&self, noisy: bool) -> bool;

    /// Whether or not the Instruction is invalid
    fn is_invalid(&self) -> bool;
}

The only part of this that I'm a little uneasy about is taking iced_x86's FormatterOutput trait (as &mut impl FormatterOutput) in the Instruction trait, since that ties every architecture to an iced_x86 type. On the other hand, the trait's function, write(), takes a FormatterTextKind, which is an enum that maps really well to any kind of disassembled program.
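
To illustrate why a text-kind enum carries over to other architectures, here is a small sketch that formats a RISC-V instruction through a write(text, kind) style sink. `TextKind` and `Output` are local stand-ins defined only so the snippet compiles without iced_x86. They mimic the general shape of FormatterTextKind and FormatterOutput but are not those types, and the `Addi` struct is purely hypothetical.

```rust
/// Local stand-in for a text-kind enum (not iced_x86's FormatterTextKind).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TextKind {
    Mnemonic,
    Register,
    Number,
    Punctuation,
    Text,
}

/// Local stand-in for a formatter sink (not iced_x86's FormatterOutput).
trait Output {
    fn write(&mut self, text: &str, kind: TextKind);
}

/// Sink that ignores the kinds and concatenates the text.
struct PlainOutput(String);

impl Output for PlainOutput {
    fn write(&mut self, text: &str, _kind: TextKind) {
        self.0.push_str(text);
    }
}

/// Sink that records the kinds, as a syntax highlighter would.
struct KindLog(Vec<TextKind>);

impl Output for KindLog {
    fn write(&mut self, _text: &str, kind: TextKind) {
        self.0.push(kind);
    }
}

/// Hypothetical decoded RISC-V `addi rd, rs1, imm`.
struct Addi {
    rd: &'static str,
    rs1: &'static str,
    imm: i32,
}

impl Addi {
    fn format(&self, out: &mut impl Output) {
        out.write("addi", TextKind::Mnemonic);
        out.write(" ", TextKind::Text);
        out.write(self.rd, TextKind::Register);
        out.write(", ", TextKind::Punctuation);
        out.write(self.rs1, TextKind::Register);
        out.write(", ", TextKind::Punctuation);
        out.write(&self.imm.to_string(), TextKind::Number);
    }
}

fn main() {
    let insn = Addi { rd: "sp", rs1: "sp", imm: 16 };
    let mut plain = PlainOutput(String::new());
    insn.format(&mut plain);
    println!("{}", plain.0);
}
```

If depending on iced_x86 for a formatting trait turns out to be undesirable, a project-local trait of this shape would be one way to decouple it, at the cost of a thin adapter on the x86_64 side.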

Next Steps

I would love to get some feedback on the approach I've outlined above. In the meantime, I am going to get some performance tests going to see how capstone compares to disc-v.

B3NNY · Answer 1 · Fri Nov 24 2023 05:51:19 GMT+0800 (China Standard Time)

I like the ideas given. One thing I'm not a big fan of is locking in the API for the Instruction / Disassembly traits.

I'd ideally like to be able to define rules in a more flexible way rather than having things like is_rop_gadget_head as a trait method. My thoughts on this aren't complete, but I'd like to be able to extract "properties" out of instructions, then define rules based on those properties, and define how to search based on those rules. For example, a ROP gadget will use the property of being exactly the ret instruction; a rule will then be that you have a sequence of instructions which don't branch followed by a ret; and you will search from back to front.
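
One possible reading of that property, rule, and search split is sketched below. The `Properties` struct and the `rop_gadgets` function are hypothetical names chosen for illustration, and the sketch assumes instructions have already been decoded into a flat slice of properties.

```rust
/// Architecture-neutral facts extracted from one decoded instruction.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Properties {
    is_ret: bool,
    /// Any control-flow transfer; a ret also sets this.
    is_branch: bool,
}

/// Rule: zero or more non-branching instructions followed by a ret.
/// Search: start at each ret and extend from back to front.
/// Returns (start, tail) index pairs, at most `max_len` instructions each.
fn rop_gadgets(props: &[Properties], max_len: usize) -> Vec<(usize, usize)> {
    let mut found = Vec::new();
    for (tail, p) in props.iter().enumerate() {
        if !p.is_ret {
            continue;
        }
        let mut start = tail;
        found.push((start, tail));
        while start > 0 && tail - start + 1 < max_len && !props[start - 1].is_branch {
            start -= 1;
            found.push((start, tail));
        }
    }
    found
}

fn main() {
    let plain = Properties::default();
    let jump = Properties { is_ret: false, is_branch: true };
    let ret = Properties { is_ret: true, is_branch: true };
    // e.g. mov; jmp; pop; pop; ret
    let stream = [plain, jump, plain, plain, ret];
    println!("{:?}", rop_gadgets(&stream, 5));
}
```

Because the rule only looks at `Properties`, the same search would work for any architecture whose decoder can fill that struct in.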

I do think having multiple disassemblers included in the project is one obvious, easy way to gain more platform support, possibly with capstone being used as a fallback.

mbbutler · Answer 2 · Sat Nov 25 2023 09:11:23 GMT+0800 (China Standard Time)

A property-based rules system sounds like an interesting approach. What other kinds of properties do you think would be relevant? A few properties that come to mind are:

If the instruction is a branch/jump
Whether the instruction loads/stores or pushes/pops
Which register an instruction uses or modifies
If the instruction is a syscall instruction
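
For RISC-V, most of the properties listed above can be read straight off the fixed opcode and rd fields of a 32-bit instruction word. The sketch below is a minimal illustration covering only uncompressed base-ISA instructions. The `Properties` struct and `properties` function are hypothetical names, and compressed, floating-point, atomic, and CSR instructions are deliberately left out. The base ISA has no dedicated push or pop, so that property would have to be derived from load/store patterns on sp.

```rust
#[derive(Debug, Default, PartialEq)]
struct Properties {
    is_branch_or_jump: bool,
    is_load: bool,
    is_store: bool,
    is_syscall: bool,
    /// Destination register number, if the instruction writes one.
    written_reg: Option<u8>,
}

/// Extract properties from one uncompressed base-ISA instruction word.
fn properties(word: u32) -> Properties {
    let opcode = word & 0x7f; // bits 6:0
    let rd = ((word >> 7) & 0x1f) as u8; // bits 11:7
    let mut p = Properties::default();
    match opcode {
        // BRANCH, JALR, JAL
        0x63 | 0x67 | 0x6f => p.is_branch_or_jump = true,
        0x03 => p.is_load = true,
        0x23 => p.is_store = true,
        // SYSTEM opcode; only the exact ecall encoding counts here.
        0x73 if word == 0x0000_0073 => p.is_syscall = true,
        _ => {}
    }
    // Opcodes whose rd field names a written integer register:
    // LOAD, OP-IMM, AUIPC, OP-IMM-32, OP, LUI, OP-32, JALR, JAL.
    let has_rd = matches!(
        opcode,
        0x03 | 0x13 | 0x17 | 0x1b | 0x33 | 0x37 | 0x3b | 0x67 | 0x6f
    );
    // Writes to x0 are discarded, so they are not reported.
    if has_rd && rd != 0 {
        p.written_reg = Some(rd);
    }
    p
}

fn main() {
    println!("{:?}", properties(0x0081_3083)); // ld ra, 8(sp)
    println!("{:?}", properties(0x0000_8067)); // ret
}
```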