rust-fuzz / arbitrary

Generating structured data from arbitrary, unstructured input.

Home Page:https://docs.rs/arbitrary/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Improving support for generating recursive types

mhlakhani opened this issue · comments

Hi!

Would it be possible to update the API to allow implementors of Arbitrary to plumb some context through which would be useful when manually implementing Arbitrary?

I most often need this to implement a recursion limit when handling arbitrary recursive types, e.g. a tree. There is a workaround involving globals (using lazy_static) but it would be much easier to do if the trait supported a method like Arbitrary::with_context() which took a struct of a user-defined type so I could pass along the depth limit.

Here's some abstracted code for what I currently have (with a wrapper struct for making the depth tracking less verbose), if it helps:

enum Expression {
    Constant(f64),
    X,
    Y,
    Add(Box<Expression>, Box<Expression>),
}

lazy_static! {
    static ref ARBITRARY_RECURSION_GUARD: AtomicUsize = AtomicUsize::new(16);
}

impl<'a> Arbitrary<'a> for Expression {
    fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {
        let depth = ARBITRARY_RECURSION_GUARD.fetch_sub(1, Ordering::SeqCst);
        if (depth == 0) {
           ARBITRARY_RECURSION_GUARD.fetch_add(1, Ordering::SeqCst);
           Ok(Expression::X)
        } else {
            let expr = // generate stuff here, can involve recursive calls to Expression::arbitrary()
            ARBITRARY_RECURSION_GUARD.fetch_add(1, Ordering::SeqCst);
            Ok(expr)
       }
    }
}

This would be simpler if the Expression::arbitrary() method could take an argument that it could pass down to the recursive calls.

If extending the API in this way isn't possible, do you have suggestions for how best to encode this pattern?

I find that the best approach in this kind of situation is to use a helper function:

impl<'a> Arbitrary<'a> for Expression {
    fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {
        arbitrary_expr(u, 0)
    }
}

fn arbitrary_expr(u: &mut Unstructured, depth: u32) -> arbitrary::Result<Expression> {
    if depth >= MAX_DEPTH {
        // non-recursive version here...
    } else {
        match u.int_in_range(0..=4) {
            0 => Ok(Expression::Constant(u.arbitrary()?)),
            1 => Ok(Expression::X),
            2 => Ok(Expression::Y),
            // recursive calls increment depth
            3 => Ok(Expression::Add(
                Box::new(arbitrary_expr(u, depth + 1)?),
                Box::new(arbitrary_expr(u, depth + 1)?),
            ),
            _ => unreachable!(),
        }
    }
}

This approach also doesn't require changes in the arbitrary crate itself, nor complicate the Arbitrary trait any further.

For a real world example, check out wasm-smith which has tons of extra context used when generating Wasm instructions to ensure that they are always valid sequences. While the top level wasm_smith::Module type implements Arbitrary, most of the internal/intermediate types do not, since they pass around that extra context.

Thanks for the suggestion, this should work for me; I'm happy to close this issue out.

Should we put this recommendation in the docs anywhere, to help the next person that might run into this?

Should we put this recommendation in the docs anywhere, to help the next person that might run into this?

If you'd like to make a pull request, that would be much appreciated! I think a subsection in the crate-level docs or the Arbitrary trait's docs would be appropriate. I think whichever place you would have expected it to be or wished it had been would be good.

Thanks!