rust-fuzz / arbitrary

Generating structured data from arbitrary, unstructured input.

Home Page: https://docs.rs/arbitrary/

Generate collections via a "continuation byte" rather than reading a length

fitzgen opened this issue

Rather than doing

let mut collection = Collection::new();
let n = u.arbitrary_len()?;
for _ in 0..n {
    collection.insert(u.arbitrary()?);
}

to create a collection, we should consider doing

let mut collection = Collection::new();
loop {
    // Read one "continuation" flag before each element.
    let keep_going = u.arbitrary::<bool>()?;
    if !keep_going {
        break;
    }
    collection.insert(u.arbitrary()?);
}

This paper found that the second approach makes test case reduction much more effective.

We might not want to do this for strings, though, since the fuzzer likely has some smarts around UTF-8 sequences that we might break.

Also, we might want to use the size hint to still put an upper limit on how many elements we parse.
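To make the idea concrete, here is a minimal, self-contained sketch of the continuation-byte loop with an upper limit on the element count. It does not use the real `arbitrary::Unstructured` type: `Bytes`, `next_byte`, `should_continue`, `continuation_vec`, and the `max_len` parameter are all hypothetical names made up for illustration, and taking the flag from the low bit of a byte is just one possible encoding.

```rust
/// Hypothetical stand-in for `Unstructured`: a cursor over raw fuzzer bytes.
struct Bytes<'a> {
    data: &'a [u8],
}

impl<'a> Bytes<'a> {
    fn new(data: &'a [u8]) -> Self {
        Bytes { data }
    }

    /// Consume one byte, or return `None` once the input is exhausted.
    fn next_byte(&mut self) -> Option<u8> {
        let (&first, rest) = self.data.split_first()?;
        self.data = rest;
        Some(first)
    }

    /// Continuation flag (assumed encoding): the low bit of the next byte.
    /// Exhausted input is treated as "stop".
    fn should_continue(&mut self) -> bool {
        self.next_byte().map_or(false, |b| b & 1 == 1)
    }
}

/// Build a collection by reading a continuation flag before each element,
/// never producing more than `max_len` elements.
fn continuation_vec(u: &mut Bytes<'_>, max_len: usize) -> Vec<u8> {
    let mut out = Vec::new();
    while out.len() < max_len && u.should_continue() {
        match u.next_byte() {
            Some(b) => out.push(b),
            // The flag said "continue" but no element bytes remain.
            None => break,
        }
    }
    out
}
```

With this encoding, deleting one flag/element pair from the input removes exactly one element and leaves the rest of the collection unchanged, which is plausibly why a reducer has an easier time than with a leading length field.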

Seems good to me.

If continuation bytes are fast, I wonder whether the same could be done for strings: basically, "parse as much of the input as you can, and stop when it is no longer valid UTF-8", so that the UTF-8 validity of each byte effectively acts as the continuation byte.
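A rough sketch of that string idea, using only the standard library: take the longest prefix of the raw bytes that is valid UTF-8 and stop at the first invalid sequence. The helper name `longest_utf8_prefix` is hypothetical and is not part of the `arbitrary` crate. `std::str::from_utf8` and `Utf8Error::valid_up_to` are the standard-library calls it relies on.

```rust
/// Return the longest prefix of `bytes` that is valid UTF-8.
fn longest_utf8_prefix(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        // The whole input is valid, so use all of it.
        Ok(s) => s,
        Err(e) => {
            // `valid_up_to()` is the byte length of the valid prefix,
            // so re-validating that slice cannot fail.
            std::str::from_utf8(&bytes[..e.valid_up_to()])
                .expect("prefix up to valid_up_to() is valid UTF-8")
        }
    }
}
```

For example, `longest_utf8_prefix(b"hi\xffrest")` yields `"hi"`: the invalid byte `0xff` plays the role of the "stop" flag, and everything after it is left unconsumed.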

I've been experimenting with this locally, and it does have a huge benefit for test case reduction. I'll make a PR in a bit with some other tweaks as well.