near / borsh

Binary Object Representation Serializer for Hashing

Home Page:https://borsh.io/

Fuzz testing

MaksymZavershynskyi opened this issue

We need to write the following fuzz tests for borsh:

A) Generate a random type and create an object of that type filled with random data. Serialize it, deserialize it, and check that the structure is the same before and after;
B) Generate a random type and create an object of that type filled with random data. Serialize it, then randomly flip a subset of bits in the serialized bytes. Try to deserialize the result and assert that it does not panic: it should either deserialize successfully or return an error.

The two difficult things to implement would be:

  • Generating random type;
  • Creating an object of the type filled with random data;

As an option, I suggest we do both using procedural macros. We can have a function-like procedural macro (https://doc.rust-lang.org/reference/procedural-macros.html#function-like-procedural-macros) random_type!(Name, X, Y, seed) that generates a token stream corresponding to a declaration of some type Name. Here X is the max depth (e.g. if we have nested structures) and Y is the max width of each node (e.g. the max number of fields in a struct or the max number of variants in an enum).

Each type would also be decorated with #[derive(RandomInit)], which implements the trait

trait RandomInit {
    fn random_init() -> Self;
}

for the type. We would then implement RandomInit for basic types and collections, just like we do with serializers.
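A minimal sketch of what the RandomInit implementations for basic types and collections could look like. Everything here is an assumption for illustration: the trait takes an explicit generator argument (the signature above has none) so that runs are reproducible from a seed, Rng is a hypothetical tiny xorshift generator used only to avoid external crates, and the collection length cap of 8 is arbitrary. The manual impl for Example shows what a derive might expand to; it is not an existing macro.

```rust
/// Hypothetical tiny xorshift64 generator, so the sketch needs no external crates.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from a zero state, so remap a zero seed.
        Rng(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Assumed variant of the trait: the generator is passed in explicitly.
pub trait RandomInit {
    fn random_init(rng: &mut Rng) -> Self;
}

// Basic types: truncate the generator output to the target width.
impl RandomInit for u8 {
    fn random_init(rng: &mut Rng) -> Self { rng.next_u64() as u8 }
}
impl RandomInit for u32 {
    fn random_init(rng: &mut Rng) -> Self { rng.next_u64() as u32 }
}
impl RandomInit for u64 {
    fn random_init(rng: &mut Rng) -> Self { rng.next_u64() }
}
impl RandomInit for bool {
    fn random_init(rng: &mut Rng) -> Self { rng.next_u64() & 1 == 1 }
}

// Collections: pick a small random length, then fill element by element.
impl<T: RandomInit> RandomInit for Option<T> {
    fn random_init(rng: &mut Rng) -> Self {
        if bool::random_init(rng) { Some(T::random_init(rng)) } else { None }
    }
}
impl<T: RandomInit> RandomInit for Vec<T> {
    fn random_init(rng: &mut Rng) -> Self {
        let len = (rng.next_u64() % 9) as usize; // 0..=8, arbitrary cap
        (0..len).map(|_| T::random_init(rng)).collect()
    }
}
impl RandomInit for String {
    fn random_init(rng: &mut Rng) -> Self {
        let len = (rng.next_u64() % 9) as usize;
        (0..len).map(|_| (b'a' + (rng.next_u64() % 26) as u8) as char).collect()
    }
}

/// What a derive might expand to for a struct: initialize field by field.
#[derive(Debug, PartialEq)]
pub struct Example {
    pub id: u32,
    pub tags: Vec<String>,
    pub parent: Option<u64>,
}

impl RandomInit for Example {
    fn random_init(rng: &mut Rng) -> Self {
        Example {
            id: RandomInit::random_init(rng),
            tags: RandomInit::random_init(rng),
            parent: RandomInit::random_init(rng),
        }
    }
}
```

Passing the generator explicitly means a failing case can be replayed from its seed, which a global random source would make harder.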

Then our test would be something like:

random_type!(T0, 1, 1, 42);
...
random_type!(T42, 10, 12, 42);

#[test]
fn test0() {
    for _ in 0..100 {
        let t0 = T0::random_init();
        let out_t0 = T0::try_from_slice(&t0.try_to_vec().unwrap()).unwrap();
        assert_eq!(t0, out_t0);
    }
}

Note: we should also look at the fuzzing tools that Sigma Prime wrote for our borsh; we might not need to write this ourselves.

commented

I looked at what Sigma Prime did with load testing.
They created 31 Rust types of relatively simple format (30 types have only one field/variant and are not nested). Then they try to deserialize these types from a randomly generated array of bytes. I think this is a good starting point for fuzz testing, but we need to go further:

  • Have types with a variable number of fields/variants and a variable depth of nesting;
  • Do two types of checks:
    A) Consistency check (see above). We need to make sure we deserialize into what we serialized;
    B) Minor corruption. Serializing a meaningful object and then corrupting a selected set of bits is a more subtle way of testing our code for panics.
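The two checks can be sketched without the borsh crate by using a hand-written stand-in codec. Msg, encode, decode, DecodeError and flip_bits are all hypothetical names, and the byte layout (little-endian integers, u32 length prefix) only loosely mirrors borsh. In a real test the codec would be replaced by the derived BorshSerialize/BorshDeserialize implementations.

```rust
use std::convert::TryInto;

#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub id: u64,
    pub payload: Vec<u8>,
    pub flag: bool,
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnexpectedEof,
    InvalidBool(u8),
    TrailingBytes,
}

/// Stand-in serializer: u64 id, u32 length prefix, payload bytes, one bool byte.
pub fn encode(m: &Msg) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&m.id.to_le_bytes());
    out.extend_from_slice(&(m.payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&m.payload);
    out.push(m.flag as u8);
    out
}

/// Splits `n` bytes off the front of `buf`, reporting EOF instead of panicking.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], DecodeError> {
    let whole: &'a [u8] = *buf;
    if whole.len() < n {
        return Err(DecodeError::UnexpectedEof);
    }
    let (head, tail) = whole.split_at(n);
    *buf = tail;
    Ok(head)
}

/// Stand-in deserializer: every malformed input must map to an Err, never a panic.
pub fn decode(mut buf: &[u8]) -> Result<Msg, DecodeError> {
    // The unwraps cannot fail: `take` returned exactly the requested length.
    let id = u64::from_le_bytes(take(&mut buf, 8)?.try_into().unwrap());
    let len = u32::from_le_bytes(take(&mut buf, 4)?.try_into().unwrap()) as usize;
    // A corrupted length is rejected by `take` before anything is allocated.
    let payload = take(&mut buf, len)?.to_vec();
    let flag = match take(&mut buf, 1)?[0] {
        0 => false,
        1 => true,
        other => return Err(DecodeError::InvalidBool(other)),
    };
    if !buf.is_empty() {
        return Err(DecodeError::TrailingBytes);
    }
    Ok(Msg { id, payload, flag })
}

/// Flips `count` pseudo-randomly chosen bits in place (xorshift, seeded).
pub fn flip_bits(bytes: &mut [u8], count: usize, seed: u64) {
    if bytes.is_empty() {
        return;
    }
    let mut state = seed | 1; // keep the xorshift state non-zero
    for _ in 0..count {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let bit = (state % (bytes.len() as u64 * 8)) as usize;
        bytes[bit / 8] ^= 1 << (bit % 8);
    }
}

#[test]
fn consistency_and_corruption() {
    let original = Msg { id: 7, payload: vec![1, 2, 3], flag: true };
    let bytes = encode(&original);
    // A) Consistency: we must get back exactly what we serialized.
    assert_eq!(decode(&bytes), Ok(original.clone()));
    // B) Minor corruption: Ok or Err are both fine, a panic fails the test.
    for seed in 0..1000 {
        let mut corrupted = bytes.clone();
        flip_bits(&mut corrupted, 3, seed);
        let _ = decode(&corrupted);
    }
}
```

Flipping a few bits of a valid encoding tends to reach deeper decoder paths (length prefixes, enum tags, bool bytes) than fully random input, which is usually rejected within the first few bytes.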

Also, we need to serialize objects initialized with different data, see above.

It would also be good to have at least some tests that serialize with JS and deserialize with Rust, and the other way around.
We will also get more languages here (we already have Python; Go and C# are probably next), so it would be good to have some generic way of testing any set of serializers/deserializers.

Generating random type;

We can generate an input for a random nested type, but to generate a proper deserializer we would need to compile it (generate a program; macros will not be enough).

The main goal of fuzzing is to test that your program will not crash under unexpected input. I'd focus on that first, and then we could expand it if needed.

commented

We can generate an input for a random nested type, but to generate a proper deserializer we would need to compile it (generate a program; macros will not be enough).

Yes, I suggest we have a bash script that runs two binaries:

  • A tool that generates a Rust file containing nested types (the nested types are also decorated with #[derive(BorshSerialize, BorshDeserialize)]);
  • A test that uses this Rust file to serialize/deserialize.
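A rough sketch of the first binary, assuming it simply emits Rust source text. The names generate_source, gen_type and Rng, the leaf type list, and the T0/f0/V0 naming scheme are all hypothetical; only the derive attribute and the borsh import come from the discussion above. The returned string would be written to a file (for example with std::fs::write) that the test crate then includes.

```rust
/// Hypothetical xorshift64 generator, seeded so that output is reproducible.
struct Rng(u64);

impl Rng {
    fn new(seed: u64) -> Self {
        Rng(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform-ish value in 0..n; `n` must be at least 1.
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Leaf field types; an arbitrary selection for the sketch.
const LEAVES: [&str; 5] = ["u8", "u64", "bool", "String", "Vec<u8>"];

/// Appends one struct or enum declaration (plus any nested ones) to `out`
/// and returns the name of the type it declared.
fn gen_type(
    rng: &mut Rng,
    depth: usize,
    max_width: usize,
    counter: &mut usize,
    out: &mut String,
) -> String {
    let name = format!("T{}", *counter);
    *counter += 1;

    let width = 1 + rng.below(max_width);
    let mut fields = Vec::new();
    for _ in 0..width {
        // Nest a fresh type only while there is depth budget left.
        let ty = if depth > 1 && rng.below(2) == 0 {
            gen_type(rng, depth - 1, max_width, counter, out)
        } else {
            LEAVES[rng.below(LEAVES.len())].to_string()
        };
        fields.push(ty);
    }

    out.push_str("#[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]\n");
    if rng.below(2) == 0 {
        out.push_str(&format!("pub enum {} {{\n", name));
        for (i, ty) in fields.iter().enumerate() {
            out.push_str(&format!("    V{}({}),\n", i, ty));
        }
    } else {
        out.push_str(&format!("pub struct {} {{\n", name));
        for (i, ty) in fields.iter().enumerate() {
            out.push_str(&format!("    pub f{}: {},\n", i, ty));
        }
    }
    out.push_str("}\n\n");
    name
}

/// Returns the text of a Rust file whose root type is `T0`.
pub fn generate_source(seed: u64, max_depth: usize, max_width: usize) -> String {
    let mut out = String::from("use borsh::{BorshDeserialize, BorshSerialize};\n\n");
    let mut rng = Rng::new(seed);
    let mut counter = 0;
    gen_type(&mut rng, max_depth.max(1), max_width.max(1), &mut counter, &mut out);
    out
}
```

Each nested field gets its own freshly declared type, so the generated types are never recursive. The number of declarations can grow quickly with depth and width, so small limits are probably a sensible starting point.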