google / fuzzbench

FuzzBench - Fuzzer benchmarking as a service.

Home Page: https://google.github.io/fuzzbench/



A new model for requesting experiments

jonathanmetzman opened this issue · comments

We should change the model we use to request experiments.
Instead of committing all the code to be used in an experiment along with a request in experiment-requests.yaml, we will start experiments on pull requests using the gcbrun keyword (a sketch of how such a trigger could work follows the list below).
This will have the following benefits:

  • Experiments will start immediately instead of at 6 AM and 6 PM PST
  • Experiments will be trivial to reproduce: just a comment (unless there's a DB schema change or some other breaking change, in which case it will need to be backported)
  • It will make FuzzBench more flexible. Experiments can go wild making changes to the FuzzBench infra. Experiments will be able to add whatever benchmarks they want, without worrying whether every other fuzzer in the world can support them. They can also do other interesting things, like changing the snapshot interval from 15 minutes to arbitrary times, or changing the OS. I'm actually excited about what people will come up with here; I feel like FuzzBench hasn't accepted many innovations in these areas because we were worried about the effects on others. Now we can provide a solid framework and CPU time and let others do the thinking.
  • It will be more obvious when experiments fail to start. The current model uses a single instance with a cron job and is quite fragile.
  • It will reduce the maintenance burden on FuzzBench maintainers. In hindsight, I think it was a mistake to commit every fuzzer to FuzzBench and give some kind of guarantee that we would maintain them. In the future we should only commit core fuzzers; variants will be kept in people's forks, and to experiment on one, its author can make a PR. This will also make our CI situation a little less ridiculous and wasteful.
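
To make the trigger concrete, here is a minimal sketch of a comment handler. FuzzBench's `experiment/run_experiment.py` launcher is real, but the exact comment format, the handler itself, and the `EXPERIMENT_BRANCH` environment variable are illustrative assumptions, not a committed design.

```python
import os
import re
import subprocess

# Hypothetical trigger comment on a PR, e.g.:
#   /gcbrun --experiment-name 2020-05-01-myfuzzer --fuzzers myfuzzer afl
GCBRUN_RE = re.compile(r'^/gcbrun\s+(?P<args>.+)$', re.MULTILINE)


def maybe_start_experiment(comment_body, pr_branch):
    """Start an experiment from the PR's branch if the comment asks for one."""
    match = GCBRUN_RE.search(comment_body)
    if not match:
        return False
    # Launch FuzzBench's experiment runner against the PR's code.
    # EXPERIMENT_BRANCH is a made-up variable standing in for however the
    # infra would check out the PR's code.
    subprocess.run(
        ['python3', 'experiment/run_experiment.py'] + match.group('args').split(),
        check=True,
        env={**os.environ, 'PYTHONPATH': '.', 'EXPERIMENT_BRANCH': pr_branch},
    )
    return True
```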

I can think of some downsides however:

  • More fragmentation. If researcher A does an interesting experiment with fuzzer B, and researcher C then wants to use fuzzer B in their own experiment, they have to grab that integration code from a pull request instead of the master branch. I don't think this use case is super common, however.
  • Possibly a slightly increased burden on power users who need to run many different kinds of experiments; they may now need to deal with merging our code.

This sounds like a pretty awesome improvement!

If I'm understanding correctly, the new experiment model for an outside user will look like:

  • Create a PR
  • We review + run /gcbrun to kick it off.
  • Don't commit it into the repo unless it's a core fuzzer / improvement to an existing fuzzer.

I wonder if there's some way to make the /gcbrun part more self service too (and account for abuse).

@alan32liu for thoughts too.

Yep, Jonathan mentioned this during our meeting, and I think it is fantastic in many ways.

In particular, I think this is suitable and beneficial for researchers:

  1. This reduces the maintenance burden on researchers and on us, as researchers may not have the time to maintain their fuzzer after the paper is published.
  2. Meanwhile, this also keeps result reproduction relatively simple: if a researcher wants to reproduce the results of other fuzzers, they can easily fork that PR.

Some minor suggestions:

  1. Maybe we could automatically post the report link to the corresponding PR once the experiment finishes? (A sketch of this follows the list.)
    In this way, users do not have to repeatedly ask us if their experiment has finished, and other researchers can know the result is trustworthy.
    Also, this clearly lists the timeline of improvements, making it easier for researchers to track how each modification affects the result.

  2. If some researchers do not wish to publish their results too early, we could also give them the option of receiving the report link by email. We would include the timestamp and commit ID in the email for record-keeping.
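
On suggestion 1: here is a minimal sketch of the posting step. The endpoint is GitHub's real "create issue comment" API (PR comments go through the issues API), but the token handling and the `report_url` parameter are assumptions for illustration.

```python
import os
import requests

GITHUB_API = 'https://api.github.com'


def post_report_link(repo, pr_number, report_url):
    """Post the finished experiment's report link as a comment on the PR.

    `repo` is e.g. 'google/fuzzbench'; GITHUB_TOKEN is an assumed
    environment variable holding a bot token.
    """
    resp = requests.post(
        f'{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments',
        headers={'Authorization': f"token {os.environ['GITHUB_TOKEN']}"},
        json={'body': f'Experiment finished. Report: {report_url}'},
        timeout=30,
    )
    resp.raise_for_status()
```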

More fragmentation. If researcher A does an interesting experiment with fuzzer B, and researcher C then wants to use fuzzer B in their own experiment, they have to grab that integration code from a pull request instead of the master branch. I don't think this use case is super common, however.

We can modify the 'merging with nonprivate' feature to solve this.
For example, we can allow researcher C to use a parameter to include the latest results of fuzzer B in their experiments.

I recall we fixed this feature in Q4 2022, but it was not designed with such use cases in mind, so it will need a bit more work.
For example, we do not wish to include all results in the past, especially given we recently updated the fuzzers and benchmarks. We do not wish to include the test-run results either.
Instead, we need to keep track of the version of benchmarks and fuzzers in each experiment and only include the suitable ones.
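
As a rough illustration of that version tracking, here is a sketch with a made-up result schema; FuzzBench's actual database layout differs, and the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    fuzzer: str
    benchmark: str
    fuzzer_version: str      # e.g. the commit the fuzzer was built from
    benchmark_version: str   # e.g. the commit of the benchmark suite
    is_test_run: bool


def mergeable(results, current_versions):
    """Keep only past results whose fuzzer/benchmark versions match those
    used by the current experiment, excluding test runs.

    `current_versions` maps (fuzzer, benchmark) -> (fuzzer_version,
    benchmark_version) for the current experiment.
    """
    return [
        r for r in results
        if not r.is_test_run
        and current_versions.get((r.fuzzer, r.benchmark))
            == (r.fuzzer_version, r.benchmark_version)
    ]
```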

Possibly a slightly increased burden on power users who need to run many different kinds of experiments; they may now need to deal with merging our code.

Sorry, I am not sure if I understand this correctly, could you please elaborate? Thanks : )

I wonder if there's some way to make the /gcbrun part more self service too (and account for abuse).

This would be fantastic.
This solves one of the main pain points for early-career students/researchers who do not have the resources to test their ideas, and it will encourage them to keep using FuzzBench in the future.

Some possible ways to avoid abuse or reduce costs (a sketch of such a policy check follows the list):

  1. Only allow users to run experiments with their own fuzzers (and use the 'merge with nonprivate' feature to compare their results with core fuzzers or others).
  2. Limit the number of long experiments (e.g. 12/24 hours) users can run (e.g. 1 in every 3 days?). We can be more generous about short experiments (e.g. <= 3 hours) so that they can double-check their code before requesting a long one.
  3. Limit the number of instances users can run in each experiment. For example, they may not need more than 4 instances in a short experiment to check whether their code works.
  4. Only allow users to run new experiments after their code has changed.
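
A minimal sketch of how these rules could be enforced before an experiment is scheduled. The quota numbers come straight from the list above; the request objects and how they are looked up are assumptions.

```python
import datetime

# Illustrative quotas from the list above.
MAX_TRIAL_INSTANCES_SHORT = 4
SHORT_EXPERIMENT_HOURS = 3
LONG_EXPERIMENT_HOURS = 12
LONG_EXPERIMENT_COOLDOWN = datetime.timedelta(days=3)


def allow_experiment(user_requests, now, duration_hours, num_instances,
                     owns_all_fuzzers, code_changed_since_last_run):
    """Return (allowed, reason). `user_requests` is the user's past
    requests, each assumed to carry .timestamp and .duration_hours."""
    if not owns_all_fuzzers:
        return False, "experiments may only run the requester's own fuzzers"
    if not code_changed_since_last_run:
        return False, 'no code change since the last experiment'
    if (duration_hours <= SHORT_EXPERIMENT_HOURS
            and num_instances > MAX_TRIAL_INSTANCES_SHORT):
        return False, 'short experiments are capped at 4 instances'
    if duration_hours >= LONG_EXPERIMENT_HOURS:
        recent_long = [r for r in user_requests
                       if r.duration_hours >= LONG_EXPERIMENT_HOURS
                       and now - r.timestamp < LONG_EXPERIMENT_COOLDOWN]
        if recent_long:
            return False, 'only one long experiment every 3 days'
    return True, 'ok'
```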

not done yet