google / fuzzbench

FuzzBench - Fuzzer benchmarking as a service.

Home page: https://google.github.io/fuzzbench/


Rebuild CI system to work more like OSS-Fuzz's trial builds

jonathanmetzman opened this issue

The current system doesn't scale well now that we have so many projects, and it makes it difficult to know what went wrong.

Yes! I was about to create an issue for the same reason before I saw this :)
There are two things on my mind: a major problem and a minor improvement.

Main problem

The major problem is that there are too many benchmarks for the CI tests to finish within the time limit (300 minutes).
It does not scale because CI attempts to run all benchmarks in each category (oss-fuzz, standard, bug) while the time limit stays fixed.
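For a rough sense of the scale, the benchmark count can be read straight from the repository layout (a minimal sketch; using a `type: bug` field to identify bug benchmarks is an assumption about the benchmark.yaml schema):

# Count all benchmark directories.
ls -d benchmarks/*/ | wc -l

# Count benchmarks marked as bug benchmarks (assumes bug benchmarks carry a
# `type: bug` field in their benchmark.yaml).
grep -El '^type: *bug' benchmarks/*/benchmark.yaml | wc -l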

Solutions

  1. OSS-Fuzz's trial build is the best solution to this (very glad you proposed this).

    • Would this allow external users (competition participants) to test-run their new fuzzers by themselves?
    • I really like this idea and am more than happy to assign myself to it after I finish the current work on upgrading Ubuntu & Python.
  2. Just in case we cannot do 1 (e.g. insufficient time to implement trial builds, or competition participants cannot test their fuzzers themselves), a less optimal alternative is to limit the number of benchmarks in CI, e.g. only testing the ~20 most commonly supported benchmarks. More specifically, every benchmark in the CI tests should be supported by the core fuzzers (and by the new fuzzer added in the PR, if any):

for benchmark in ./benchmarks/*/
do
    # Keep benchmarks whose benchmark.yaml mentions none of the core fuzzers,
    # i.e. none of them is listed in `unsupported_fuzzers`.
    if ! git --no-pager grep -qEw 'afl|aflfast|aflplusplus|aflsmart|eclipser|fairfuzz|honggfuzz|libfuzzer|mopt|libafl|centipede' "$benchmark/benchmark.yaml"
    then
        basename "$benchmark"
    fi
done

The number 20 is purely empirical, based on observations of current CI tests that timed out.
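To turn that list into an actual CI restriction, a minimal sketch could cap it at 20 and hand it to the CI job (the script name list_supported_benchmarks.sh and the BENCHMARKS variable are hypothetical, not an existing FuzzBench interface):

# Assume the loop above is saved as list_supported_benchmarks.sh (hypothetical).
# Take the first 20 commonly supported benchmarks and expose them to the CI job
# as a comma-separated list (the BENCHMARKS variable name is also hypothetical).
export BENCHMARKS="$(./list_supported_benchmarks.sh | head -n 20 | paste -sd, -)"
echo "CI will test-run: $BENCHMARKS"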

Minor improvement

Currently, we test-run all fuzzers on all benchmarks even if only one fuzzer or benchmark changes, which wastes a lot of time and compute.
Instead, we could automatically detect the fuzzers and benchmarks changed in a PR (see the sketch after this list) and:

  1. For each fuzzer changed in the PR, only test-run that fuzzer on all of the benchmarks it supports.
  2. For each benchmark changed in the PR, only test-run that benchmark with all of the fuzzers that support it.
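A rough sketch of that change detection, assuming the standard fuzzers/ and benchmarks/ directory layout and a diff against origin/master; the changed_fuzzers/changed_benchmarks variables are illustrative only, not an existing FuzzBench interface:

# Files touched by the PR, relative to the base branch.
changed_files=$(git diff --name-only origin/master...HEAD)

# Unique fuzzer and benchmark directories among the changed files.
changed_fuzzers=$(echo "$changed_files" | grep '^fuzzers/' | cut -d/ -f2 | sort -u)
changed_benchmarks=$(echo "$changed_files" | grep '^benchmarks/' | cut -d/ -f2 | sort -u)

# 1. Test-run each changed fuzzer only on the benchmarks it supports.
# 2. Test-run each changed benchmark only with the fuzzers that support it.
echo "Fuzzers to test: ${changed_fuzzers:-none}"
echo "Benchmarks to test: ${changed_benchmarks:-none}"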