corpus archive changes

Question


vanhauser-thc opened this issue a year ago

I just noticed that the corpus archive has changed: it now contains only the corpus and crashes, not the other information files.

While I understand that non-corpus/non-crash data is not useful for measuring coverage, dropping it removes our ability to analyze in depth what happened during the fuzzer run, e.g. via default/fuzzer_stats or the introspection features we can activate.

This prevents us from learning what works and what doesn't when developing the fuzzer. Because of CPU and randomness fluctuations, a coverage difference of around 0.3% (or more, if only two very similar variants are run) is common, so the final coverage number alone does not tell us much.

Could this change please be reverted, or an alternative found? Otherwise fuzzbench is much less helpful to us :(

@jonathanmetzman

jonathanmetzman · Answer 1 · Thu Feb 16 2023 22:51:48 GMT+0800 (China Standard Time)

We definitely needed the change I made at some point: storing the same file 90 times was grossly inefficient. But I want to get this working for you.
Intuitively, I don't understand why the stats file is no longer included, because the new approach is to archive only modified files (https://github.com/google/fuzzbench/blob/master/experiment/runner.py#L390), even if the name is the same. Do you know why the stats file might not be considered "modified"? Maybe we are doing something wrong, or AFL++ is doing something funky.
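To make the "archive only modified files" idea concrete, here is a minimal Python sketch. It is not the code in experiment/runner.py: the function names `files_modified_since`, `archive_modified` and `demo`, the mtime-based notion of "modified", and the tar.gz format are all assumptions for illustration. Under this scheme, a file rewritten in place under the same name (such as default/fuzzer_stats) is picked up on every cycle, because its modification time changes even though its name does not.

```python
import os
import tarfile
import tempfile


def files_modified_since(root, since):
    """Yield paths of files under `root` whose mtime is newer than `since` (hypothetical helper)."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                yield path


def archive_modified(root, since, archive_path):
    """Archive only files modified after `since`; return how many were added.

    Assumption: "modified" means a newer mtime, so a stats file that is
    rewritten in place under the same name still counts as modified.
    """
    count = 0
    with tarfile.open(archive_path, 'w:gz') as tar:
        for path in sorted(files_modified_since(root, since)):
            tar.add(path, arcname=os.path.relpath(path, root))
            count += 1
    return count


def demo():
    """Build a tiny fake AFL++-style output dir and return the archived names."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, 'out')
        queue = os.path.join(out, 'default', 'queue')
        os.makedirs(queue)
        seed = os.path.join(queue, 'id_000000')
        stats = os.path.join(out, 'default', 'fuzzer_stats')
        for path in (seed, stats):
            with open(path, 'w') as f:
                f.write('data')
        # Pretend the seed dates from an earlier cycle and the stats were just rewritten.
        os.utime(seed, (1000, 1000))
        os.utime(stats, (3000, 3000))
        archive = os.path.join(tmp, 'cycle.tar.gz')
        archive_modified(out, since=2000, archive_path=archive)
        with tarfile.open(archive) as tar:
            return tar.getnames()


print(demo())  # ['default/fuzzer_stats']
```

In a real runner, `since` would be the time of the previous archiving cycle. If a separate filter drops non-corpus files before this check, those files never reach the archive no matter how recently they changed.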

jonathanmetzman · Answer 2 · Thu Feb 16 2023 22:56:31 GMT+0800 (China Standard Time)

Actually, I think a different feature is at fault here: https://github.com/google/fuzzbench/blame/master/experiment/runner.py#L51. But I'm not sure it's new. Maybe I used it differently in the past (e.g. to decide whether the corpus was "unchanged") but still included those files in the zip, and now I don't?

jonathanmetzman · Answer 3 · Thu Feb 16 2023 22:59:52 GMT+0800 (China Standard Time)

Yup, confirmed my theory: https://github.com/google/fuzzbench/blame/1ca79526a3c752f63f79089c73e35bc69d505959/experiment/runner.py#L426

jonathanmetzman · Answer 4 · Thu Feb 16 2023 23:00:19 GMT+0800 (China Standard Time)

I'm not sure I want to push a potentially breaking change before the contest, though. Then again, since the fix only removes code, it's probably low risk.

Dongge Liu · Answer 5 · Fri Feb 17 2023 19:03:02 GMT+0800 (China Standard Time)

@vanhauser-thc Would you consider this critical for the competition?
As you know, we plan to launch the pre-run this weekend : )

van Hauser · Answer 6 · Sat Feb 18 2023 16:43:41 GMT+0800 (China Standard Time)

> @vanhauser-thc Would you consider this critical for the competition? As you know, we plan to launch the pre-run this weekend : )

No, this has nothing to do with the competition. I don't need it for the competition run :)

Often a simple run is enough to test whether a change is good or bad.
But sometimes we need to collect data on why something works or doesn't, and for that we need the metadata we collect.