AFLplusplus / AFLplusplus

The fuzzer afl++ is afl with community patches, qemu 5.1 upgrade, collision-free coverage, enhanced laf-intel & redqueen, AFLfast++ power schedules, MOpt mutators, unicorn_mode, and a lot more!

Home Page: https://aflplus.plus

Question re `-D` (deterministic) flag

smoelius opened this issue

A comment in the code describes this option as "partial deterministic":

case 'D': /* partial deterministic */

Is "partial" the correct term, i.e., does this flag no longer provide "full" determinism?

I ask because I have CI runs that use -D and that used to succeed consistently. But those runs now fail sporadically.

Bisecting suggests the problem began at #1972.

I don't fully understand that PR's changes. But was one of them to remove "full" determinism?

(cc: @kdsjZh)

"partial deterministic" skips some deterministic mutations so it can be faster.

We faced similar issues during the PR. The primary reason for the CI failure was that AFL_CUSTOM_MUTATOR_ONLY detects -D and reports an error.

Currently, Marc has commented out the functionality regarding -d, so you can configure AFL++ without -D. In that case, AFL++ will work the same while still being able to pass the CI.

Yes, partial is the right term. Not all queue entries are fuzzed deterministically, and those that are might stop early if no success is found.

Thank you both for your responses.

Yes, partial is the right term. Not all queue entries are fuzzed deterministically, and those that are might stop early if no success is found.

This is a change, correct? Is there a way to enable the behavior previously enabled by -D?

Currently, I have removed the vanilla -D stage, since the current mode is a better alternative. Maybe you could opt for commit eda770f, which is the last commit with vanilla -D.

Maybe you could opt for commit eda770f, which is the last commit with vanilla -D.

I appreciate the suggestion, but I don't think that would work for my case.

The failures are in test-fuzz, which is a wrapper around cargo-afl, which is a wrapper around AFL++. Right now, cargo-afl uses AFL++ 4.10c. Using a different version of AFL++ would mean changing cargo-afl, which would affect other users besides test-fuzz.

Plus, it is useful to receive updates not related to the determinacy changes.

I am hopeful that there is a solution that is as deterministic as the previous -D was, and doesn't require pinning to an older version.

Here is an example of the phenomenon I am describing: https://github.com/trailofbits/test-fuzz/actions/runs/8376101679/job/22934991926#step:12:155

The failing test fuzzes this function: https://github.com/trailofbits/test-fuzz/blob/344eaadc725964a4b22da9f5e57d97493eb97a67/examples/tests/qwerty.rs#L2

The test used to pass consistently. But as of late, it fails sporadically. The output suggests the fuzzer gets stuck pursuing unproductive inputs.

I have a deadline this weekend, but if @vanhauser-thc thinks it makes sense, I can try to work on a patch afterward that enables the original deterministic stage via AFL++ environment variables.

BTW, MOpt mode -L min should be able to pass the CI testing; the deterministic stage in MOpt mode is not touched.

I am away on vacation, I can look into this in 1-2 weeks

The failures are in test-fuzz, which is a wrapper around cargo-afl, which is a wrapper around AFL++. Right now, cargo-afl uses AFL++ 4.10c. Using a different version of AFL++ would mean changing cargo-afl, which would affect other users besides test-fuzz.

Plus, it is useful to receive updates not related to the determinacy changes.

I am hopeful that there is a solution that is as deterministic as the previous -D was, and doesn't require pinning to an older version.

I don't think cargo-afl should use any deterministic mutations at all(!)
Then it would also not get stuck in them...

(because it's proven to perform worse on almost all targets than just havoc)

I don't think cargo-afl should use any deterministic mutations at all(!)

It does not, at least not by default. Sorry for being unclear.

cargo-afl is a thin wrapper around the afl- tools (including afl-fuzz) and forwards command line arguments to them. test-fuzz invokes cargo-afl with -D because it is helpful, e.g., in CI.

(because it's proven to perform worse on almost all targets than just havoc)

What are you citing when you say this?

(Sorry for the funny account---I am traveling.)

Especially for CI fuzzing it is bad. I mean it has proven to be very ineffective but in a CI there is nothing worse.

Especially for CI fuzzing it is bad. I mean it has proven to be very ineffective but in a CI there is nothing worse.

Why? Wouldn't one want CI to be deterministic?

I am on vacation on my phone, so typing is … well
When you fuzz deterministically then you are not fuzzing but unit testing, right?
But seriously, in the original deterministic fuzzing feature a single mutation was done per fuzz, most of them without impact and therefore just wasting time.
And as each mutation was done once per byte for each file, this is even more pointless.
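
To make the cost argument concrete, here is a minimal sketch of one classic deterministic stage, a walking single-bit flip. It is an illustration only, not AFL++'s actual code: the function name `walking_bitflips` and the bit order are assumptions made for this example. Each execution carries exactly one mutation, so the number of target runs grows with the input size for every file in the queue.

```rust
/// Walks every single-bit flip of `input`, calling `run` once per mutation.
/// Returns the number of executions performed.
/// Hypothetical helper for illustration; not part of AFL++.
fn walking_bitflips<F: FnMut(&[u8])>(input: &[u8], mut run: F) -> usize {
    let mut buf = input.to_vec();
    let mut execs = 0;
    for bit in 0..buf.len() * 8 {
        // Most significant bit first; the order is an arbitrary choice here.
        let mask = 0x80u8 >> (bit % 8);
        buf[bit / 8] ^= mask; // apply exactly one mutation
        run(&buf); // one target execution per mutation
        execs += 1;
        buf[bit / 8] ^= mask; // restore the byte before the next mutation
    }
    execs
}
```

For a 6-byte input such as "qwerty" this stage alone costs 48 executions, and it is repeated for every queue entry.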

When you fuzz deterministically then you are not fuzzing but unit testing, right?

I am not sure I see your point.

I am willing to accept that there are more effective means of finding bugs. But strictly speaking, the fuzzing that test-fuzz does in CI is part of unit tests.

The value that test-fuzz adds to cargo-afl is automatic corpus and fuzzing harness generation. The purpose of the unit tests is to ensure that those mechanisms work correctly.

But seriously, in the original deterministic fuzzing feature a single mutation was done per fuzz, most of them without impact and therefore just wasting time.
And as each mutation was done once per byte for each file, this is even more pointless.

I am still failing to see why the changes make -D less deterministic.

When you fuzz deterministically then you are not fuzzing but unit testing, right?

I am not sure I see your point.

it was a joke, likely not well made :)

I am still failing to see why the changes make -D less deterministic.

it skips the deterministic fuzzing for a queue entry if it does not find anything.

The value that test-fuzz adds to cargo-afl is automatic corpus and fuzzing harness generation. The purpose of the unit tests is to ensure that those mechanisms work correctly.

You can see in the CI output that a lot of new queue items were found. I think the CI checks if a crash was detected?
IMHO the test case is what needs fixing.
If you use a shorter test case, you make sure that afl will find it, but compiler optimization can still make this impossible (you know, Rust is a very fast-moving target).

This

        !(data.len() == 6
            && data.as_bytes()[0] == b'q'
            && data.as_bytes()[1] == b'w'
            && data.as_bytes()[2] == b'e'
            && data.as_bytes()[3] == b'r'
            && data.as_bytes()[4] == b't'
            && data.as_bytes()[5] == b'y')

could easily be optimized by the compiler (and actually I would expect that to be the case, so I am surprised this actually worked) into an assert!(data.len() == 6 && memcmp(data.as_bytes(), "qwerty", 6) == 0) (well, the LLVM IR equivalent of this).
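
For readers following along, the two forms being contrasted can be sketched as below. The function names `matches_bytewise` and `matches_slice` are hypothetical and chosen only for this illustration; the second form is the source-level analogue of the single memcmp-style comparison. Both accept exactly the same inputs. What differs is the number of branches: six separate byte comparisons give a coverage-guided fuzzer six opportunities for feedback, while one block comparison gives it only one.

```rust
/// Byte-by-byte check in the style of the fragment above (hypothetical name).
fn matches_bytewise(data: &str) -> bool {
    let b = data.as_bytes();
    b.len() == 6
        && b[0] == b'q'
        && b[1] == b'w'
        && b[2] == b'e'
        && b[3] == b'r'
        && b[4] == b't'
        && b[5] == b'y'
}

/// Single block comparison, the analogue of a memcmp-style check (hypothetical name).
fn matches_slice(data: &str) -> bool {
    data.as_bytes() == &b"qwerty"[..]
}
```

Whether a given compiler version actually turns the first form into the second depends on the optimizer, so it is worth checking the generated code rather than assuming either way.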

Why don't you use the cmplog.rs example instead and add another byte comparison?

I am still failing to see why the changes make -D less deterministic.

it skips the deterministic fuzzing for a queue entry if it does not find anything.

Could you elaborate on that? What does "it does not find anything" mean?

You can see in the CI output that a lot of new queue items were found. I think the CI checks if a crash was detected?

That's correct.

... could easily be optimized by the compiler (and actually I would expect that to be the case, so I am surprised this actually worked) into an assert!(data.len() == 6 && memcmp(data.as_bytes(), "qwerty", 6) == 0) (well, the LLVM IR equivalent of this).

That doesn't appear to be the case. I put the example into Compiler Explorer, and even with -C opt-level=3, the compiled code is a sequence of cmp and jne instructions: https://godbolt.org/z/j5oTW4W4f

Why don't you use the cmplog.rs example instead and add another byte comparison?

I appreciate the suggestion. To be sure I am not misunderstanding, you mean cmplog.rs from afl.rs, correct?

I tried an adaptation of that example, but it fails in essentially the same way (see here and here).

I am still failing to see why the changes make -D less deterministic.

it skips the deterministic fuzzing for a queue entry if it does not find anything.

Could you elaborate on that? What does "it does not find anything" mean?

Actually, what I said is not true. The deterministic phase has a maximum time it runs. When that time is depleted, the rest of the deterministic phase is skipped.

You can see in the CI output that a lot of new queue items were found. I think the CI checks if a crash was detected?

That's correct.

... could easily be optimized by the compiler (and actually I would expect that to be the case, so I am surprised this actually worked) into an assert!(data.len() == 6 && memcmp(data.as_bytes(), "qwerty", 6) == 0) (well, the LLVM IR equivalent of this).

That doesn't appear to be the case. I put the example into Compiler Explorer, and even with -C opt-level=3, the compiled code is a sequence of cmp and jne instructions: https://godbolt.org/z/j5oTW4W4f

Why don't you use the cmplog.rs example instead and add another byte comparison?

I appreciate the suggestion. To be sure I am not misunderstanding, you mean cmplog.rs from afl.rs, correct?

yes

I tried an adaptation of that example, but it fails in essentially the same way (see here and here).

it does the same thing so there is no change :-)

as I said, you can make it simpler. (eg 4 comparisons instead of 6)
just do something like this:

        if data.len() < 7 {
            return;
        }
        if data[0] != b'A' {
            return;
        }
        if data[1] != b'B' {
            return;
        }
        if data[2] != b'C' {
            return;
        }
        if data[3..7] != 0x6969_4141_i32.to_le_bytes() {
            return;
        };
        panic!("boom");
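
As a sanity check on the suggested harness, the same sequence of checks can be written as a plain function, together with one input that passes all of them. This is a sketch under assumptions: `reaches_boom` and `crashing_input` are hypothetical names used only here, and the real harness would sit inside the fuzzing macro rather than return a bool.

```rust
/// Returns true when the checks above would reach `panic!("boom")`.
/// Hypothetical helper mirroring the suggested harness.
fn reaches_boom(data: &[u8]) -> bool {
    if data.len() < 7 {
        return false;
    }
    if data[0] != b'A' {
        return false;
    }
    if data[1] != b'B' {
        return false;
    }
    if data[2] != b'C' {
        return false;
    }
    // Four-byte integer comparison: the kind of check cmplog is meant to solve.
    data[3..7] == 0x6969_4141_i32.to_le_bytes()
}

/// Builds one input that passes every check (hypothetical helper).
fn crashing_input() -> Vec<u8> {
    let mut v = b"ABC".to_vec();
    v.extend_from_slice(&0x6969_4141_i32.to_le_bytes());
    v
}
```

In little-endian order the integer 0x69694141 is the bytes 0x41 0x41 0x69 0x69, so the crashing input is the ASCII string "ABCAAii".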

the old deterministic fuzzing is not coming back.

Actually, what I said is not true. The deterministic phase has a maximum time it runs. When that time is depleted, the rest of the deterministic phase is skipped.

👍

the old deterministic fuzzing is not coming back.

I appreciate your frankness.

But please realize this was a breaking change. I would like to humbly suggest that a future, similar change at least be called out in the change log as "potentially breaking," if not accompanied by a major version bump.

Would you consider making the deterministic timeout configurable?

I would not consider it breaking, as it was not a default and using the command line option doesn't produce an error. It breaks your specific usage, yes, but so could any code commit, depending on how a user is using something.

If you want to be able to configure the timeout then send a PR that uses an env var for that.

again - the issue is the specific testcase :)

again - the issue is the specific testcase :)

I respectfully disagree, but again I appreciate your frankness.

Am I looking at the right code? Because it looks like the default timeout is 15 minutes.

#define MAX_DET_TIMEOUT (15 * 60 * 1000)

I'm asking because the above "qwerty" test used to pass consistently with an overall 60 second timeout.

It is not my code, I just ensured that it improved fuzzing performance, but grepping it seems like it is:

$ grep  MAX_DET_TIMEOUT src/*.c
afl-fuzz-skipdet.c:#define MAX_DET_TIMEOUT (15 * 60 * 1000)
afl-fuzz-skipdet.c:    if (unlikely(get_cur_time() - cur_ms > MAX_DET_TIMEOUT)) return 1;
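
The time-budget check in that grep output can be modelled with a simulated clock, as in the sketch below. Only the constant's value and the "elapsed > budget" comparison come from the lines above. The function names, the fixed per-step cost, and the loop structure are assumptions for illustration, and the real skip logic contains further conditions.

```rust
/// Budget taken from the #define shown above: 15 minutes in milliseconds.
const MAX_DET_TIMEOUT_MS: u64 = 15 * 60 * 1000;

/// Mirrors the shape of the check: true once elapsed time exceeds the budget.
fn budget_exhausted(start_ms: u64, now_ms: u64, budget_ms: u64) -> bool {
    now_ms.saturating_sub(start_ms) > budget_ms
}

/// Runs up to `steps` deterministic steps on a simulated clock where every
/// step costs `step_cost_ms`. Returns how many steps ran before the budget
/// was exhausted. Hypothetical model, not AFL++ code.
fn run_deterministic(steps: u64, step_cost_ms: u64, budget_ms: u64) -> u64 {
    let start = 0;
    let mut now = start;
    let mut done = 0;
    while done < steps {
        if budget_exhausted(start, now, budget_ms) {
            break; // the remaining deterministic steps are skipped
        }
        now += step_cost_ms;
        done += 1;
    }
    done
}
```

Under this reading, a run capped at 60 seconds overall (60,000 ms) stays far below the 900,000 ms budget, so this check by itself would not cut the deterministic phase short in such a run.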

I don't think the timeout is what is causing the problem.

Bisecting further suggests the problem began at this commit: eb3be74

EDIT: I think this is a different error.

I quickly looked at the code, and it is much more complicated than that. There are various checks where it can decide to skip the deterministic fuzzing.

I created a branch here: https://github.com/smoelius/AFLplusplus/tree/d-questions

The branch's changes are roughly a subset of 1d308b8's changes from #1972:

From what I can tell, if I remove any one of the changes, my test passes. But I have to keep them all for my test to fail. (I haven't dug into why I need to comment out plot_profile_data.)

Ultimately, I would like to have a test where:

  • AFL++ can reliably find a crash.
  • The test is robust against future changes to AFL++.

In light of this new data and my elaborated goals, is a shorter test still what you would recommend?

A shorter test and a cmplog check (the integer test from cmplog.rs) that I elaborated on in a previous comment solve your issue and show that everything is working.

Thank you very much for your help.