google / fuzzbench

FuzzBench - Fuzzer benchmarking as a service.

Home Page: https://google.github.io/fuzzbench/

Build process is killed unexpectedly.

DonggeLiu opened this issue

A process that builds Centipede with benchmark freetype2-2017 was killed unexpectedly.
The build log file did not show any error before being cut off.
The gcloud log did not show any useful info except that the process died with <Signals.SIGKILL: 9>.

The same error also happened on other benchmarks, e.g. harfbuzz.

make test-run-centipede-freetype2-2017 works perfectly.

Other info:
These processes were part of a locally-launched cloud experiment 2023-01-27-test-local-launch, with which we want to test whether the latest framework changes affect experiments launched locally or requested via requests.yaml.

Could this be due to the OOM killer?

Could this be due to the OOM killer?

I do not recall any change in the memory limit or related code since Centipede's last successful experiment on the same benchmark freetype2-2017 on 18-01, though.

BTW, this error also affects other fuzzers, e.g. afl++.

Can we see dmesg logs on these machines, or any memory usage stats via the GCP console for any indication of OOM?

Can we see dmesg logs on these machines, or any memory usage stats via the GCP console for any indication of OOM?

Thanks!
I did not notice any from the gcloud log.
Would you know of any way to access the gcloud build instances? They are built with gcloud builds submit /work/src --config=/tmp/tmpg6woqb8o --timeout=46800s --worker-pool=projects/fuzzbench/locations/us-central1/workerPools/buildpool and are not in the list of VM instances. I did not find a way to access them in the docs either.
Otherwise, we can probably add some debug logging in the code to print memory info.
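
A minimal sketch of the kind of debug logging suggested above, assuming psutil is available in the builder image; the helper name and where it gets called from are hypothetical, not part of FuzzBench:

```python
# Hypothetical helper: log memory usage around a build step to help spot
# OOM pressure before a SIGKILL. Assumes psutil is installed in the image.
import logging
import psutil


def log_memory_usage(stage):
    """Log system memory stats into the build log."""
    mem = psutil.virtual_memory()
    logging.info('[%s] memory: total=%.1f GiB, available=%.1f GiB, used=%.0f%%',
                 stage, mem.total / 2**30, mem.available / 2**30, mem.percent)


# Example: call around the expensive compile step.
log_memory_usage('before-centipede-build')
```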

Looking around, I'm not sure if there's a way to view system logs in a Cloud Build worker pool.

Is this happening consistently? If so, it might be worth temporarily changing the instance type to something with a bit more memory (e.g. from "e2-highcpu-32" to "e2-standard-32") to see if it fixes things.
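
A sketch of what that temporary switch could look like, assuming the builds run in the buildpool private worker pool quoted earlier; the gcloud worker-pools flags should be double-checked against the Cloud Build docs before use:

```python
# Hypothetical: bump the Cloud Build private pool to a machine type with more
# memory per vCPU, then restore it after the experiment. Pool name, project and
# region are taken from the --worker-pool path quoted earlier in this thread.
import subprocess


def set_worker_pool_machine_type(machine_type):
    subprocess.run([
        'gcloud', 'builds', 'worker-pools', 'update', 'buildpool',
        '--project=fuzzbench',
        '--region=us-central1',
        '--worker-machine-type=' + machine_type,
    ], check=True)


set_worker_pool_machine_type('e2-standard-32')  # temporary, for testing
```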

I don't recall seeing this in the past and am unsure if it is flaky.
A silly question: we have always been running these experiments in gcloud and they have always worked fine.
Is there a way to see if some default configuration was changed by gcloud?

At the same time as running this experiment, I also requested another one from GitHub.
In that experiment, the build logs of (centipede, freetype2-2017) terminated at the same step, while (afl++, libxml2) also terminated at similar steps.

Temporarily changing the instance type is a good idea. I will see if I can tweak that and launch another experiment on some selected (fuzzer, benchmark) pairs.

Testing this in #1650.

I see the same issue in my experiment:
https://www.fuzzbench.com/reports/experimental/2023-01-27-aflpp/index.html
80% !! of the experiments are failing to build

In the CI they all went green:
https://github.com/google/fuzzbench/actions/runs/4025274106/jobs/6918292861
(only one failed, and that one is not in the coverage benchmark set)

The build logs of the failing experiments all end in the middle of the build process without any errors being seen (gs://fuzzbench-data/2023-01-27-aflpp/build-logs/benchmark-libjpeg-turbo-07-2017-fuzzer-aflplusplus_at_cm.txt):

c655d92adaf3: Layer already exists
13bbbaf28a73: Layer already exists
546f9db501ca: Layer already exists
31fb99fed15d: Layer already exists
e65a3ff4b09d: Layer already exists
4dc14efe7306: Layer already exists
0002c93bdb37: Layer already exists

:-(

Maybe a feature where a benchmark run is aborted if > 25% of the build targets fail would be good? It would prevent wasting resources.

Maybe a feature where a benchmark run is aborted if > 25% of the build targets fail would be good? It would prevent wasting resources.

Eh... I think there are too many smart features in FB as is.

  1. I'm not sure this is a Centipede issue. Centipede doesn't do anything too exotic in builds since it just uses clang; also, Marc is pointing out issues with AFL++'s builds.
  2. I'm not sure we should switch to the fancier instances. As we saw in OSS-Fuzz, they can cost a lot more.

  1. I'm not sure this is a Centipede issue. Centipede doesn't do anything too exotic in builds since it just uses clang; also, Marc is pointing out issues with AFL++'s builds.

Yep, I don't think it is either.

  2. I'm not sure we should switch to the fancier instances. As we saw in OSS-Fuzz, they can cost a lot more.

We did not have this issue before; did something change that caused this?
I switched to the instance with the highest memory for testing purposes.
Happy to test lower configurations to reduce cost.

The build failures disappeared after using a higher-memory worker pool instance.
More details about the sequence of experiments: #1626 (comment).

Maybe an update in the image (Ubuntu packages) increased the memory footprint? E.g. a Docker update now needing more resources or something...
Are the Docker image builds performed with make -j? Reducing the number of parallel processes would also reduce the memory required, but of course lengthen the build process.
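
A rough sketch of that idea - capping make parallelism by available memory rather than only CPU count; the 2 GiB-per-job budget and the bare make invocation are illustrative assumptions, not FuzzBench's actual build scripts:

```python
# Illustrative only: choose a `make -j` value that fits in available memory
# as well as in CPU count. The per-job memory budget (2 GiB) is a guess.
import os
import subprocess


def memory_capped_jobs(gib_per_job=2):
    avail_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_AVPHYS_PAGES')
    by_memory = int(avail_bytes / 2**30 // gib_per_job)
    return max(1, min(os.cpu_count() or 1, by_memory))


# Example: build with reduced parallelism instead of a plain `make -j`.
subprocess.run(['make', '-j%d' % memory_capped_jobs()], check=True)
```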

Can someone please kill the experiment 2023-01-29-aflpp? Again, way too many build errors to be useful, so this is just wasting resources. Thank you.

I noticed that an experiment with just 2 fuzzers built & ran cleanly, whereas one with 4 fuzzers had ~20% build failures and one with 5 fuzzers ~70% build failures.

Maybe building fewer fuzzers in parallel on one instance is the easy solution?

2023-01-29-aflpp

Done.

I noticed that an experiment with just 2 fuzzers built & ran cleanly, whereas one with 4 fuzzers had ~20% build failures and one with 5 fuzzers ~70% build failures.

Maybe building fewer fuzzers in parallel on one instance is the easy solution?

I don't think we are building multiple fuzzers on one instance though

I noticed that an experiment with just 2 fuzzers built & ran cleanly, whereas one with 4 fuzzers had ~20% build failures and one with 5 fuzzers ~70% build failures.
Maybe building fewer fuzzers in parallel on one instance is the easy solution?

I don't think we are building multiple fuzzers on one instance though

There is, however, a correlation between the number of fuzzers in an experiment and build failures.
I pushed a run with 2 fuzzers this morning - and it is running fine without any build failures, like the other one.
Whereas the run I pushed with 5 fuzzers again had 70% failures (the one you cancelled for me).

As I am doing lots of tests at the moment, I can tell you this:

If more than 3 fuzzer builds are running (in one request or across several - the total count is what matters), then build failures occur. So if I request 2 experiments at once, both with just 2 variants, or if I request 1 experiment with 4 variants, then there are failures. The more variants, the more failures - exponentially.

If I wait until the first requested experiment has finished building and then request another (with at most 2-3 variants), everything works fine.

What's the most recent experiment where this was an issue?

Well, I'm quite sure this is a quota issue. Quota errors are appearing all over the logs (gs://fuzzbench-data/2023-01-29-aflpp/build-logs/) in experiments with build failures.

Looks like we regularly exceed this:
[image]

We go higher than OSS-Fuzz with respect to this metric. There are two relevant (I think) differences in how we do builds:

  1. OSS-Fuzz (in trial builds) has a 1-second sleep (a throttling sketch follows below).
  2. OSS-Fuzz uses Python libraries for submitting builds, not gcloud.
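
A minimal sketch of throttling along those lines, wrapping the gcloud builds submit command quoted earlier in this thread; the one-second spacing mirrors the OSS-Fuzz trial-build sleep mentioned above, and the helper itself is hypothetical rather than FuzzBench code:

```python
# Hypothetical throttle around the existing `gcloud builds submit` calls so that
# concurrent experiments don't burst past the Cloud Build request quota.
import subprocess
import threading
import time

_submit_lock = threading.Lock()
_last_submit = 0.0


def submit_build(config_path, min_interval=1.0):
    """Submit one build, spacing submissions at least min_interval seconds apart."""
    global _last_submit
    with _submit_lock:
        wait = min_interval - (time.monotonic() - _last_submit)
        if wait > 0:
            time.sleep(wait)
        _last_submit = time.monotonic()
    subprocess.run([
        'gcloud', 'builds', 'submit', '/work/src',
        '--config=' + config_path,
        '--timeout=46800s',
        '--worker-pool=projects/fuzzbench/locations/us-central1/workerPools/buildpool',
    ], check=True)
```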

That could be the reason.
For the competition this will be an issue, as these have to run in parallel?

That could be the reason. For the competition this will be an issue, as these have to run in parallel?

I'm gonna try my best to fix this before the competition.

Thanks!

I didn't see this error again in the past two weeks' experiments; shall we close this?

Let's close. We can always re-open if we see it again. @jonathanmetzman, you had a better longer-term fix in mind; let's track that in a new bug.