apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store

Home Page:https://apple.github.io/foundationdb/

splitStorageMetrics stuck waiting for the number of locations to decrease during consistency check

xis19 opened this issue

When running the consistency check, we check the split storage metrics. In #10066, we call splitStorageMetrics to get the number of splits. During the call, we check the number of key range locations by calling getKeyRangeLocations.

When the number of locations (NoL) is larger than a given value, CLIENT_KNOBS->STORAGE_METRICS_SHARD_LIMIT, we wait and retry checking NoL until it is smaller than the limit. By default, the limit is 100, but when buggify is on, it has a chance of being set to 3.

During the consistency check, however, the database does not change. Once NoL reaches 3 or more, the whole process can get stuck in splitStorageMetrics, waiting for a change in key locations that may never happen.
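To make the failure mode concrete, here is a minimal Python model of the wait-and-retry behavior described above. It is only a sketch and not the actual Flow code: `wait_for_locations_below_limit` and `max_retries` are hypothetical names, and the bounded retry count exists only so the model terminates. The exact comparison against the limit in the real client is also an assumption; the model waits whenever NoL is at or above the limit, matching "3 or more" above.

```python
from typing import Callable, Optional

# Values taken from the issue text: default limit 100, buggified limit 3.
DEFAULT_SHARD_LIMIT = 100
BUGGIFIED_SHARD_LIMIT = 3


def wait_for_locations_below_limit(
    get_location_count: Callable[[], int],
    shard_limit: int,
    max_retries: int,
) -> Optional[int]:
    """Hypothetical model of the retry loop.

    Returns the number of retries needed before the location count
    dropped below the limit, or None if it never did within max_retries.
    """
    for attempt in range(max_retries):
        if get_location_count() < shard_limit:
            return attempt
        # The real client would wait here before retrying; the model just loops.
    return None


def static_locations() -> int:
    """A quiescent database: the number of locations never changes."""
    return 3


# With the default limit the check passes on the first attempt.
print(wait_for_locations_below_limit(static_locations, DEFAULT_SHARD_LIMIT, 1000))
# With the buggified limit the count never drops, so the loop never succeeds.
print(wait_for_locations_below_limit(static_locations, BUGGIFIED_SHARD_LIMIT, 1000))
```

In this model the second call exhausts its retries; without a retry bound, it would spin forever, which corresponds to the hang seen in simulation.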

A quick fix might be to increase the NoL limit, but we need more insight into the problem before applying a brute-force fix.

This issue was found in #10140, at git hash 0a3ffbf86815cf9e78360a6e6f0d44687b791418, with the following tests:

bin/fdbserver -r simulation --crash -s 1404029917 -b on -f /root/src/tests/fast/MutationLogReaderCorrectness.toml
bin/fdbserver -r simulation --crash -s 2151331495 -b on -f /root/src/tests/fast/MutationLogReaderCorrectness.toml