apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store

Home Page: https://apple.github.io/foundationdb/


Injected disk error in AsyncFileChaos is not tracked and causes StorageServerDurabilityError

jzhou77 opened this issue

The fault injection is here:

pdata[corruptedPos] ^= (1 << deterministicRandom()->randomInt(0, 8));
// mark the block as corrupted
corruptedBlock = (offset + corruptedPos) / 4096;
TraceEvent("CorruptedBlock")
    .detail("Filename", file->getFilename())
    .detail("Block", corruptedBlock)
    .log();
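
As a rough illustration of what the snippet above does, here is a minimal standalone sketch of the same two steps: flipping a single random bit in a buffer and computing which 4096-byte block the corrupted byte belongs to. The function name `flipRandomBit`, the constant `kBlockSize`, and the use of `std::mt19937` in place of `deterministicRandom()` are assumptions made for illustration and are not FoundationDB code. The sketch also assumes that `randomInt(0, 8)` excludes the upper bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Block size taken from the "/ 4096" in the snippet above.
constexpr int64_t kBlockSize = 4096;

// Hypothetical helper (not FoundationDB code): flips exactly one random bit
// of buf[pos] and returns the index of the block that contains the file
// offset (fileOffset + pos).
int64_t flipRandomBit(std::vector<uint8_t>& buf,
                      std::size_t pos,
                      int64_t fileOffset,
                      std::mt19937& rng) {
    assert(pos < buf.size());
    // Assumption: randomInt(0, 8) is half-open, so the bit index is 0..7.
    std::uniform_int_distribution<int> bitIndex(0, 7);
    buf[pos] ^= static_cast<uint8_t>(1u << bitIndex(rng));
    // Same integer division as in the original snippet.
    return (fileOffset + static_cast<int64_t>(pos)) / kBlockSize;
}
```

Because the XOR touches a single bit, the corrupted byte always differs from the original in exactly one bit position, so any page checksum that covers the byte should no longer match.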

This corruption can later cause a storage server SevError, StorageServerDurabilityError, here:

debug_checkRestoredVersion(data->thisServerID, version, "StorageServer");

In 7.2 cherrypicks #9732, commit 79d0687, with seed
-f ./tests/slow/DiskFailureCycle.toml -s 282036857 -b off, I found that the corruption happened in one of the disk queue files, but the page was copied to another file and was discarded after the reboot, which makes it hard to track the dirty page.

120.205235 CorruptedByteInjection ID=0000000000000000 Filename=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-1.fdq Position=293707 BK,CC,CP,SS

126.428277 DQRecInvalidPage ID=fb3b1d127a41c6af NextReadLocation=1093668 HashCheck=0 Seq=1093632 Expect=1093632 File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428277 DQTruncateFile ID=fb3b1d127a41c6af File=1 Pos=290816 File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428477 FindPhysicalLocation ID=fb3b1d127a41c6af Page0Valid=1 Page0Seq=425984 Page1Valid=1 Page1Seq=802816 Location=946986 Context=lastPoppedSeq File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428477 FoundPhysicalLocation ID=fb3b1d127a41c6af PageIndex=1 PageLocation=35 SizeofPage=4096 PageSequence=802816 Location=946986 Context=lastPoppedSeq File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428477 DQTruncateFile ID=fb3b1d127a41c6af File=0 Pos=0 File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
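
The trace lines above can be related with a little page arithmetic. The sketch below assumes that the `Position` field of `CorruptedByteInjection` is a byte offset within the file and uses the 4096-byte page size reported as `SizeofPage` in `FoundPhysicalLocation`. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Page size as reported by SizeofPage=4096 in the trace above.
constexpr int64_t kPageSize = 4096;

// Index of the page that contains byte offset pos.
constexpr int64_t pageIndex(int64_t pos) { return pos / kPageSize; }

// Byte offset at which that page starts.
constexpr int64_t pageStart(int64_t pos) { return pageIndex(pos) * kPageSize; }

// True if two byte offsets fall in the same page.
inline bool samePage(int64_t a, int64_t b) {
    assert(a >= 0 && b >= 0);
    return pageIndex(a) == pageIndex(b);
}

// Position=293707 from CorruptedByteInjection falls in page 71,
// which starts at byte 290816.
static_assert(pageIndex(293707) == 71, "corrupted byte is in page 71");
static_assert(pageStart(293707) == 290816, "page 71 starts at 290816");
```

Under that assumption, the injected byte at 293707 sits in the page starting at 290816, which is the same offset as `Pos=290816` in the `DQTruncateFile File=1` event. This is consistent with the disk queue truncating file 1 at the corrupted page, although the trace alone does not prove it.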

126.428877 StorageServerDurabilityError ID=fb3b1d127a41c6af RestoredVersion=302130779 Checking=min MinVersion=303134785 MaxVersion=303722114 Backtrace=addr2line -e fdbserver.debug -p -C -f -i 0x47a9867 0x47a9b15 0x47a44a4 0x46c42f5 0x46c455a 0x2c9876c 0x2c97e19 0x2c97750 0x2c16c2e 0x2cc4fa6 0x2cc3cd7 0x22727c8 0x21872f8 0x21871a6 0x2187079 0x2187aae 0x2176c38 0x46b0888 0x46b0594 0x288ce38 0x17278a8 0x4696e03 0x469691a 0x2b7e6bc 0x7f0b6fe49555 SS
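
The fields in the event above suggest a range check on the restored version. The following is a simplified sketch of such a check, not the actual `debug_checkRestoredVersion` implementation. The struct `ValidRange` and the function `checkRestoredVersion` are hypothetical, and only the field names `MinVersion`, `MaxVersion`, and `Checking` come from the trace.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using Version = int64_t;

// Hypothetical container for the range that a restored version is
// expected to fall within.
struct ValidRange {
    Version minVersion;
    Version maxVersion;
};

// Returns "" when the restored version lies inside the range, otherwise
// the name of the bound that was violated ("min" or "max"), mirroring
// the Checking=min field in the trace event.
std::string checkRestoredVersion(Version restored, const ValidRange& range) {
    assert(range.minVersion <= range.maxVersion);
    if (restored < range.minVersion) return "min";
    if (restored > range.maxVersion) return "max";
    return "";
}
```

With the values from the trace, `RestoredVersion=302130779` is 1004006 versions below `MinVersion=303134785`, so a check of this shape would report a violation of the lower bound.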

This was good; I see the way it works.