db: WAL failover to deal with transient unavailability
sumeerbhola opened this issue
We see transient write unavailability of block devices in the cloud (< 60s) that is sometimes detected as a disk stall, resulting in node crashes. Whether the node crashes or not, this negatively impacts the user workload. Reads have not been observed to stall in this manner, and additionally reads can often be satisfied using the Pebble block cache or the OS page cache.
WAL failover relies on more than one block device being configured for the node, say two block devices and two Pebble DBs. The WAL for one Pebble DB can temporarily fail over to the block device of the other. Flushes and compactions will stall while the primary device is unavailable, but most workloads write at a rate low enough that we can afford to buffer 60s of data in memtables. More details in https://docs.google.com/document/d/1vAsftzyPG-kDy-A2Ic1fZeKd4OKJIf6N7KvNXRpDAFA/edit#heading=h.8n1r6sehoqgk (internal doc).
Also see CRDB-35401