Corrupt BTree merge results when duplicate keys leak across common subtree boundaries
lorentey opened this issue · comments
Most of BTree's merge methods have subtle bugs in their common subtree detection logic, which can sometimes lead to corrupt results and/or assertions when duplicate keys cross the boundaries of a common subtree.
It is not easy to reproduce this bug using public API, but the following test manages to do it:
func test_Intersection_withModifiedSelf() {
var tree = BTree<Int, Int>(order: 5)
for i in 0 ..< 10 {
for j in 0 ..< 2 {
tree.insert((i, j))
}
}
for i in 0 ..< 10 {
var other = tree
other.withCursor(atOffset: 2 * i) { cursor in
cursor.remove(2)
}
other.assertValid()
let test = tree.intersection(other)
XCTAssertEqual(test.map { $0.0 }, other.map { $0.0 })
}
}
Results on current master branch:
Test Case '-[BTreeTests.BTreeMergeTests test_Intersection_withModifiedSelf]' started.
XCTAssertEqual failed: ("[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") -
XCTAssertEqual failed: ("[0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") -
XCTAssertEqual failed: ("[0, 0, 1, 1, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") -
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") -
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") -
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 6, 7, 7, 8, 8, 9, 9]") -
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 7, 7, 8, 8, 9, 9]") -
Test Case '-[BTreeTests.BTreeMergeTests test_Intersection_withModifiedSelf]' failed (0.076 seconds).
Note how the returned results are often not even sorted correctly.
The bug affects distinctUnion
, subtracting
, symmetricDifference
, intersection
, and the new bag variants, but not union
.
There were two distinct sub-bugs:
- Sometimes BTree merges considered an entire subtree shared even after they have already processed a non-empty prefix of it. As a result, the prefix tree would be inserted twice into the result, breaking the ordering invariant. This happened when the shared subtree was a suffix tree of one of the input trees. Affected
BTree
methods:distinctUnion
,intersection
, andbagIntersection
. - When a merging method detected a shared subtree S, and exactly one of the input trees had at least one element immediately following S with the same key as
S.last!.key
, these extra elements would not be counted as having matching keys across the two input trees. AffectedBTree
methods:distinctUnion
,subtracting
,symmetricDifference
andintersection
.
(Unit tests to catch these errors have been added in 8680703.)