concurrency: (*inversionTree).GetInvertedMatrix releases its read lock before returning a mutable slice, so callers use it without the lock and are susceptible to stale data and data races
odeke-em opened this issue
While examining reconstructSome and reading through neighboring code, I noticed this code:
Lines 41 to 58 in 2124328
Here we first take the read lock, and once the matrix is found we release it. The returned slice then escapes to the outside world with no lock held. Since this code is used concurrently, every user may read stale data and is susceptible to a data race: although InsertInvertedMatrix takes the write lock, reads and iterations of the returned matrix are not under any lock, for example here:
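A minimal sketch of the pattern described above, assuming a simplified cache rather than the library's real code (the type `matrixCache` and the methods `get` and `insert` are hypothetical): the read lock covers only the lookup, and the returned slice shares its backing arrays with the cached entry. This only becomes a race if some caller writes to the returned matrix, or the cache mutates an entry in place, which is exactly the question the rest of the thread discusses.

```go
package main

import (
	"fmt"
	"sync"
)

// matrixCache is a hypothetical, simplified stand-in for inversionTree:
// an RWMutex guarding a map of cached matrices.
type matrixCache struct {
	mu      sync.RWMutex
	entries map[string][][]byte
}

func newMatrixCache() *matrixCache {
	return &matrixCache{entries: make(map[string][][]byte)}
}

// get holds the read lock only for the lookup; the returned slice
// escapes the lock and aliases the cached backing arrays.
func (c *matrixCache) get(key string) [][]byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.entries[key]
}

// insert takes the write lock while storing an entry.
func (c *matrixCache) insert(key string, m [][]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = m
}

func main() {
	c := newMatrixCache()
	c.insert("0,1", [][]byte{{1, 0}, {0, 1}})

	// No lock is held here: a write through m is a write to the
	// cached entry, unsynchronized with any other reader.
	m := c.get("0,1")
	m[0][0] = 7

	fmt.Println(c.get("0,1")[0][0]) // prints 7
}
```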
Line 1526 in 2124328
Suggestion
For maximum protection, and to ensure that reads and writes are performed safely under the lock, I suggest this seemingly awkward API:
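```go
// Sketch: assumes t.mutex is a sync.RWMutex and t.matrix holds the cached [][]byte.

// iterateMatrix safely iterates over the matrix under concurrency.
// consumeAndOK should return false to stop the iteration; it MUST NOT
// mutate row or retain it after returning.
func (t *inversionTree) iterateMatrix(consumeAndOK func(immutableRow []byte) bool) {
	t.mutex.RLock()
	defer t.mutex.RUnlock()
	for _, row := range t.matrix {
		if !consumeAndOK(row) {
			break
		}
	}
}

// invertedMatrixIsCached reports whether a matrix is cached.
func (t *inversionTree) invertedMatrixIsCached() bool {
	t.mutex.RLock()
	defer t.mutex.RUnlock()
	return t.matrix != nil
}

// matrixAt returns a copy of row i, so the caller never holds a
// reference to shared data once the lock is released.
func (t *inversionTree) matrixAt(i int) []byte {
	t.mutex.RLock()
	defer t.mutex.RUnlock()
	return append([]byte(nil), t.matrix[i]...)
}
```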
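One alternative to the callback API, sketched under the same assumptions (a `sync.RWMutex` named `mutex` guarding a `[][]byte` named `matrix`; the type `lockedMatrix` and the method `snapshot` are hypothetical): deep-copy the matrix while the read lock is held, so callers can use the result freely without locking. The sketch also exercises the callback form with an early stop.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedMatrix is a hypothetical type with the shape the suggestion
// assumes: an RWMutex guarding a matrix stored as rows of bytes.
type lockedMatrix struct {
	mutex  sync.RWMutex
	matrix [][]byte
}

// snapshot returns a deep copy of the matrix taken under the read lock.
// The caller owns the copy and may read or modify it without locking.
func (t *lockedMatrix) snapshot() [][]byte {
	t.mutex.RLock()
	defer t.mutex.RUnlock()
	if t.matrix == nil {
		return nil
	}
	out := make([][]byte, len(t.matrix))
	for i, row := range t.matrix {
		out[i] = append([]byte(nil), row...)
	}
	return out
}

// iterateMatrix is the callback form from the suggestion, using RLock.
// consumeAndOK must not mutate or retain row.
func (t *lockedMatrix) iterateMatrix(consumeAndOK func(immutableRow []byte) bool) {
	t.mutex.RLock()
	defer t.mutex.RUnlock()
	for _, row := range t.matrix {
		if !consumeAndOK(row) {
			break
		}
	}
}

func main() {
	t := &lockedMatrix{matrix: [][]byte{{1, 2}, {3, 4}, {5, 6}}}

	// Callback form: consume two rows, then stop early.
	seen := 0
	t.iterateMatrix(func(row []byte) bool {
		seen++
		return seen < 2
	})
	fmt.Println(seen) // prints 2

	// Copy form: mutating the snapshot leaves the shared matrix untouched.
	s := t.snapshot()
	s[0][0] = 9
	fmt.Println(t.matrix[0][0]) // prints 1
}
```

The trade-off: the copy costs an allocation per call, while the callback avoids it but holds the lock for as long as the callback runs.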
I have found data race bugs like this in other projects, where they crept in through innocent usage (cometbft/cometbft#2158), yet they caused serious trouble, especially in highly sensitive code: in that case, people had complained about oddities in p2p connections for 7 years, and my bug finding confirmed the cause.
Hi! Thanks for the report. I am on a one-week holiday, so it may take a little time before I can review it thoroughly.
As I recall, the inversion matrices themselves are immutable, but I may easily be misremembering. I will look through it.
@odeke-em I fail to see where the race can take place.
GetInvertedMatrix holds a read lock, as you point out, only while it is looking for a cached entry. If a cached entry is returned, I don't see any mutations happening to it.
InsertInvertedMatrix holds a write lock, and it is only called with "fresh" entries. It may overwrite an entry in the cache, but it will never mutate an existing one. The RW lock should adequately protect this.
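The invariant this reply relies on can be sketched as follows, assuming a simplified cache rather than the library's code (the type `cache` and its methods are hypothetical): every matrix is fully built before it is inserted, readers never write to a returned matrix, and an overwrite swaps the map entry instead of mutating the old matrix. Under those assumptions, the returned slice can be read without holding the lock, and running it under `go run -race` should not report a race.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in: entries are built fully before
// insert and treated as read-only once published.
type cache struct {
	mu      sync.RWMutex
	entries map[int][][]byte
}

func (c *cache) get(k int) [][]byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.entries[k]
}

// insert publishes a freshly built matrix. Overwriting replaces the map
// value; it never writes into a matrix a reader may already hold.
func (c *cache) insert(k int, m [][]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[k] = m
}

func main() {
	c := &cache{entries: map[int][][]byte{}}
	c.insert(1, [][]byte{{1, 2}})
	old := c.get(1)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Readers only read the returned matrix, never write it.
			_ = c.get(1)[0][0]
		}()
	}
	// The writer publishes a new matrix instead of mutating the old one.
	c.insert(1, [][]byte{{3, 4}})
	wg.Wait()

	fmt.Println(old[0][0], c.get(1)[0][0]) // prints 1 3
}
```

The guarantee is only as strong as the read-only convention: a single caller that writes into a returned matrix reintroduces the race described in the original report.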
I am against changing code that has been working for 8 years without a demonstrated failure path. For new code, no problem. This is not part of any exposed API.
If anything I wrote above is wrong, feel free to correct me.