klauspost / reedsolomon

Reed-Solomon Erasure Coding in Go

concurrency: (*inversionTree).GetInvertedMatrix holds a read lock only during lookup, then returns a slice that callers use without the lock; slices are mutable, so callers are susceptible to reading the wrong data and to data races

odeke-em opened this issue · comments

While examining reconstructSome and reading through neighboring code I noticed this code

func (t *inversionTree) GetInvertedMatrix(invalidIndices []int) matrix {
    if t == nil {
        return nil
    }
    // Lock the tree for reading before accessing the tree.
    t.mutex.RLock()
    defer t.mutex.RUnlock()
    // If no invalid indices were give we should return the root
    // identity matrix.
    if len(invalidIndices) == 0 {
        return t.root.matrix
    }
    // Recursively search for the inverted matrix in the tree, passing in
    // 0 as the parent index as we start at the root of the tree.
    return t.root.getInvertedMatrix(invalidIndices, 0)
}

Here we first take the read lock and release it as soon as the matrix has been found. Once the matrix is returned to the caller it is no longer protected by the lock. If there is any concurrent usage, and there is, callers may read stale data and are susceptible to a data race: although InsertInvertedMatrix holds the write lock, reads and iterations over the returned matrix are not performed under any lock, for example here

dataDecodeMatrix := r.tree.GetInvertedMatrix(invalidIndices)
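To make the concern concrete, here is a minimal sketch of the general pattern, not the library's code. The `cache` type and `mutateInPlace` are hypothetical: the sketch assumes some writer modifies the stored rows in place, which may never happen in reedsolomon itself. It runs sequentially, so it only shows the aliasing; with real goroutines the same access pattern would be a data race.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for the inversion tree: it guards a
// single matrix with a RWMutex.
type cache struct {
	mu     sync.RWMutex
	matrix [][]byte
}

// get returns the cached matrix. The read lock is released on return, so
// the caller reads the returned slices without any lock held.
func (c *cache) get() [][]byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.matrix
}

// mutateInPlace writes into the existing rows under the write lock.
// This is the assumed (hypothetical) writer behavior that would be unsafe.
func (c *cache) mutateInPlace(row, col int, v byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.matrix[row][col] = v
}

// aliasDemo reports what a caller sees through a previously returned
// slice after an in-place write.
func aliasDemo() byte {
	c := &cache{matrix: [][]byte{{1, 2}, {3, 4}}}
	m := c.get()             // lock already released here
	c.mutateInPlace(0, 0, 9) // writer changes the shared backing array
	return m[0][0]
}

func main() {
	fmt.Println(aliasDemo()) // prints 9
}
```

The returned slice shares its backing array with the cache, so the lock in `get` protects the lookup but not later reads of the data.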

Suggestion

For maximum protection, and to ensure that all reads and writes of the matrix are performed under the lock, I suggest this admittedly awkward API

// iterateMatrix is meant to safely iterate over the matrix under concurrency.
// consumeAndOK should return false if it needs to stop the iteration; it MUST NOT mutate row.
func (t *inversionTree) iterateMatrix(consumeAndOK func(immutableRow []byte) bool) {
    t.mutex.Lock()
    defer t.mutex.Unlock()

    for _, row := range t.matrix {
        if !consumeAndOK(row) {
            break
        }
    }
}

func (t *inversionTree) invertedMatrixIsCached() bool {
    t.mutex.Lock()
    defer t.mutex.Unlock()

    return t.matrix != nil
}

func (t *inversionTree) matrixAt(i int) []byte {
    t.mutex.Lock()
    defer t.mutex.Unlock()

    return t.matrix[i]
}
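As a rough illustration of how a caller might use the callback-style API above, here is a self-contained sketch. `guardedMatrix` and `snapshotRows` are hypothetical names invented for this example; the type only mimics the proposed `iterateMatrix` method and is not the library's `inversionTree`.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedMatrix is a hypothetical type that mimics the proposed API.
type guardedMatrix struct {
	mu     sync.Mutex
	matrix [][]byte
}

// iterateMatrix calls consumeAndOK for each row while holding the lock
// and stops as soon as the callback returns false.
func (g *guardedMatrix) iterateMatrix(consumeAndOK func(immutableRow []byte) bool) {
	g.mu.Lock()
	defer g.mu.Unlock()

	for _, row := range g.matrix {
		if !consumeAndOK(row) {
			break
		}
	}
}

// snapshotRows copies at most limit rows out of the matrix. The copies
// are private to the caller, so they can be used after the lock is gone.
func snapshotRows(g *guardedMatrix, limit int) [][]byte {
	if limit <= 0 {
		return nil
	}
	var out [][]byte
	g.iterateMatrix(func(row []byte) bool {
		out = append(out, append([]byte(nil), row...)) // copy, never alias
		return len(out) < limit
	})
	return out
}

func main() {
	g := &guardedMatrix{matrix: [][]byte{{1, 2}, {3, 4}, {5, 6}}}
	fmt.Println(snapshotRows(g, 2)) // prints [[1 2] [3 4]]
}
```

One caveat with this design: the callback runs with the mutex held, so it must not call back into other locking methods on the same value, since `sync.Mutex` is not reentrant and that would deadlock.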

I have found similar data race bugs in other projects, such as cometbft/cometbft#2158. They crept in through innocent-looking usage but caused a serious nuisance, especially in highly sensitive code (in that case people had complained about oddities in p2p connections for 7 years, and my bug finding confirmed it).

/cc @elias-orijtech @klauspost

Hi! Thanks for the report. I am on a one week holiday, so it may take a little time before I can review thoroughly.

In my recollection the inversion matrices themselves would be immutable - but I may easily be misremembering. I will look through it.

@odeke-em I fail to see where the race can take place.

GetInvertedMatrix holds a read lock as you point out only while it is looking for a cached entry. If a cached entry is returned I don't see any mutations happening to it.

InsertInvertedMatrix holds a write lock, and it is only called with "fresh" entries. While it may overwrite an entry in the cache, it will never mutate an existing one. The RW lock should adequately protect this.
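The distinction between overwriting an entry and mutating one can be sketched as follows. This is a simplified model under stated assumptions: `entryCache`, its string keys, and `replaceDemo` are hypothetical and do not reflect the actual tree structure in the library.

```go
package main

import (
	"fmt"
	"sync"
)

// entryCache is a hypothetical cache whose entries are never modified
// after insertion; an insert only swaps which slice the key points to.
type entryCache struct {
	mu      sync.RWMutex
	entries map[string][]byte
}

// get returns the current entry for key under the read lock.
func (c *entryCache) get(key string) []byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.entries[key]
}

// insert stores a fresh slice under key, replacing any previous entry.
// It never writes into the previous entry's backing array.
func (c *entryCache) insert(key string, fresh []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = fresh
}

// replaceDemo returns the first byte seen through a slice obtained
// before an overwrite, and the first byte of the current entry.
func replaceDemo() (old, current byte) {
	c := &entryCache{entries: map[string][]byte{}}
	c.insert("k", []byte{1, 2, 3})
	held := c.get("k")             // caller keeps this slice, lock released
	c.insert("k", []byte{7, 8, 9}) // overwrite with a fresh slice
	return held[0], c.get("k")[0]
}

func main() {
	old, current := replaceDemo()
	fmt.Println(old, current) // prints 1 7
}
```

Under that assumption the slice a caller already holds is unaffected by later inserts, so reading it without the lock does not conflict with any writer. The guarantee holds only as long as no code path writes into an entry after it has been inserted.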

I am against changing code that has been working for 8 years when no failure path has been demonstrated. For new code, no problem. This is not part of any exposed API.

If any of what I write above is wrong - feel free to correct me.