Flaky TestLabelNames_Cancelled
pracucci opened this issue
Marco Pracucci commented
In this CI run I've seen TestLabelNames_Cancelled being flaky:
--- FAIL: TestLabelNames_Cancelled (0.07s)
bucket_test.go:2909: Creating 2 1-sample series with 1ms interval in /tmp/TestLabelNames_Cancelled3562416338/001/0
bucket_test.go:2909: Creating 2 1-sample series with 1ms interval in /tmp/TestLabelNames_Cancelled3562416338/001/1
testing.go:1231: TempDir RemoveAll cleanup: unlinkat /tmp/TestLabelNames_Cancelled3562416338/001: directory not empty
level=info msg="created in-memory index cache" maxItemSizeBytes=134217728 maxSizeBytes=1073741824 maxItems=maxInt
level=info msg="ring doesn't exist in KV store yet"
level=info msg="instance not found in the ring" instance=test ring=store-gateway
level=info msg="not loading tokens from file, tokens file path is empty"
level=info msg="waiting until store-gateway is JOINING in the ring"
level=info msg="store-gateway is JOINING in the ring"
level=info msg="synchronizing TSDB blocks for all users"
level=warn msg="failed to synchronize TSDB blocks" err="assert.AnError general error for testing"
level=info msg="ring lifecycler is shutting down" ring=store-gateway
level=info msg="unregistering instance from ring" ring=store-gateway
level=info msg="instance removed from the ring" ring=store-gateway
FAIL
FAIL github.com/grafana/mimir/pkg/storegateway 374.966s
Vladimir Varankin commented
testing.go:1231: TempDir RemoveAll cleanup: unlinkat /tmp/TestLabelNames_Cancelled3562416338/001: directory not empty
I will have a look later, but at first glance it might be a race between the cleanup in testing.T.TempDir and the internals of BucketStore.RemoveBlocksAndClose. The latter doesn't wait for the goroutine inside snapshotter (and indexReaderPool) to actually stop, so the goroutine can still write a lazy-loaded index while the bucket's directory is being cleaned out by the test.