m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform

Home Page: https://m3db.io/

read-after-free panic on M3DB=v1.3.0

ful09003 opened this issue · comments

Hello M3DB team!

During 'normal' operation of M3DB (tag v1.3.0, running in Docker) I experienced a runtime panic, and thought to file an issue in case it's noteworthy for your team. If anyone has a great lead on what may have happened, I'd appreciate that very much!

Stack trace and requested details for a new issue are below.

panic: read after free: reads=1, ref=-2147483648

goroutine 6732 [running]:
github.com/m3db/m3/src/x/checked.defaultPanic(0x2321340, 0xc92791c000)
	/go/src/github.com/m3db/m3/src/x/checked/debug.go:134 +0x3e
github.com/m3db/m3/src/x/checked.panicRef(0xc00089cb90, 0x2321340, 0xc92791c000)
	/go/src/github.com/m3db/m3/src/x/checked/debug.go:143 +0x49
github.com/m3db/m3/src/x/checked.(*RefCount).IncReads(0xc00089cb90)
	/go/src/github.com/m3db/m3/src/x/checked/ref.go:144 +0x11b
github.com/m3db/m3/src/x/checked.(*bytesRef).Bytes(0xc00089cb90, 0xc50ca4ff40, 0x36, 0x40)
	/go/src/github.com/m3db/m3/src/x/checked/bytes.go:83 +0x2d
github.com/m3db/m3/src/dbnode/ts.(*Segment).CalculateChecksum(0xc1bd87c4c0, 0x1)
	/go/src/github.com/m3db/m3/src/dbnode/ts/segment.go:69 +0x7b
github.com/m3db/m3/src/dbnode/persist/fs.persistSegment(0xc3daf17440, 0x57, 0x57, 0xc3daf33440, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/m3db/m3/src/dbnode/persist/fs/merger.go:375 +0x45
github.com/m3db/m3/src/dbnode/persist/fs.persistIter(0xc3daf17440, 0x57, 0x57, 0xc3daf33440, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/m3db/m3/src/dbnode/persist/fs/merger.go:355 +0x338
github.com/m3db/m3/src/dbnode/persist/fs.persistSegmentReaders(0xc3daf17440, 0x57, 0x57, 0xc3daf33440, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/m3db/m3/src/dbnode/persist/fs/merger.go:330 +0x1a5
github.com/m3db/m3/src/dbnode/persist/fs.(*merger).Merge.func2(0xc3daf17440, 0x57, 0x57, 0xc3daf33440, 0x3, 0x3, 0x0, 0x0, 0x16a7b6742b0fa000, 0x16b5b4f87f3f33e9, ...)
	/go/src/github.com/m3db/m3/src/dbnode/persist/fs/merger.go:242 +0x308
github.com/m3db/m3/src/dbnode/storage.(*fsMergeWithMem).ForEachRemaining(0xc5dddf0bc0, 0x2379040, 0xc5c43f62a0, 0x16a7b6742b0fa000, 0xc538cf6fa0, 0x236a4d0, 0xc0057a3230, 0x0, 0x0, 0xc000118070, ...)
	/go/src/github.com/m3db/m3/src/dbnode/storage/fs_merge_with_mem.go:137 +0x2e4
github.com/m3db/m3/src/dbnode/persist/fs.(*merger).Merge(0xc3a38add40, 0x0, 0x236a4d0, 0xc0057a3230, 0x16a7b6742b0fa000, 0x20, 0x0, 0x23424c0, 0xc5dddf0bc0, 0x1, ...)
	/go/src/github.com/m3db/m3/src/dbnode/persist/fs/merger.go:235 +0x105b
github.com/m3db/m3/src/dbnode/storage.(*dbShard).ColdFlush(0xc048ee1b00, 0x23423a8, 0xc0491a7800, 0xda49fe7080, 0xda49fe70e0, 0xc8cd59d140, 0x237c048, 0xc4da2c4000, 0x236a4d0, 0xc0057a3230, ...)
	/go/src/github.com/m3db/m3/src/dbnode/storage/shard.go:2389 +0xb7a
github.com/m3db/m3/src/dbnode/storage.(*dbNamespace).ColdFlush(0xc048e2e000, 0x23423a8, 0xc0491a7800, 0x0, 0x0)
	/go/src/github.com/m3db/m3/src/dbnode/storage/namespace.go:1350 +0x57c
github.com/m3db/m3/src/dbnode/storage.(*coldFlushManager).coldFlush(0xc04908f700, 0x2361ed8, 0xc00d74bda0)
	/go/src/github.com/m3db/m3/src/dbnode/storage/coldflush.go:190 +0x14d
github.com/m3db/m3/src/dbnode/storage.(*coldFlushManager).trackedColdFlush(0xc04908f700, 0x16b5b4ebeb568148, 0x0)
	/go/src/github.com/m3db/m3/src/dbnode/storage/coldflush.go:167 +0x67
github.com/m3db/m3/src/dbnode/storage.(*coldFlushManager).Run(0xc04908f700, 0x16b5b4ebeb568148, 0x20f2200)
	/go/src/github.com/m3db/m3/src/dbnode/storage/coldflush.go:128 +0x38c
github.com/m3db/m3/src/dbnode/storage.(*mediator).runColdFlushProcesses(0xc0057fc160)
	/go/src/github.com/m3db/m3/src/dbnode/storage/mediator.go:327 +0x4b
github.com/m3db/m3/src/dbnode/storage.(*mediator).ongoingColdFlushProcesses(0xc0057fc160)
	/go/src/github.com/m3db/m3/src/dbnode/storage/mediator.go:279 +0x31
github.com/m3db/m3/src/dbnode/storage.FileOpsProcessFn.Start(0xc0490f58e0)
	/go/src/github.com/m3db/m3/src/dbnode/storage/types.go:988 +0x25
created by github.com/m3db/m3/src/dbnode/storage.(*mediator).Open
	/go/src/github.com/m3db/m3/src/dbnode/storage/mediator.go:169 +0xf1
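For anyone not familiar with the message, my rough (and possibly wrong) understanding is that src/x/checked wraps byte slices in a reference-counting guard, and the panic fires when a read is attempted on a buffer that has already been released; the very negative ref value looks like a "finalized" sentinel rather than a simple underflow. The sketch below is purely illustrative, not the actual m3 code, just the kind of guard the panic message appears to describe:

package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// refCountedBytes is a hypothetical, simplified guard around a byte slice,
// loosely modeled on the idea behind src/x/checked (NOT the real m3 code).
// Reads are only legal while the buffer is still referenced; after Finalize,
// any further read panics with a "read after free"-style message.
type refCountedBytes struct {
	refs  int32 // positive while referenced; negative sentinel once finalized
	reads int32 // in-flight reads, tracked for the diagnostic message
	data  []byte
}

const finalizedSentinel = int32(math.MinInt32) // mimics ref=-2147483648 in the panic above

func newRefCountedBytes(data []byte) *refCountedBytes {
	return &refCountedBytes{refs: 1, data: data}
}

// Bytes records the read and returns the slice, panicking if the buffer
// has already been finalized.
func (b *refCountedBytes) Bytes() []byte {
	reads := atomic.AddInt32(&b.reads, 1)
	if refs := atomic.LoadInt32(&b.refs); refs <= 0 {
		panic(fmt.Sprintf("read after free: reads=%d, ref=%d", reads, refs))
	}
	return b.data
}

// Finalize releases the buffer; any later Bytes call will panic.
func (b *refCountedBytes) Finalize() {
	atomic.StoreInt32(&b.refs, finalizedSentinel)
	b.data = nil
}

func main() {
	buf := newRefCountedBytes([]byte("segment data"))
	_ = buf.Bytes() // fine: still referenced
	buf.Finalize()
	_ = buf.Bytes() // panics: read after free
}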

Container logs on this node were already set to debug, though they unfortunately yielded nothing of note:

2021-11-08 22:58:56.773506215 +0000 UTC m=+446827.730970196 Unsafe CheckedEntry re-use near Entry {Level:debug Time:2021-11-08 22:58:56.773506215 +0000 UTC m=+446827.730970196 LoggerName: Message:cold flush run Caller:undefined Stack:}.
{"level":"debug","ts":1636412650.8216176,"msg":"cold flush run","status":"starting cold flush","time":1636412650.8211389}
2021-11-08 23:04:10.82161768 +0000 UTC m=+447141.779081671 Unsafe CheckedEntry re-use near Entry {Level:debug Time:2021-11-08 23:04:10.82161768 +0000 UTC m=+447141.779081671 LoggerName: Message:cold flush run Caller:undefined Stack:}.
{"level":"debug","ts":1636412964.9778001,"msg":"cold flush run","status":"starting cold flush","time":1636412964.9776723}

General Issues

  1. What service is experiencing the issue? (M3Coordinator, M3DB, M3Aggregator, etc): M3DB (potentially - read on).
  2. What is the configuration of the service? Please include any YAML files, as well as namespace / placement configuration (with any sensitive information anonymized if necessary).

m3db config:

db:
  logging:
    level: debug
  hostID:
    resolver: config
    value: ${M3DB_HOST_ID:""}
  discovery:
    type: m3db_cluster
    m3dbCluster:
      env: default_env
      zone: "embedded"
      endpoints:
        - etcd01-fqdn:2379
        - etcd02-fqdn:2379
        - etcd03-fqdn:2379
  metrics:
    prometheus:
      handlerPath: "/metrics"
      listenAddress: "0.0.0.0:9004"
      onError: none
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed

namespace config:

{
    "name": "default",
    "options": {
      "bootstrapEnabled": true,
      "flushEnabled": true,
      "writesToCommitLog": true,
      "cleanupEnabled": true,
      "snapshotEnabled": true,
      "repairEnabled": false,
      "retentionOptions": {
        "retentionPeriodDuration": "8766h",
        "blockSizeDuration": "1h",
        "bufferFutureDuration": "10m",
        "bufferPastDuration": "1m",
        "blockDataExpiry": true
      },
      "coldWritesEnabled": true,
      "indexOptions": {
        "enabled": true,
        "blockSizeDuration": "1h"
      },
      "aggregationOptions": {
        "aggregations": [
          { "aggregated": false }
        ]
      }
    }
}
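For completeness, a namespace payload like the above is typically applied through the coordinator API. The minimal Go sketch below is illustrative only (not a transcript of what I actually ran); it assumes the coordinator's default port 7201, the standard POST /api/v1/services/m3db/namespace endpoint, and that the JSON above is saved as namespace.json:

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

// Illustrative helper that POSTs a namespace definition (the JSON above,
// saved as namespace.json) to an m3coordinator. The host, port, and file
// name are assumptions for the example, not details taken from this cluster.
func main() {
	payload, err := os.ReadFile("namespace.json")
	if err != nil {
		panic(err)
	}
	resp, err := http.Post(
		"http://localhost:7201/api/v1/services/m3db/namespace",
		"application/json",
		bytes.NewReader(payload),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("coordinator responded:", resp.Status)
}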

I am running M3DB as a Docker container on three separate bare-metal machines (64 cores/512 GB RAM each). For brevity, these are referred to as node0[1-3]. The instance referred to as node02 throughout the remainder of this issue is the node on which I observed this panic. I am assuming M3DB on node02 was the component that panicked, though I am not familiar enough with the m3db/m3coordinator codebase to be certain.

node01 runs a second service responsible for generating historic data representative of a future dataset, and this data is loaded into M3DB exclusively using cold writes targeted to node01's embedded m3coordinator Prometheus endpoint, e.g. http://localhost:7201/api/v1/prom/remote/write.
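To make that write path concrete, here is a minimal Go sketch of what one of these cold writes looks like on the wire. Only the endpoint URL comes from my setup above; the metric name, labels, and sample value are invented for illustration. It uses the standard Prometheus remote-write format (a snappy-compressed prompb.WriteRequest), and the sample is timestamped an hour in the past so that it falls outside the namespace's 1m bufferPast window, i.e. it is a cold write:

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	// Timestamp the sample one hour in the past: well outside the namespace's
	// 1m bufferPast window, so M3DB treats it as a cold write.
	ts := time.Now().Add(-1 * time.Hour).UnixMilli()

	// Invented series purely for illustration.
	req := &prompb.WriteRequest{
		Timeseries: []prompb.TimeSeries{{
			Labels: []prompb.Label{
				{Name: "__name__", Value: "historic_metric_example"},
				{Name: "source", Value: "node01_generator"},
			},
			Samples: []prompb.Sample{{Value: 42.0, Timestamp: ts}},
		}},
	}

	raw, err := req.Marshal() // prompb messages expose protobuf Marshal()
	if err != nil {
		panic(err)
	}
	compressed := snappy.Encode(nil, raw)

	httpReq, err := http.NewRequest(
		http.MethodPost,
		"http://localhost:7201/api/v1/prom/remote/write",
		bytes.NewReader(compressed),
	)
	if err != nil {
		panic(err)
	}
	httpReq.Header.Set("Content-Type", "application/x-protobuf")
	httpReq.Header.Set("Content-Encoding", "snappy")
	httpReq.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("remote write status:", resp.Status)
}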

node02 and node03 are simply cluster members; while they also have embedded m3coordinator enabled, no read/write activity is directed towards those nodes.

All three nodes are coordinated via an external etcd cluster.

  3. How are you using the service? For example, are you performing read/writes to the service via Prometheus, or are you using a custom script?
  • Write-heavy: cold writes targeting node01's embedded m3coordinator via its Prometheus remote-write endpoint
  • Very few reads (any reads are done via node01's embedded m3coordinator, wired up to Grafana and targeting the Prometheus-compatible endpoint)
  • About 1 invocation/day of the /debug/dump endpoint on node01.
  4. Is there a reliable way to reproduce the behavior? If so, please provide detailed instructions.

Unfortunately, none that I can provide. Around the time of the panic, I was attempting to access the /debug/dump API on node01, node02, and node03 to gather heap details. I unfortunately do not have the timestamp for when I requested /debug/dump on node02, but it was definitely "near" the panic. Neither node01 nor node03 experienced the same symptom; node01 was the only node for which I had gathered debug information.

CPU / Heap Profiles

I was ultimately unable to gather a profile of node02 - I could provide the profile for node01 but I don't believe that would be beneficial. Happy to provide it if you'd like, however :)

@ful09003 Thanks for reporting! We're taking a look, in the meantime, has this been reproducing reliably since?

You're welcome! Regrettably, this has not been reproducible. Since filing this issue, I've also torn down the M3DB cluster where it was observed. It's no trouble for me if this is closed, and I'll let you know if I can ever reproduce it!

We are in the process of cutting a new release; I'd say we close this and re-open if you do trigger a repro!