[Bug]: logservice crashed by "no space left on device" during regression on TKE
aressu1985 opened this issue
Is there an existing issue for the same bug?
- I have checked the existing issues.
Branch Name
1.2-dev
Commit ID
Other Environment Information
- Hardware parameters:
3*CN: 16C 64G
1*DN: 16C 64G
3*LOG: 4C 16G
3*PROXY: 3C 7G
- OS type:
- Others:
Actual Behavior
job link:
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9467464435/job/26105458866
During benchmark regression on TKE, the log service crashed with "no space left on device":
/usr/local/go/src/runtime/proc.go:271"}
{"level":"WARN","time":"2024/06/12 02:26:28.616659 +0000","caller":"fileservice/disk_cache.go:343","msg":"write disk cache error","error":"mkdir /var/lib/matrixone/data/etl-cache/fullsys/logs/2024/06/12: no space left on device"}
{"level":"INFO","time":"2024/06/12 02:26:28.616741 +0000","caller":"motrace/syncer.go:89","msg":"Wait signal done."}
panic: write /var/lib/matrixone/data/logservice-data/00000000-0000-0000-0000-000000000000/nightly-regression-dis-log-0/06166173447481204388/tandb/node-0-131072/000005.idxtmp: no space left on device
goroutine 1 gp=0xc0000081c0 m=7 mp=0xc000506008 [running]:
panic({0x3e772c0?, 0xc00841e830?})
/usr/local/go/src/runtime/panic.go:779 +0x158 fp=0xc00bb8cbb0 sp=0xc00bb8cb00 pc=0x443ab8
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x0?, 0x0?, {0x0?, 0x0?, 0xc00268a060?})
/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/entry.go:198 +0x54 fp=0xc00bb8cbd0 sp=0xc00bb8cbb0 pc=0x6d0034
go.uber.org/zap/zapcore.(*CheckWriteAction).OnWrite(0x0?, 0x0?, {0x0?, 0x0?, 0x477a42b?})
:1 +0x2d fp=0xc00bb8cc08 sp=0xc00bb8cbd0 pc=0x6dcaad
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0026c81a0, {0x0, 0x0, 0x0})
/go/pkg/mod/go.uber.org/zap@v1.24.0/zapcore/entry.go:264 +0x24e fp=0xc00bb8cd98 sp=0xc00bb8cc08 pc=0x6d03ae
go.uber.org/zap.(*SugaredLogger).log(0xc000114008, 0x4, {0x476c49d?, 0x7f8ae870a108?}, {0xc00841e2a0?, 0xc000ddfaf0?, 0xc00bb8ce10?}, {0x0, 0x0, 0x0})
/go/pkg/mod/go.uber.org/zap@v1.24.0/sugar.go:295 +0xec fp=0xc00bb8cdd8 sp=0xc00bb8cd98 pc=0x86750c
go.uber.org/zap.(*SugaredLogger).Panicf(...)
/go/pkg/mod/go.uber.org/zap@v1.24.0/sugar.go:189
github.com/matrixorigin/matrixone/pkg/logutil.DragonboatAdaptLogger.Panicf(...)
/go/src/github.com/matrixorigin/matrixone/pkg/logutil/dragonboat.go:65
github.com/matrixorigin/matrixone/pkg/logutil.(*DragonboatAdaptLogger).Panicf(0xc000ddfad0?, {0x476c49d?, 0x418525?}, {0xc00841e2a0?, 0x3ee2340?, 0x20001?})
:1 +0x55 fp=0xc00bb8ce38 sp=0xc00bb8cdd8 pc=0xa3a6f5
github.com/lni/dragonboat/v4/logger.(*dragonboatLogger).Panicf(0xc003990af0?, {0x476c49d, 0x3}, {0xc00841e2a0, 0x1, 0x1})
/go/pkg/mod/github.com/matrixorigin/dragonboat/v4@v4.0.0-20240312080931-1b40809d7cea/logger/logger.go:132 +0x51 fp=0xc00bb8ce78 sp=0xc00bb8ce38 pc=0xa300d1
github.com/lni/dragonboat/v4.panicNow(...)
/go/pkg/mod/github.com/matrixorigin/dragonboat/v4@v4.0.0-20240312080931-1b40809d7cea/nodehost.go:2230
github.com/lni/dragonboat/v4.(*NodeHost).startShard(0xc000566408, 0x0, 0x0, 0xc00bb8d648, {0x20000, 0x0, 0x1, 0x1, 0xa, 0x1, ...}, ...)
/go/pkg/mod/github.com/matrixorigin/dragonboat/v4@v4.0.0-20240312080931-1b40809d7cea/nodehost.go:1649 +0xd88 fp=0xc00bb8d5c8 sp=0xc00bb8ce78 pc=0x1615388
github.com/lni/dragonboat/v4.(*NodeHost).StartReplica(0xc00265e808?, 0xc00297d790?, 0xbe?, 0xb0?, {0x20000, 0x0, 0x1, 0x1, 0xa, 0x1, ...})
/go/pkg/mod/github.com/matrixorigin/dragonboat/v4@v4.0.0-20240312080931-1b40809d7cea/nodehost.go:508 +0xe5 fp=0xc00bb8d6c0 sp=0xc00bb8d5c8 pc=0x160e585
github.com/matrixorigin/matrixone/pkg/logservice.(*store).startHAKeeperReplica(0xc0047fce08, 0x20000, 0x4?, 0x8?)
Meanwhile, the "Volume Space Usage" of the logservice kept increasing continuously from a certain point in time:
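One way to surface this earlier than a crash is to sample the free space on the log volume and alert before it is exhausted. A minimal, Linux-only Go sketch (not matrixone code; the path and threshold are illustrative) using `statfs(2)` via the standard `syscall` package:

```go
package main

import (
	"fmt"
	"syscall"
)

// freeBytes reports the bytes available to unprivileged users on the
// filesystem containing path. Linux-specific (statfs(2)).
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	// Bavail = blocks available to non-root; Bsize = block size.
	return st.Bavail * uint64(st.Bsize), nil
}

func main() {
	// "/" stands in for the logservice data volume here.
	free, err := freeBytes("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Printf("free bytes: %d\n", free)
}
```

In a real deployment this kind of check would feed a metric or alert, so a monotonically shrinking value is visible well before dragonboat hits ENOSPC and panics.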
This may have been caused by a bug in log record truncation.
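To illustrate why a truncation bug produces exactly this symptom, here is a toy model of a replicated log whose on-disk footprint is the sum of retained entries. It is NOT matrixone or dragonboat code, just a sketch: if the truncate index never advances, disk usage grows monotonically until the volume fills.

```go
package main

import "fmt"

// entryLog is a toy model of a replicated log. Space is only
// reclaimed when truncate advances past old entries.
type entryLog struct {
	first   uint64         // index of the oldest retained entry
	entries map[uint64]int // index -> entry size in bytes
}

func newEntryLog() *entryLog {
	return &entryLog{first: 1, entries: make(map[uint64]int)}
}

func (l *entryLog) append(idx uint64, size int) { l.entries[idx] = size }

// truncate drops all entries below idx, releasing their space.
// If this is never called, or idx never advances, diskUsage only grows.
func (l *entryLog) truncate(idx uint64) {
	for i := l.first; i < idx; i++ {
		delete(l.entries, i)
	}
	if idx > l.first {
		l.first = idx
	}
}

func (l *entryLog) diskUsage() int {
	total := 0
	for _, sz := range l.entries {
		total += sz
	}
	return total
}

func main() {
	l := newEntryLog()
	for i := uint64(1); i <= 100; i++ {
		l.append(i, 1024) // 100 entries of 1 KiB each
	}
	fmt.Println("before truncate:", l.diskUsage()) // 102400
	l.truncate(91)                                 // retain the last 10 entries
	fmt.Println("after truncate:", l.diskUsage())  // 10240
}
```

The "Volume Space Usage" curve starting to climb from a fixed point matches the moment such a truncation index stops advancing.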
Expected Behavior
No response
Steps to Reproduce
Not sure yet.
Additional information
No response
The obj files should have been produced by inserts. Still looking for a way to reproduce this.
This has not been reproduced; downgrading the severity for now and delaying it to 1.2.2.
Still not reproduced.