[Bug]: cn crashed by fatal "wait latest commit ts failed" during stability test on distributed mode

aressu1985 opened this issue 5 months ago

aressu1985 commented 5 months ago

Is there an existing issue for the same bug?

I have checked the existing issues.

Branch Name

1.2-dev

Commit ID

e6b2868

Other Environment Information

- Hardware parameters:
3*CN: 16C 64G
1*DN: 16C 64G
3*LOG: 4C 16G
2*PROXY: 3C 6G
- OS type:
- Others:

Actual Behavior

During a stability test in distributed mode, a CN crashed with the following fatal log:
{"level":"FATAL","time":"2024/06/05 22:09:44.081179 +0000","name":"cn-service.txn","caller":"client/client.go:434","msg":"wait latest commit ts failed","uuid":"65393636-3165-6662-6631-633163326338","error":"waiter is paused","stacktrace":"github.com/matrixorigin/matrixone/pkg/txn/client.(*txnClient).SyncLatestCommitTS\n\t/go/src/github.com/matrixorigin/matrixone/pkg/txn/client/client.go:434\ngithub.com/matrixorigin/matrixone/pkg/sql/compile.(*sqlExecutor).maybeWaitCommittedLogApplied\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/sql_executor.go:154\ngithub.com/matrixorigin/matrixone/pkg/sql/compile.(*sqlExecutor).ExecTxn\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/sql_executor.go:144\ngithub.com/matrixorigin/matrixone/pkg/incrservice.(*sqlStore).Allocate\n\t/go/src/github.com/matrixorigin/matrixone/pkg/incrservice/store_sql.go:160\ngithub.com/matrixorigin/matrixone/pkg/incrservice.(*allocator).doAllocate\n\t/go/src/github.com/matrixorigin/matrixone/pkg/incrservice/allocator.go:164\ngithub.com/matrixorigin/matrixone/pkg/incrservice.(*allocator).run\n\t/go/src/github.com/matrixorigin/matrixone/pkg/incrservice/allocator.go:151\ngithub.com/matrixorigin/matrixone/pkg/common/stopper.(*Stopper).doRunCancelableTask.func1\n\t/go/src/github.com/matrixorigin/matrixone/pkg/common/stopper/stopper.go:277"}
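
The stack trace in the log line above is easier to follow as a list of frames. The following is a minimal sketch, not part of the original report: it assumes the trace alternates between a function line and a tab-indented file line, as it does in this log. The helper name `stack_frames` is hypothetical, and the sample is an abbreviated copy of the log line (first three frames only).

```python
import json

# Abbreviated copy of the FATAL log line from this issue: the "uuid" field and
# all but the first three stack frames are omitted to keep the sample short.
LOG_LINE = r'''{"level":"FATAL","time":"2024/06/05 22:09:44.081179 +0000","name":"cn-service.txn","caller":"client/client.go:434","msg":"wait latest commit ts failed","error":"waiter is paused","stacktrace":"github.com/matrixorigin/matrixone/pkg/txn/client.(*txnClient).SyncLatestCommitTS\n\t/go/src/github.com/matrixorigin/matrixone/pkg/txn/client/client.go:434\ngithub.com/matrixorigin/matrixone/pkg/sql/compile.(*sqlExecutor).maybeWaitCommittedLogApplied\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/sql_executor.go:154\ngithub.com/matrixorigin/matrixone/pkg/sql/compile.(*sqlExecutor).ExecTxn\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/sql_executor.go:144"}'''


def stack_frames(log_line):
    """Return (function, file:line) pairs, innermost frame first.

    Hypothetical helper; assumes function lines and tab-indented
    file lines alternate, as in the log above.
    """
    record = json.loads(log_line)
    lines = record["stacktrace"].split("\n")
    frames = []
    for func, location in zip(lines[0::2], lines[1::2]):
        # Keep only the last path component of each for readability.
        frames.append((func.rsplit("/", 1)[-1], location.strip().rsplit("/", 1)[-1]))
    return frames


if __name__ == "__main__":
    record = json.loads(LOG_LINE)
    print(f'{record["level"]}: {record["msg"]} ({record["error"]})')
    for func, location in stack_frames(LOG_LINE):
        print(f"  {func}  at {location}")
```

Read top to bottom, the full trace shows the call path from `SyncLatestCommitTS` back through `ExecTxn` to the `incrservice` allocator task that issued the transaction.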

mo-log:
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22Jyy%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-e6b2868-20240605224953%5C%22%7D%20%7C%3D%20%60FATAL%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221717623352647%22,%22to%22:%221717626935791%22%7D%7D%7D&schemaVersion=1&orgId=1
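
The link above is percent-encoded, which hides the log query and time window it points at. The following is a minimal sketch for decoding it, assuming the `panes` parameter holds JSON with a `queries` list and a millisecond `range`, as this URL appears to. The helper name `decode_explore_url` is hypothetical.

```python
import json
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# The mo-log URL from this issue, split across lines for readability only.
URL = (
    "https://shanghai.idc.matrixorigin.cn:30001/explore?panes="
    "%7B%22Jyy%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B"
    "%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22"
    "mo-nightly-e6b2868-20240605224953%5C%22%7D%20%7C%3D%20%60FATAL%60%22,"
    "%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,"
    "%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,"
    "%22range%22:%7B%22from%22:%221717623352647%22,"
    "%22to%22:%221717626935791%22%7D%7D%7D&schemaVersion=1&orgId=1"
)


def decode_explore_url(url):
    """Return (query expression, start, end) from the link's `panes` parameter.

    Hypothetical helper; assumes a single pane whose range is given in
    milliseconds since the Unix epoch.
    """
    panes = json.loads(parse_qs(urlsplit(url).query)["panes"][0])
    pane = next(iter(panes.values()))
    start, end = (
        datetime.fromtimestamp(int(pane["range"][key]) / 1000, tz=timezone.utc)
        for key in ("from", "to")
    )
    return pane["queries"][0]["expr"], start, end


if __name__ == "__main__":
    expr, start, end = decode_explore_url(URL)
    print("query:", expr)
    print("from: ", start.isoformat())
    print("to:   ", end.isoformat())
```

The decoded query filters the nightly namespace for lines containing `FATAL`, and the time window covers the 22:09:44 timestamp of the crash log.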

Expected Behavior

No response

Steps to Reproduce

1. Run an MO cluster with the config in this issue.
2. Run TPC-H 10G loop test processes in one independent tenant.
3. Run TPC-C long-running test processes with 10 warehouses and 10 terminals in one independent tenant, prepare mode.
4. Run sysbench mixed-case (insert/delete/update/select) long-running test processes with 75 terminals in one independent tenant, non-prepare mode.
5. Run another sysbench mixed-case (insert/delete/update/select) long-running test process with 75 terminals in one independent tenant, non-prepare mode.