matrixorigin / matrixone

Hyperconverged cloud-edge native database

Home Page:https://docs.matrixorigin.cn/en

[Bug]: [0605 big data regression] insert into select oom.

Ariznawlll opened this issue

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

43ebd75

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9366247347/job/25799726334 (load and insert test -> insert into select)

log:https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22vSV%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240604%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-12h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

profile:
2024-06-04_16_56_47.zip
2024-06-04_16_55_08.zip

Expected Behavior

No response

Steps to Reproduce

Trigger the big data test on TKE.
If you need more test details, please contact me.

Additional information

No response

@ouyuanning please take a look.

锦赛 (Jinsai), could you please take a look at these together?

No progress.

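The profile archives in the issue message above do not contain pprof data for the node that hit OOM. 魏璐 (Wei Lu) helped dig up the Grafana capture from a run a few days ago. Based on the timestamp of
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9520234425/job/26245691714
(Fri, 14 Jun 2024 22:19:17 GMT), I looked in Grafana at the inuse space and alloc space for the 10 s before that time.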

  • inuse space: (screenshot: Pasted Graphic 4)
  • alloc space: (screenshot: Pasted Graphic 5)

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22cg0%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240614%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22memory:inuse_space:bytes:space:bytes%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221718403547000%22,%22to%22:%221718403557000%22%7D%7D%7D&schemaVersion=1&orgId=1
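
For anyone reproducing the query, the `from`/`to` values in the Grafana link above are epoch milliseconds for the 10 s window ending at the job timestamp. A minimal sketch of that conversion follows; the helper name `grafana_window_ms` is hypothetical and not part of any MatrixOne or Grafana tooling.

```python
from email.utils import parsedate_to_datetime


def grafana_window_ms(http_date, seconds_before=10):
    """Return (from_ms, to_ms) for the window ending at an HTTP-style date.

    http_date is a string such as "Fri, 14 Jun 2024 22:19:17 GMT".
    The helper name and the default of 10 seconds are illustrative assumptions.
    """
    end = parsedate_to_datetime(http_date)  # timezone-aware because of "GMT"
    to_ms = int(end.timestamp()) * 1000
    from_ms = to_ms - seconds_before * 1000
    return from_ms, to_ms


from_ms, to_ms = grafana_window_ms("Fri, 14 Jun 2024 22:19:17 GMT")
print(from_ms, to_ms)  # 1718403547000 1718403557000
```

These two values match the `from` and `to` fields in the `range` parameter of the link above.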


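  • I confirmed with 魏璐 (Wei Lu) that the configuration is 14 cores and 55 GB of memory, so I don't understand why the unit shown here is TB.
  • Judging from the CPU results, the type-conversion overhead looks fairly large? I don't know why no deduplication overhead shows up; in past experience, deduplication has been quite expensive.
  • We also need to ask 徐鹏 (Xu Peng) whether the inuse space memory footprint is reasonable.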

This still needs further optimization. There is now a newer problem: #17143 (comment)

The fix is on 1.2-dev; it will be merged after 1.2.2 is tagged.

Waiting for merge.

This problem has not recurred recently, so closing for now.