longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes

Home Page:https://longhorn.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

How to debug what's causing the pods to fail on every backup

DamiaSan opened this issue · comments

Discussed in #8765

Originally posted by mike-code June 15, 2024
Hi there.
I have a scheduled (hourly) backup of one of the volumes, but every time the backup happens, my pod is getting Killed and recreated. Below there's an (long) event list.. the PVC at question is 78832860 and the pod is mysql-0.

Which longhorn pod should I look at for errors? I couldn't find anything in particular which would explain this behaviour

longhorn-system                          2m54s                               Normal                           SuccessfulCreate                                 cronjob/gcp-storage-backup
longhorn-system                          2m54s                               Normal                           SuccessfulCreate                                 job/gcp-storage-backup-28640220
longhorn-system                          2m54s                               Normal                           Scheduled                                        pod/gcp-storage-backup-28640220-qw8kl
longhorn-system                          2m54s                               Normal                           Started                                          pod/gcp-storage-backup-28640220-qw8kl
longhorn-system                          2m54s                               Normal                           Created                                          pod/gcp-storage-backup-28640220-qw8kl
longhorn-system                          2m54s                               Normal                           Pulled                                           pod/gcp-storage-backup-28640220-qw8kl
longhorn-system                          2m52s                               Normal                           SnapshotUpdate                                   snapshot/gcp-stor-f38a9634-5c92-4ca1-848b-9539df27e71f
longhorn-system                          2m52s                               Normal                           SnapshotCreate                                   snapshot/gcp-stor-f38a9634-5c92-4ca1-848b-9539df27e71f
longhorn-system                          2m52s                               Normal                           SnapshotCreate                                   snapshot/gcp-stor-c0237c45-eda0-4cb0-a7e5-678e2eaebed0
longhorn-system                          2m52s                               Normal                           SnapshotUpdate                                   snapshot/gcp-stor-c0237c45-eda0-4cb0-a7e5-678e2eaebed0
longhorn-system                          2m52s                               Normal                           SnapshotUpdate                                   snapshot/gcp-stor-b47e2c4f-9872-42df-a014-85b317aea8c3
longhorn-system                          2m52s                               Normal                           SnapshotCreate                                   snapshot/gcp-stor-b47e2c4f-9872-42df-a014-85b317aea8c3
longhorn-system                          2m52s                               Normal                           SnapshotUpdate                                   snapshot/gcp-stor-08020428-2b4f-452e-b5cd-1af45b46aabb
longhorn-system                          2m52s                               Normal                           SnapshotCreate                                   snapshot/gcp-stor-08020428-2b4f-452e-b5cd-1af45b46aabb
longhorn-system                          2m37s                               Normal                           Stop                                             replica/pvc-78832860-9165-4e80-9951-795aa5f5d067-r-b42223ce
longhorn-system                          2m37s                               Warning                          Faulted                                          engine/pvc-78832860-9165-4e80-9951-795aa5f5d067-e-0
longhorn-system                          2m37s                               Warning                          SnapshotError                                    snapshot/gcp-stor-f38a9634-5c92-4ca1-848b-9539df27e71f
longhorn-system                          2m37s                               Warning                          SnapshotUpdate                                   snapshot/gcp-stor-f38a9634-5c92-4ca1-848b-9539df27e71f
longhorn-system                          2m37s                               Warning                          SnapshotUpdate                                   snapshot/gcp-stor-a1da568c-fabc-4838-9de6-378dfc13b3b1
longhorn-system                          2m37s                               Warning                          SnapshotError                                    snapshot/gcp-stor-a1da568c-fabc-4838-9de6-378dfc13b3b1
longhorn-system                          2m36s                               Warning                          AutoSalvaged                                     volume/pvc-78832860-9165-4e80-9951-795aa5f5d067
longhorn-system                          2m36s                               Normal                           Detached                                         volume/pvc-78832860-9165-4e80-9951-795aa5f5d067
longhorn-system                          2m36s                               Normal                           Remount                                          volume/pvc-78832860-9165-4e80-9951-795aa5f5d067
longhorn-system                          2m36s                               Normal                           Start                                            replica/pvc-78832860-9165-4e80-9951-795aa5f5d067-r-b42223ce
longhorn-system                          2m36s                               Normal                           Stop                                             engine/pvc-78832860-9165-4e80-9951-795aa5f5d067-e-0
longhorn-system                          2m35s                               Normal                           Start                                            engine/pvc-78832860-9165-4e80-9951-795aa5f5d067-e-0
longhorn-system                          2m35s                               Normal                           Completed                                        volume/pvc-9a827766-dede-4281-9c1d-afad9238e77c
longhorn-system                          2m33s                               Normal                           Attached                                         volume/pvc-78832860-9165-4e80-9951-795aa5f5d067
longhorn-system                          2m33s                               Normal                           Completed                                        volume/pvc-3b3fcab9-2571-4da2-9a9d-c9ea5feff9e0
longhorn-system                          2m31s                               Normal                           Completed                                        volume/pvc-6759015d-1a6a-4f8e-be32-7f4d54e29398
longhorn-system                          2m31s                               Warning                          SnapshotUpdate                                   snapshot/gcp-stor-2c060974-983c-421e-84e9-650794108c9d
longhorn-system                          2m30s                               Warning                          SnapshotUpdate                                   snapshot/gcp-stor-ed54754e-8d19-40fd-b758-159e10c38a11
longhorn-system                          2m30s                               Warning                          SnapshotUpdate                                   snapshot/gcp-stor-972ac1c0-c3bb-419c-98ce-b816a9014c0b
longhorn-system                          2m28s                               Normal                           SnapshotUpdate                                   snapshot/gcp-stor-f38a9634-5c92-4ca1-848b-9539df27e71f
longhorn-system                          2m28s                               Normal                           SnapshotUpdate                                   snapshot/gcp-stor-a1da568c-fabc-4838-9de6-378dfc13b3b1
applications                             2m24s                               Warning                          Unhealthy                                        pod/mysql-0
applications                             2m24s                               Normal                           Killing                                          pod/mysql-0
applications                             2m21s                               Warning                          RecreatingFailedPod                              statefulset/mysql
applications                             2m21s                               Normal                           SuccessfulDelete                                 statefulset/mysql
applications                             2m21s                               Normal                           Scheduled                                        pod/mysql-0
applications                             2m20s                               Normal                           Started                                          pod/mysql-0
applications                             2m20s                               Normal                           Created                                          pod/mysql-0
applications                             2m20s                               Normal                           Pulled                                           pod/mysql-0
applications                             2m19s                               Normal                           Created                                          pod/mysql-0
applications                             2m19s                               Normal                           Pulled                                           pod/mysql-0
applications                             2m18s                               Normal                           Started                                          pod/mysql-0
```</div>

cannot check job log and longhorn-manager log?

cc @ChanYiLin

commented

@derekbit

I've narrowed it down by turning off recurring backups, but the issue is still present. The backups might have been a coincidence because they've ran every hour (at 0 minutes mark) while this issue happens every 30 minutes (:00 and :30 minutes mark).

The 8s timeout thing appears in all logs, though I can't figure out what's the culprit of this issue. The instance manager log says it couldn't read from the tcp://10.42.0.225:10000 (which is IP address of the very same instance manager pod). The k3s log says something about being unable to read/write to its database (slow queries).

The dmesg log says about faulty block devices. The dev sdf device, this log mentions, does not exist right now. I assume it's some kind of temporary drive that spawns during some periodic event which happens every 30 minutes?

The k8s StorageClass used in all PVCs

---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-persistent
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"
  fromBackup: ""
  fsType: "ext4"

Attaching:

instance-manager POD log
2024-06-18T14:30:12.652889559+02:00 [pvc-78832860-9165-4e80-9951-795aa5f5d067-e-0] time="2024-06-18T12:30:12Z" level=error msg="R/W Timeout. No response received in 8s" func="dataconn.(*Client).loop" file="client.go:148"
2024-06-18T14:30:12.653003057+02:00 response_process: Receive error for response 3 of seq 11894
2024-06-18T14:30:12.662595124+02:00 tgtd: bs_longhorn_request(97) fail to write at 268435456 for 4096
2024-06-18T14:30:12.662610595+02:00 tgtd: bs_longhorn_request(210) io error 0x21d0a10 2a -14 4096 268435456, Success
2024-06-18T14:30:12.662612384+02:00 response_process: Receive error for response 3 of seq 11895
2024-06-18T14:30:12.662613736+02:00 tgtd: bs_longhorn_request(97) fail to write at 268435456 for 4096
2024-06-18T14:30:12.662615047+02:00 tgtd: bs_longhorn_request(210) io error 0x21d0a10 2a -14 4096 268435456, Success
2024-06-18T14:30:12.670567351+02:00 [pvc-78832860-9165-4e80-9951-795aa5f5d067-e-0] time="2024-06-18T12:30:12Z" level=error msg="Setting replica tcp://10.42.0.225:10000 to ERR" func="controller.(*Controller).handleErrorNoLock" file="control.go:1110" error="tcp://10.42.0.225:10000: r/w timeout"
2024-06-18T14:30:12.670581158+02:00 time="2024-06-18T12:30:12Z" level=info msg="Setting replica tcp://10.42.0.225:10000 to mode ERR" func="controller.(*Controller).setReplicaModeNoLock" file="control.go:463"
2024-06-18T14:30:12.670582586+02:00 time="2024-06-18T12:30:12Z" level=error msg="I/O error" func="controller.(*Controller).handleErrorNoLock" file="control.go:1129" error="tcp://10.42.0.225:10000: r/w timeout"
2024-06-18T14:30:12.670584088+02:00 time="2024-06-18T12:30:12Z" level=info msg="Monitoring stopped tcp://10.42.0.225:10000" func="controller.(*Controller).monitoring" file="control.go:1214"
2024-06-18T14:30:12.670585626+02:00 Jun 18 12:00:59.892882: ->10.42.0.225:10001 W[ 64kB]      253us
2024-06-18T14:30:12.670586945+02:00 Jun 18 12:00:59.892890: ->10.42.0.225:10001 W[ 16kB]      304us
2024-06-18T14:30:12.670588232+02:00 Jun 18 12:00:59.893340: ->10.42.0.225:10001 W[  8kB]      137us
2024-06-18T14:30:12.670589566+02:00 Jun 18 12:00:59.893609: ->10.42.0.225:10001 W[  4kB]      147us
2024-06-18T14:30:12.670590840+02:00 Jun 18 12:00:59.893958: ->10.42.0.225:10001 W[ 16kB]      137us

[...........]

2024-06-18T14:30:12.676404801+02:00 Jun 18 12:30:08.654114: ->10.42.0.225:10001 P[  0kB]      202us
2024-06-18T14:30:12.676406061+02:00 Jun 18 12:30:10.654103: ->10.42.0.225:10001 P[  0kB]      176us
2024-06-18T14:30:12.676407568+02:00 Jun 18 12:30:03.901872: ->10.42.0.225:10001 W[  4kB]  8750903us failed
2024-06-18T14:30:12.676409064+02:00 time="2024-06-18T12:30:12Z" level=error msg="I/O error" func="controller.(*Controller).handleErrorNoLock" file="control.go:1129" error="no backend available"
2024-06-18T14:30:12.676410565+02:00 time="2024-06-18T12:30:12Z" level=warning msg="Received response message id 0 seq 12786 type 2 for non pending request" func="dataconn.(*Client).loop" file="client.go:173"
2024-06-18T14:30:17.656238634+02:00 [longhorn-instance-manager] time="2024-06-18T12:30:17Z" level=info msg="Getting instance log" func="instance.(*Server).InstanceLog" file="instance.go:397" dataEngine=DATA_ENGINE_V1 name=pvc-78832860-9165-4e80-9951-795aa5f5d067-r-b42223ce type=re
plica
2024-06-18T14:30:17.656680488+02:00 [longhorn-instance-manager] time="2024-06-18T12:30:17Z" level=info msg="Process Manager: start getting logs for process pvc-78832860-9165-4e80-9951-795aa5f5d067-r-b42223ce" func="process.(*Manager).ProcessLog" file="process_manager.go:338"
2024-06-18T14:30:17.781751069+02:00 [longhorn-instance-manager] time="2024-06-18T12:30:17Z" level=info msg="Process Manager: got logs for process pvc-78832860-9165-4e80-9951-795aa5f5d067-r-b42223ce" func="process.(*Manager).ProcessLog" file="process_manager.go:355"
2024-06-18T14:30:17.820842601+02:00 [longhorn-instance-manager] time="2024-06-18T12:30:17Z" level=info msg="Getting instance log" func="instance.(*Server).InstanceLog" file="instance.go:397" dataEngine=DATA_ENGINE_V1 name=pvc-78832860-9165-4e80-9951-795aa5f5d067-e-0 type=engine
2024-06-18T14:30:17.821283768+02:00 [longhorn-instance-manager] time="2024-06-18T12:30:17Z" level=info msg="Process Manager: start getting logs for process pvc-78832860-9165-4e80-9951-795aa5f5d067-e-0" func="process.(*Manager).ProcessLog" file="process_manager.go:338"
2024-06-18T14:30:17.827509705+02:00 [longhorn-instance-manager] time="2024-06-18T12:30:17Z" level=info msg="Deleting instance" func="instance.(*Server).InstanceDelete" file="instance.go:174" cleanupRequired=true dataEngine=DATA_ENGINE_V1 diskUuid=a756b422-6dac-4b6f-bb8f-6d4d98b466
c8 name=pvc-78832860-9165-4e80-9951-795aa5f5d067-r-b42223ce type=replica
2024-06-18T14:30:17.827532149+02:00 [longhorn-instance-manager] time="2024-06-18T12:30:17Z" level=info msg="Process Manager: prepare to delete process pvc-78832860-9165-4e80-9951-795aa5f5d067-r-b42223ce" func="process.(*Manager).ProcessDelete" file="process_manager.go:230"
k3s log
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:03.524477902 +0200 CEST m=+521992.501768635) (total time: 8.14273359s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500 : [[% 1529469]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:03.685309152 +0200 CEST m=+521992.662599903) (total time: 7.982377818s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-resizer-driver-longhorn-io false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:09.79684539 +0200 CEST m=+521998.774136149) (total time: 1.886316475s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/longhorn.io/backingimagemanagers/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:10.156180023 +0200 CEST m=+521999.133470775) (total time: 1.528673285s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/roles/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:03.28640094 +0200 CEST m=+521992.263691684) (total time: 8.402157844s): INSERT INTO kine(name, created, deleted, create_revision, prev_revision, lease, value, old_value) values(?, ?, ?, ?, ?, ?, ?, ?) : [[/registry/masterleases/10.6.100.50 0 0 1488849 1529448 15 [107 56 115 0 10 15 10 2 118 49 18 9 69 110 100 112 111 105 110 116 115 18 38 10 17 10 0 18 0 26 0 34 0 42 0 50 0 56 213 12 66 0 18 17 10 15 10 11 49 48 46 54 46 49 48 48 46 53 48 26 0 26 0 34 0] [107 56 115 0 10 15 10 2 118 49 18 9 69 110 100 112 111 105 110 116 115 18 38 10 17 10 0 18 0 26 0 34 0 42 0 50 0 56 212 12 66 0 18 17 10 15 10 11 49 48 46 54 46 49 48 48 46 53 48 26 0 26 0 34 0]]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.968096536 +0200 CEST m=+521993.945387292) (total time: 6.720976313s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/leases/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:08.286632417 +0200 CEST m=+521997.263923151) (total time: 3.403848403s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC LIMIT 1 : [[/registry/masterleases/10.6.100.50 true]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:06.647026973 +0200 CEST m=+521995.624317717) (total time: 5.048602224s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/storageclasses/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:05.623484717 +0200 CEST m=+521994.600775455) (total time: 6.076301199s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-snapshotter-leader-driver-longhorn-io false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:05.258554184 +0200 CEST m=+521994.235844937) (total time: 6.442665475s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-resizer-driver-longhorn-io false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:03.919932182 +0200 CEST m=+521992.897222931) (total time: 7.785392846s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC LIMIT 1 : [[/registry/leases/operations/acc50d8e.haproxy.com false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:09.583278574 +0200 CEST m=+521998.560569330) (total time: 2.123930445s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/longhorn.io/orphans/% false]]"
k3s[547]: I0618 16:30:11.712609     547 trace.go:236] Trace[1987240832]: "Update" accept:application/json, */*,audit-id:bd94380d-596d-4798-b99a-e08786d48a60,client:10.42.0.60,api-group:coordination.k8s.io,api-version:v1,name:acc50d8e.haproxy.com,subresource:,namespace:operations,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/operations/leases/acc50d8e.haproxy.com,user-agent:haproxy-operator/v0.0.0 (linux/amd64) kubernetes/$Format/leader-election,verb:PUT (18-Jun-2024 16:30:03.919) (total time: 7793ms):
k3s[547]: Trace[1987240832]: ["GuaranteedUpdate etcd3" audit-id:bd94380d-596d-4798-b99a-e08786d48a60,key:/leases/operations/acc50d8e.haproxy.com,type:*coordination.Lease,resource:leases.coordination.k8s.io 7793ms (16:30:03.919)
k3s[547]: Trace[1987240832]:  ---"Txn call completed" 7792ms (16:30:11.712)]
k3s[547]: Trace[1987240832]: [7.793199703s] [7.793199703s] END
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:09.656290617 +0200 CEST m=+521998.633581376) (total time: 2.05141165s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-snapshotter-leader-driver-longhorn-io false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:06.541237825 +0200 CEST m=+521995.518528575) (total time: 5.192818101s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/driver-longhorn-io false]]"
k3s[547]: I0618 16:30:11.712733     547 trace.go:236] Trace[2102565097]: "GuaranteedUpdate etcd3" audit-id:,key:/masterleases/10.6.100.50,type:*v1.Endpoints,resource:apiServerIPInfo (18-Jun-2024 16:30:03.284) (total time: 8428ms):
k3s[547]: Trace[2102565097]: ---"Txn call completed" 8427ms (16:30:11.712)
k3s[547]: Trace[2102565097]: [8.428230545s] [8.428230545s] END
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.624767343 +0200 CEST m=+521993.602058098) (total time: 7.087399063s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-snapshotter-leader-driver-longhorn-io false]]"
k3s[547]: I0618 16:30:11.712879     547 trace.go:236] Trace[2057264783]: "Get" accept:application/json, */*,audit-id:8ef70e42-1093-494e-931c-08240332e305,client:10.42.0.219,api-group:coordination.k8s.io,api-version:v1,name:external-resizer-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-resizer-driver-longhorn-io,user-agent:csi-resizer/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:03.685) (total time: 8027ms):
k3s[547]: Trace[2057264783]: ---"About to write a response" 8027ms (16:30:11.712)
k3s[547]: Trace[2057264783]: [8.027761628s] [8.027761628s] END
k3s[547]: I0618 16:30:11.713000     547 trace.go:236] Trace[282740605]: "Update" accept:application/vnd.kubernetes.protobuf,application/json,audit-id:e6acfa33-72ff-46c5-ad8b-efc526543d64,client:127.0.0.1,api-group:coordination.k8s.io,api-version:v1,name:tw1,subresource:,namespace:kube-node-lease,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/tw1,user-agent:k3s/v1.29.2+k3s1 (linux/amd64) kubernetes/86f1021,verb:PUT (18-Jun-2024 16:30:11.203) (total time: 509ms):
k3s[547]: Trace[282740605]: ["GuaranteedUpdate etcd3" audit-id:e6acfa33-72ff-46c5-ad8b-efc526543d64,key:/leases/kube-node-lease/tw1,type:*coordination.Lease,resource:leases.coordination.k8s.io 509ms (16:30:11.203)
k3s[547]: Trace[282740605]:  ---"Txn call completed" 508ms (16:30:11.712)]
k3s[547]: Trace[282740605]: [509.235027ms] [509.235027ms] END
k3s[547]: I0618 16:30:11.713799     547 trace.go:236] Trace[225275297]: "Get" accept:application/json, */*,audit-id:1b3a8300-34e7-4288-8504-ec5f3990e4cb,client:10.42.0.223,api-group:coordination.k8s.io,api-version:v1,name:external-snapshotter-leader-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-snapshotter-leader-driver-longhorn-io,user-agent:csi-snapshotter/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:05.623) (total time: 6090ms):
k3s[547]: Trace[225275297]: ---"About to write a response" 6090ms (16:30:11.713)
k3s[547]: Trace[225275297]: [6.090532006s] [6.090532006s] END
k3s[547]: I0618 16:30:11.725955     547 trace.go:236] Trace[540331988]: "Get" accept:application/json, */*,audit-id:c08e2398-2a9a-4e49-ab43-f35b3499d465,client:10.42.0.220,api-group:coordination.k8s.io,api-version:v1,name:external-resizer-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-resizer-driver-longhorn-io,user-agent:csi-resizer/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:05.258) (total time: 6467ms):
k3s[547]: Trace[540331988]: ---"About to write a response" 6467ms (16:30:11.725)
k3s[547]: Trace[540331988]: [6.46735296s] [6.46735296s] END
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:05.020618371 +0200 CEST m=+521993.997909126) (total time: 6.691603672s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-attacher-leader-driver-longhorn-io false]]"
k3s[547]: I0618 16:30:11.736330     547 trace.go:236] Trace[1718407090]: "Get" accept:application/json, */*,audit-id:0b87762c-bc62-4d5d-9178-c34f1b39c169,client:10.42.0.217,api-group:coordination.k8s.io,api-version:v1,name:driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/driver-longhorn-io,user-agent:csi-provisioner/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:06.541) (total time: 5195ms):
k3s[547]: Trace[1718407090]: ---"About to write a response" 5195ms (16:30:11.736)
k3s[547]: Trace[1718407090]: [5.195283206s] [5.195283206s] END
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:10.525196397 +0200 CEST m=+521999.502487146) (total time: 1.187053205s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/controllerrevisions/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:09.053419501 +0200 CEST m=+521998.030710250) (total time: 2.658860502s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/flowschemas/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.88548749 +0200 CEST m=+521993.862778228) (total time: 6.827723785s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-resizer-driver-longhorn-io false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.756980834 +0200 CEST m=+521993.734271589) (total time: 6.956483659s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/longhorn.io/systemrestores/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.391821569 +0200 CEST m=+521993.369112321) (total time: 7.321999808s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/deployments/operations/traefik-lan false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.52891275 +0200 CEST m=+521993.506203499) (total time: 7.178968167s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/hub.traefik.io/apiportals/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:06.476866101 +0200 CEST m=+521995.454156860) (total time: 5.231519935s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/longhorn.io/nodes/longhorn-system/tw1 false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:06.107144383 +0200 CEST m=+521995.084435133) (total time: 5.601456952s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/acme.cert-manager.io/challenges/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:06.272841691 +0200 CEST m=+521995.250132437) (total time: 5.435921954s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-attacher-leader-driver-longhorn-io false]]"
k3s[547]: I0618 16:30:11.750612     547 trace.go:236] Trace[1036716170]: "Get" accept:application/json, */*,audit-id:e3ab9292-a679-4a57-ae51-ba91a5fc1cbb,client:10.42.0.221,api-group:coordination.k8s.io,api-version:v1,name:external-resizer-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-resizer-driver-longhorn-io,user-agent:csi-resizer/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:04.885) (total time: 6865ms):
k3s[547]: Trace[1036716170]: ---"About to write a response" 6865ms (16:30:11.750)
k3s[547]: Trace[1036716170]: [6.865330586s] [6.865330586s] END
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.57262864 +0200 CEST m=+521993.549919387) (total time: 7.136338163s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/mutatingwebhookconfigurations/% false]]"
k3s[547]: I0618 16:30:11.751639     547 trace.go:236] Trace[1522675277]: "Get" accept:application/json, */*,audit-id:bf5e8202-ebfe-4697-898b-27e540a0d446,client:10.42.0.224,api-group:coordination.k8s.io,api-version:v1,name:external-snapshotter-leader-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-snapshotter-leader-driver-longhorn-io,user-agent:csi-snapshotter/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:04.624) (total time: 7127ms):
k3s[547]: Trace[1522675277]: ---"About to write a response" 7127ms (16:30:11.751)
k3s[547]: Trace[1522675277]: [7.12710106s] [7.12710106s] END
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.968103642 +0200 CEST m=+521993.945394388) (total time: 6.741752988s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/longhorn.io/recurringjobs/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:07.565137275 +0200 CEST m=+521996.542428031) (total time: 4.157532661s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/external-attacher-leader-driver-longhorn-io false]]"
k3s[547]: I0618 16:30:11.751744     547 trace.go:236] Trace[428342999]: "Update" accept:application/json, */*,audit-id:37b9f4fe-18fe-49e4-835d-19d78c407440,client:10.42.0.207,api-group:longhorn.io,api-version:v1beta2,name:tw1,subresource:status,namespace:longhorn-system,protocol:HTTP/2.0,resource:nodes,scope:resource,url:/apis/longhorn.io/v1beta2/namespaces/longhorn-system/nodes/tw1/status,user-agent:longhorn-manager/v0.0.0 (linux/amd64) kubernetes/$Format,verb:PUT (18-Jun-2024 16:30:06.474) (total time: 5276ms):
k3s[547]: Trace[428342999]: ["GuaranteedUpdate etcd3" audit-id:37b9f4fe-18fe-49e4-835d-19d78c407440,key:/longhorn.io/nodes/longhorn-system/tw1,type:*unstructured.Unstructured,resource:nodes.longhorn.io 5276ms (16:30:06.475)]
k3s[547]: Trace[428342999]: [5.276867677s] [5.276867677s] END
k3s[547]: I0618 16:30:11.751928     547 trace.go:236] Trace[764575436]: "Get" accept:application/json, */*,audit-id:f0a559de-670f-4ba2-9ec1-29085ab7aff8,client:10.42.0.222,api-group:coordination.k8s.io,api-version:v1,name:external-snapshotter-leader-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-snapshotter-leader-driver-longhorn-io,user-agent:csi-snapshotter/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:09.656) (total time: 2095ms):
k3s[547]: Trace[764575436]: ---"About to write a response" 2095ms (16:30:11.751)
k3s[547]: Trace[764575436]: [2.095831811s] [2.095831811s] END
k3s[547]: I0618 16:30:11.751999     547 trace.go:236] Trace[2103404249]: "Update" accept:application/vnd.kubernetes.protobuf, */*,audit-id:0cb9e119-d38f-406a-a687-bce3cbc7cb64,client:127.0.0.1,api-group:coordination.k8s.io,api-version:v1,name:apiserver-jyuxay2ievjzlx645iuk27itd4,subresource:,namespace:kube-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/apiserver-jyuxay2ievjzlx645iuk27itd4,user-agent:k3s/v1.29.2+k3s1 (linux/amd64) kubernetes/86f1021,verb:PUT (18-Jun-2024 16:30:11.075) (total time: 676ms):
k3s[547]: Trace[2103404249]: ["GuaranteedUpdate etcd3" audit-id:0cb9e119-d38f-406a-a687-bce3cbc7cb64,key:/leases/kube-system/apiserver-jyuxay2ievjzlx645iuk27itd4,type:*coordination.Lease,resource:leases.coordination.k8s.io 676ms (16:30:11.075)
k3s[547]: Trace[2103404249]:  ---"Txn call completed" 676ms (16:30:11.751)]
k3s[547]: Trace[2103404249]: [676.606991ms] [676.606991ms] END
k3s[547]: I0618 16:30:11.752059     547 trace.go:236] Trace[1056095102]: "Get" accept:application/json, */*,audit-id:5330e04b-6404-41fa-9519-b15df241389f,client:10.42.0.213,api-group:coordination.k8s.io,api-version:v1,name:external-attacher-leader-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-attacher-leader-driver-longhorn-io,user-agent:csi-attacher/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:05.020) (total time: 6731ms):
k3s[547]: Trace[1056095102]: ---"About to write a response" 6731ms (16:30:11.752)
k3s[547]: Trace[1056095102]: [6.731612715s] [6.731612715s] END
k3s[547]: I0618 16:30:11.753911     547 trace.go:236] Trace[698237668]: "Get" accept:application/json, */*,audit-id:e0c1d21f-eeea-4f59-a0a5-b643e3191732,client:10.42.0.214,api-group:coordination.k8s.io,api-version:v1,name:external-attacher-leader-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-attacher-leader-driver-longhorn-io,user-agent:csi-attacher/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:06.272) (total time: 5481ms):
k3s[547]: Trace[698237668]: ---"About to write a response" 5481ms (16:30:11.753)
k3s[547]: Trace[698237668]: [5.481281211s] [5.481281211s] END
k3s[547]: I0618 16:30:11.756618     547 trace.go:236] Trace[1841348608]: "Get" accept:application/vnd.kubernetes.protobuf, */*,audit-id:fe70b6a9-3004-4315-b97f-73a827b54fb8,client:127.0.0.1,api-group:apps,api-version:v1,name:traefik-lan,subresource:scale,namespace:operations,protocol:HTTP/2.0,resource:deployments,scope:resource,url:/apis/apps/v1/namespaces/operations/deployments/traefik-lan/scale,user-agent:k3s/v1.29.2+k3s1 (linux/amd64) kubernetes/86f1021/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:GET (18-Jun-2024 16:30:04.391) (total time: 7364ms):
k3s[547]: Trace[1841348608]: ---"About to write a response" 7364ms (16:30:11.756)
k3s[547]: Trace[1841348608]: [7.364993817s] [7.364993817s] END
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.111270191 +0200 CEST m=+521993.088560946) (total time: 7.598685977s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/csinodes/% false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:10.485870555 +0200 CEST m=+521999.463161306) (total time: 1.230873162s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/driver-longhorn-io false]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:03.361736835 +0200 CEST m=+521992.339027573) (total time: 8.348261603s): INSERT INTO kine(name, created, deleted, create_revision, prev_revision, lease, value, old_value) values(?, ?, ?, ?, ?, ?, ?, ?) : [[/registry/leases/kube-system/plndr-cp-lock 0 0 576 1529467 0 [107 56 115 0 10 31 10 22 99 111 111 114 100 105 110 97 116 105 111 110 46 107 56 115 46 105 111 47 118 49 18 5 76 101 97 115 101 18 201 2 10 150 2 10 13 112 108 110 100 114 45 99 112 45 108 111 99 107 18 0 26 11 107 117 98 101 45 115 121 115 116 101 109 34 0 42 36 55 51 97 97 54 98 50 102 45 49 100 56 97 45 52 98 102 55 45 97 51 48 50 45 51 102 54 99 50 97 100 51 100 55 100 98 50 0 56 0 66 8 8 218 201 157 179 6 16 0 138 1 190 1 10 8 107 117 98 101 45 118 105 112 18 6 85 112 100 97 116 101 26 22 99 111 111 114 100 105 110 97 116 105 111 110 46 107 56 115 46 105 111 47 118 49 34 8 8 235 179 198 179 6 16 0 50 8 70 105 101 108 100 115 86 49 58 124 10 122 123 34 102 58 115 112 101 99 34 58 123 34 102 58 97 99 113 117 105 114 101 84 105 109 101 34 58 123 125 44 34 102 58 104 111 108 100 101 114 73 100 101 110 116 105 116 121 34 58 123 125 44 34 102 58 108 101 97 115 101 68 117 114 97 116 105 111 110 83 101 99 111 110 100 115 34 58 123 125 44 34 102 58 108 101 97 115 101 84 114 97 110 115 105 116 105 111 110 115 34 58 123 125 44 34 102 58 114 101 110 101 119 84 105 109 101 34 58 123 125 125 125 66 0 18 46 10 12 116 119 49 45 115 105 103 110 97 108 111 115 16 15 26 12 8 218 201 157 179 6 16 200 195 242 185 2 34 12 8 235 179 198 179 6 16 144 207 201 170 1 40 0 26 0 34 0] [107 56 115 0 10 31 10 22 99 111 111 114 100 105 110 97 116 105 111 110 46 107 56 115 46 105 111 47 118 49 18 5 76 101 97 115 101 18 201 2 10 150 2 10 13 112 108 110 100 114 45 99 112 45 108 111 99 107 18 0 26 11 107 117 98 101 45 115 121 115 116 101 109 34 0 42 36 55 51 97 97 54 98 50 102 45 49 100 56 97 45 52 98 102 55 45 97 51 48 50 45 51 102 54 99 50 97 100 51 100 55 100 98 50 0 56 0 66 8 8 218 201 157 179 6 16 0 138 1 190 1 10 8 107 117 98 101 45 118 105 112 18 6 85 112 100 97 116 101 26 22 99 111 111 114 100 105 110 97 116 105 111 110 46 107 56 115 46 105 111 47 118 49 34 8 8 233 179 198 179 6 16 0 50 8 70 105 101 108 100 115 86 49 58 124 10 122 123 34 102 58 115 112 101 99 34 58 123 34 102 58 97 99 113 117 105 114 101 84 105 109 101 34 58 123 125 44 34 102 58 104 111 108 100 101 114 73 100 101 110 116 105 116 121 34 58 123 125 44 34 102 58 108 101 97 115 101 68 117 114 97 116 105 111 110 83 101 99 111 110 100 115 34 58 123 125 44 34 102 58 108 101 97 115 101 84 114 97 110 115 105 116 105 111 110 115 34 58 123 125 44 34 102 58 114 101 110 101 119 84 105 109 101 34 58 123 125 125 125 66 0 18 46 10 12 116 119 49 45 115 105 103 110 97 108 111 115 16 15 26 12 8 218 201 157 179 6 16 200 195 242 185 2 34 12 8 233 179 198 179 6 16 248 162 141 168 1 40 0 26 0 34 0]]]"
k3s[547]: time="2024-06-18T16:30:11+02:00" level=info msg="Slow SQL (started: 2024-06-18 16:30:04.628949954 +0200 CEST m=+521993.606240702) (total time: 7.093006871s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/leases/longhorn-system/driver-longhorn-io false]]"
k3s[547]: I0618 16:30:11.763652     547 trace.go:236] Trace[2017536629]: "Get" accept:application/json, */*,audit-id:d188b519-e042-4965-a188-443ffc41341b,client:10.42.0.215,api-group:coordination.k8s.io,api-version:v1,name:external-attacher-leader-driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/external-attacher-leader-driver-longhorn-io,user-agent:csi-attacher/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:07.564) (total time: 4198ms):
k3s[547]: Trace[2017536629]: ---"About to write a response" 4198ms (16:30:11.763)
k3s[547]: Trace[2017536629]: [4.198719525s] [4.198719525s] END
k3s[547]: I0618 16:30:11.764482     547 trace.go:236] Trace[161558378]: "Get" accept:application/json, */*,audit-id:5c63a619-23ef-4c73-ba7f-00c6195941cf,client:10.42.0.216,api-group:coordination.k8s.io,api-version:v1,name:driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/driver-longhorn-io,user-agent:csi-provisioner/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:10.485) (total time: 1278ms):
k3s[547]: Trace[161558378]: ---"About to write a response" 1278ms (16:30:11.764)
k3s[547]: Trace[161558378]: [1.278882006s] [1.278882006s] END
k3s[547]: I0618 16:30:11.764758     547 trace.go:236] Trace[486615849]: "Update" accept:application/json, */*,audit-id:f8ea1394-fa6c-4ff8-9393-4880210172a8,client:10.6.100.50,api-group:coordination.k8s.io,api-version:v1,name:plndr-cp-lock,subresource:,namespace:kube-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock,user-agent:kube-vip/v0.0.0 (linux/amd64) kubernetes/$Format,verb:PUT (18-Jun-2024 16:30:03.360) (total time: 8404ms):
k3s[547]: Trace[486615849]: ["GuaranteedUpdate etcd3" audit-id:f8ea1394-fa6c-4ff8-9393-4880210172a8,key:/leases/kube-system/plndr-cp-lock,type:*coordination.Lease,resource:leases.coordination.k8s.io 8404ms (16:30:03.360)
k3s[547]: Trace[486615849]:  ---"Txn call completed" 8403ms (16:30:11.764)]
k3s[547]: Trace[486615849]: [8.404298207s] [8.404298207s] END
k3s[547]: I0618 16:30:11.764831     547 trace.go:236] Trace[764158336]: "Get" accept:application/json, */*,audit-id:da74eb25-2f92-4835-9a4a-a24db98c6e9e,client:10.42.0.218,api-group:coordination.k8s.io,api-version:v1,name:driver-longhorn-io,subresource:,namespace:longhorn-system,protocol:HTTP/2.0,resource:leases,scope:resource,url:/apis/coordination.k8s.io/v1/namespaces/longhorn-system/leases/driver-longhorn-io,user-agent:csi-provisioner/v0.0.0 (linux/amd64) kubernetes/$Format,verb:GET (18-Jun-2024 16:30:04.628) (total time: 7136ms):
k3s[547]: Trace[764158336]: ---"About to write a response" 7135ms (16:30:11.764)
k3s[547]: Trace[764158336]: [7.136033697s] [7.136033697s] END
dmesg log (sdb onwards are longhorn virtual drives)
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#107 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#107 Sense Key : Medium Error [current]
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#107 Add. Sense: Unrecovered read error
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#107 CDB: Write(10) 2a 00 00 08 00 00 00 00 08 00
[Tue Jun 18 16:30:10 2024] critical medium error, dev sdf, sector 524288 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[Tue Jun 18 16:30:10 2024] Buffer I/O error on dev sdf, logical block 65536, lost sync page write
[Tue Jun 18 16:30:10 2024] JBD2: I/O error when updating journal superblock for sdf-8.
[Tue Jun 18 16:30:10 2024] Aborting journal on device sdf-8.
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#108 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#108 Sense Key : Medium Error [current]
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#108 Add. Sense: Unrecovered read error
[Tue Jun 18 16:30:10 2024] sd 7:0:0:1: [sdf] tag#108 CDB: Write(10) 2a 00 00 08 00 00 00 00 08 00
[Tue Jun 18 16:30:10 2024] critical medium error, dev sdf, sector 524288 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[Tue Jun 18 16:30:10 2024] Buffer I/O error on dev sdf, logical block 65536, lost sync page write
[Tue Jun 18 16:30:10 2024] JBD2: I/O error when updating journal superblock for sdf-8.
[Tue Jun 18 16:30:10 2024] VFS:Filesystem freeze failed
[Tue Jun 18 16:30:18 2024] scsi host3: iSCSI Initiator over TCP/IP
[Tue Jun 18 16:30:18 2024] scsi 3:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
[Tue Jun 18 16:30:18 2024] scsi 3:0:0:0: Attached scsi generic sg2 type 12
[Tue Jun 18 16:30:18 2024] scsi 3:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[Tue Jun 18 16:30:18 2024] sd 3:0:0:1: Attached scsi generic sg3 type 0
[Tue Jun 18 16:30:18 2024] sd 3:0:0:1: Power-on or device reset occurred
[Tue Jun 18 16:30:18 2024] sd 3:0:0:1: [sdb] 1953792 512-byte logical blocks: (1.00 GB/954 MiB)
[Tue Jun 18 16:30:18 2024] sd 3:0:0:1: [sdb] Write Protect is off
[Tue Jun 18 16:30:18 2024] sd 3:0:0:1: [sdb] Mode Sense: 69 00 10 08
[Tue Jun 18 16:30:18 2024] sd 3:0:0:1: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[Tue Jun 18 16:30:18 2024] sd 3:0:0:1: [sdb] Attached SCSI disk
[Tue Jun 18 16:30:37 2024] EXT4-fs error (device sdf): ext4_journal_check_start:83: comm mysqld: Detected aborted journal
[Tue Jun 18 16:30:37 2024] EXT4-fs (sdf): Remounting filesystem read-only
[Tue Jun 18 16:30:37 2024] EXT4-fs warning (device sdf): dx_probe:822: inode #12: lblock 0: comm ib_buf_dump: error -5 reading directory block
[Tue Jun 18 16:30:37 2024] EXT4-fs warning (device sdf): dx_probe:822: inode #12: lblock 0: comm ib_buf_dump: error -5 reading directory block
[Tue Jun 18 16:30:37 2024] cni0: port 1(veth24f06782) entered disabled state
[Tue Jun 18 16:30:37 2024] device veth24f06782 left promiscuous mode
[Tue Jun 18 16:30:37 2024] cni0: port 1(veth24f06782) entered disabled state
[Tue Jun 18 16:30:37 2024] EXT4-fs (sdf): unmounting filesystem.
[Tue Jun 18 16:30:47 2024] EXT4-fs (sdb): mounted filesystem with ordered data mode. Quota mode: none.
[Tue Jun 18 16:30:47 2024] cni0: port 1(veth3d9d7e4d) entered blocking state
[Tue Jun 18 16:30:47 2024] cni0: port 1(veth3d9d7e4d) entered disabled state
[Tue Jun 18 16:30:47 2024] device veth3d9d7e4d entered promiscuous mode
[Tue Jun 18 16:30:47 2024] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[Tue Jun 18 16:30:47 2024] IPv6: ADDRCONF(NETDEV_CHANGE): veth3d9d7e4d: link becomes ready

IO timeout error

2024-06-18T14:30:12.676407568+02:00 Jun 18 12:30:03.901872: ->10.42.0.225:10001 W[  4kB]  8750903us failed

Can you provide more information about your environment?


 - Longhorn version: 
 - Impacted volume (PV): <!-- PLEASE specify the volume name to better identify the cause -->
 - Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
 - Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
   - Number of control plane nodes in the cluster:
   - Number of worker nodes in the cluster:
 - Node config
   - OS type and version:
   - Kernel version:
   - CPU per node:
   - Memory per node:
   - Disk type (e.g. SSD/NVMe/HDD):
   - Network bandwidth between the nodes (Gbps):
 - Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
 - Number of Longhorn volumes in the cluster:

commented

sure thing @derekbit

  • Longhorn 1.6.2
  • PV: pvc-fa88c9bf-9543-454e-b593-d9c46209035d (currently, as I created a new one)
  • helm chart via ArgoCD
  • k3s v1.29.2+k3s1 | single master node (ie. it's k8s api server and agent on the same node)
  • Debian, linux 6.1.76. Xeon E-2314, 8GB RAM, SSD 200GB
  • bare metal (supermicro)
  • 5 PVC (2 x 1GB, 3 x 10MB)

worth mentioning re PV that the faulty one is 1G whilst the other 1G one doesn't have this issue. The difference between these two is that the faulty one keeps mysql data while the other one rabbitmq data. The faulty one uses ~200MB while the rabbit one ~30MB.

I'm not sure if the 8-seconds timeout is caused by the disk or CPU starvation. If possible, can you scale down the volume, update the global setting engine-replica-timeout to a bigger value (15 seconds), scale it up and try to back it up again?

BTW, can you provide a support bundle as well? You can send it to longhorn-support-bundle@suse.com. We would like to check more details in it. Thank you.

commented

So I've increased the timeout from 8 to 20s, but the pvc is still failing, however I can notice that it takes longer for it to fail.
If you see the dstat screenshot below, at 09:00:09 mark the load avg starts to rapidly rise and it keep rising until 09:00:53, then starts dropping. Before this change it has been rising until ~00:35 seconds mark thus increasing the timeout prolonged this load avg rise.

Simultaneously I've ran iostat (log attached below) and noticed that the virtual disk (sdd), which is the faulty pvc, has %util param at 100 (ie. fully exhausted), then after the longhorn timeout passes, it drops to 0. Afterwards sda, the physical disk, has a spike, and finally it settles.

I've sent bundle to your email address. Thanks!

image

iostat log
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.76    0.00    0.50   24.69    0.00   74.06

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.50    0.00    1.75   24.06    0.00   72.68

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              4.00     16.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   1.60
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    1.25   24.81    0.00   73.43

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.26    0.00    0.51   24.75    0.00   73.48

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00    1.01   24.56    0.00   73.42

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.51    0.00    0.50   24.43    0.00   73.55

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.75    0.00    1.50   24.31    0.25   72.18

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              4.00     16.00     0.00   0.00    0.25     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   1.20
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.40
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.76    0.00    0.51   24.75    0.00   73.99

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.51    0.00    0.51   24.56    0.00   74.43

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.00    0.00    1.25   24.56    0.00   73.18

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.76   24.81    0.00   74.43

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.54    0.00    8.54   20.35    2.26   60.30

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              4.00     16.00     0.00   0.00    0.25     4.00  459.00   8452.00   200.00  30.35    0.34    18.41    0.00      0.00     0.00   0.00    0.00     0.00   18.00    2.94    0.21  12.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    1.00      4.00     0.00   0.00 27001.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   27.00  82.40
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          47.45    0.00   25.55    7.79   19.22    0.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              4.00     16.00     0.00   0.00    1.00     4.00  292.00   3756.00    91.00  23.76  144.35    12.86    0.00      0.00     0.00   0.00    0.00     0.00   11.00    7.09   42.23  77.60
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          48.05    0.00   27.80    2.44   18.29    3.41

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              1.00    128.00     0.00   0.00    3.00   128.00  560.00   8456.00   358.00  39.00    8.48    15.10    0.00      0.00     0.00   0.00    0.00     0.00   88.00    2.19    4.94  60.80
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          52.36    0.00   24.32    0.99   21.34    0.99

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              3.00     56.00     0.00   0.00    2.00    18.67 1149.00  14260.00   503.00  30.45    1.63    12.41    0.00      0.00     0.00   0.00    0.00     0.00  172.00    2.30    2.27  73.60
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          59.05    0.00   29.65    0.00    1.76    9.55

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              2.00     60.00     0.00   0.00    0.00    30.00  435.00   8596.00   215.00  33.08    0.92    19.76    0.00      0.00     0.00   0.00    0.00     0.00   69.00    1.62    0.51  21.60
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          55.30    0.00   25.25    0.00    4.29   15.15

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

So I've increased the timeout from 8 to 20s, but the pvc is still failing, however I can notice that it takes longer for it to fail. If you see the dstat screenshot below, at 09:00:09 mark the load avg starts to rapidly rise and it keep rising until 09:00:53, then starts dropping. Before this change it has been rising until ~00:35 seconds mark thus increasing the timeout prolonged this load avg rise.

Simultaneously I've ran iostat (log attached below) and noticed that the virtual disk (sdd), which is the faulty pvc, has %util param at 100 (ie. fully exhausted), then after the longhorn timeout passes, it drops to 0. Afterwards sda, the physical disk, has a spike, and finally it settles.

I've sent bundle to your email address. Thanks!

image

iostat log

Thanks @mike-code.
Can you adjust the global setting backup-compression-method to 1 for reducing the IO consumption of a backup?

commented

did that but no budge.
now I noticed it's not just longhorn/k3s that freezes but the IO on disk is not working. even doing touch foo will not work until this ~20 seconds pass

commented

we've found the culprit being unrelated to longhorn.
this issue may be closed.

we've found the culprit being unrelated to longhorn. this issue may be closed.

@mike-code Can you elaborate more on the culprit? Thank you.

commented

we've found the culprit being unrelated to longhorn. this issue may be closed.

@mike-code Can you elaborate more on the culprit? Thank you.

@derekbit sure. the k3s cluster is running in a VM. The VM management (proxmox) has been taking snapshots every 30 minutes, coincidentally in the same period as events on longhorn. These snapshots were (to some extent) partially locking the qemu drive thus leading to a severe hiccup

I've posted wrong information that the sever is bare metal (that's what I thought back then).

@mike-code Thanks for the update!