siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.

Home Page: https://www.talos.dev


AttachVolume Failed for pod after pod recreated and stuck in init state

Rammurthy5 opened this issue

Bug Report

A stateful workload pod was deleted. It was recreated as expected, but it has been stuck in the init state for a long time. Describing the pod shows that the volume attach failed.

Description

A PostgreSQL workload is running on the worker nodes, and its pod was deleted to check whether it would be recreated. It was recreated, but attaching its PVC got stuck.

Logs

  Warning  FailedAttachVolume  25s (x2 over 26s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-556db03a-f0d2-46ed-b75c-b33d386c53d5" : rpc error: code = Internal desc = Operation failed: GenericOperation(500, "error in response: status code '500 Internal Server Error', content: 'RestJsonError { details: \"create_nexus::status: Internal, message: \\\"Failed to acquire write exclusive reservation on child nvmf://10.0.0.40:8420/nqn.2019-05.io.openebs:6d273a89-387a-4918-ada5-8bd21d6fea30?uuid=6d273a89-387a-4918-ada5-8bd21d6fea30 of nexus 556db03a-f0d2-46ed-b75c-b33d386c53d5: Failed to register key for child: NVMe IO Passthru command dh failed: NVMe IO Passthru command dh failed\\\", details: [], metadata: MetadataMap { headers: {\\\"content-type\\\": \\\"application/grpc\\\", \\\"date\\\": \\\"Tue, 18 Jun 2024 15:12:56 GMT\\\", \\\"content-length\\\": \\\"0\\\"} }\", message: \"SvcError::GrpcRequestError\", kind: Internal }'")
  Warning  FailedAttachVolume  16s (x3 over 23s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-556db03a-f0d2-46ed-b75c-b33d386c53d5" : rpc error: code = Internal desc = Operation failed: GenericOperation(500, "error in response: status code '500 Internal Server Error', content: 'RestJsonError { details: \"create_nexus::status: Internal, message: \\\"Failed to acquire write exclusive reservation on child nvmf://10.0.0.40:8420/nqn.2019-05.io.openebs:6d273a89-387a-4918-ada5-8bd21d6fea30?uuid=6d273a89-387a-4918-ada5-8bd21d6fea30 of nexus 556db03a-f0d2-46ed-b75c-b33d386c53d5: Failed to register key for child: NVMe IO Passthru command dh failed: NVMe IO Passthru command dh failed\\\", details: [], metadata: MetadataMap { headers: {\\\"content-type\\\": \\\"application/grpc\\\", \\\"date\\\": \\\"Tue, 18 Jun 2024 15:12:58 GMT\\\", \\\"content-length\\\": \\\"0\\\"} }\", message: \"SvcError::GrpcRequestError\", kind: Internal }'")
  Warning  FailedMount         11s (x6 over 26s)  kubelet                  MountVolume.SetUp failed for volume "shm" : mount failed: signal: segmentation fault
Mounting command: mount
Mounting arguments: -t tmpfs -o size=1073741824 tmpfs /var/lib/kubelet/pods/52bee99a-a48a-4b58-b996-e062b960a69e/volumes/kubernetes.io~empty-dir/shm
Output:
  Warning  FailedMount  11s (x6 over 26s)  kubelet  MountVolume.SetUp failed for volume "app-secret" : mount failed: signal: segmentation fault
Mounting command: mount
Mounting arguments: -t tmpfs -o size=1073741824 tmpfs /var/lib/kubelet/pods/52bee99a-a48a-4b58-b996-e062b960a69e/volumes/kubernetes.io~secret/app-secret
Output:
  Warning  FailedMount  11s (x6 over 26s)  kubelet  MountVolume.SetUp failed for volume "scratch-data" : assign quota FAILED createProjectID /var/lib/kubelet/pods/52bee99a-a48a-4b58-b996-e062b960a69e/volumes/kubernetes.io~empty-dir/scratch-data 0 failed unable to run xfs_quota: signal: segmentation fault
  Warning  FailedMount  11s (x6 over 26s)  kubelet  MountVolume.SetUp failed for volume "kube-api-access-bqn5m" : mount failed: signal: segmentation fault
Mounting command: mount
Mounting arguments: -t tmpfs -o size=1073741824 tmpfs /var/lib/kubelet/pods/52bee99a-a48a-4b58-b996-e062b960a69e/volumes/kubernetes.io~projected/kube-api-access-bqn5m
Output:
  Warning  FailedAttachVolume  8s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-556db03a-f0d2-46ed-b75c-b33d386c53d5" : rpc error: code = Internal desc = Operation failed: GenericOperation(500, "error in response: status code '500 Internal Server Error', content: 'RestJsonError { details: \"create_nexus::status: Internal, message: \\\"Failed to acquire write exclusive reservation on child nvmf://10.0.0.40:8420/nqn.2019-05.io.openebs:6d273a89-387a-4918-ada5-8bd21d6fea30?uuid=6d273a89-387a-4918-ada5-8bd21d6fea30 of nexus 556db03a-f0d2-46ed-b75c-b33d386c53d5: Failed to register key for child: NVMe IO Passthru command dh failed: NVMe IO Passthru command dh failed\\\", details: [], metadata: MetadataMap { headers: {\\\"content-type\\\": \\\"application/grpc\\\", \\\"date\\\": \\\"Tue, 18 Jun 2024 15:13:07 GMT\\\", \\\"content-length\\\": \\\"0\\\"} }\", message: \"SvcError::GrpcRequestError\", kind: Internal }'")

Environment

  • Talos version: 1.7.2
  • Kubernetes version: v1.28.3
  • Platform: AWS EC2

There are many errors here. Some of them might be related to Talos and some might not, but it would help to have a small reproducer test case (something minimal you could kubectl apply) so that we can start looking into this.
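When triaging output like the events above, it can help to separate the distinct failure classes before building a reproducer: here the attach failures all carry the NVMe-oF reservation error, while the mount failures are segmentation faults from host binaries (mount, xfs_quota). The sketch below is a hypothetical helper, not part of Talos, Kubernetes, or OpenEBS. Its regex is an assumption fitted only to the "Warning <Reason> ... volume "<name>" : <detail>" event lines quoted in this issue, and the class labels ("nvmf-reservation", "host-binary-segfault") are illustrative names, not upstream terminology.

```python
import re
from collections import defaultdict

# Assumed shape of a `kubectl describe pod` event line as quoted in this issue:
#   Warning  <Reason>  <age>  <source>  ... volume "<name>" : <detail>
# This is fitted to the sample lines above, not a general event parser.
EVENT_RE = re.compile(
    r'^\s*Warning\s+(?P<reason>\w+)\s+.*?volume "(?P<volume>[^"]+)"\s*:\s*(?P<detail>.*)$'
)


def classify(detail: str) -> str:
    """Map an event's error text to a coarse, hypothetical failure class."""
    if "NVMe IO Passthru" in detail or "write exclusive reservation" in detail:
        return "nvmf-reservation"
    if "segmentation fault" in detail:
        return "host-binary-segfault"
    return "other"


def group_events(lines):
    """Group matching event lines by (reason, failure class) -> sorted volume names.

    Lines that do not look like a Warning event (e.g. "Mounting command: mount")
    are skipped.
    """
    groups = defaultdict(set)
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            groups[(m["reason"], classify(m["detail"]))].add(m["volume"])
    return {key: sorted(volumes) for key, volumes in groups.items()}
```

Applied to the events in this report, the helper should yield one FailedAttachVolume group for the PVC and one FailedMount group for the emptyDir, secret, and projected volumes. This matches the observation that the report likely mixes two independent problems, which could be reproduced separately.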