Sporadic "Workstation service has not been started" failure on Windows, although the service had been running all along
doctorpangloss opened this issue
What happened:
Pod stuck in creation phase with:
NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : The Workstation service has not been started. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n pping], CimException\r\n + FullyQualifiedErrorId : Windows System Error 2138,New-SmbGlobalMapping\r\n \r\n", err: exit status 1
SSH-ing into the node showed the service had started. Running nssm restart LanmanWorkstation nonetheless resolved the issue.
What you expected to happen:
If this error appears, the driver should restart the service.
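The requested behavior could look roughly like the Go sketch below. It is only an illustration, not the driver's actual code: `smbMappingError`, `isWorkstationNotStarted`, `restartWorkstation`, and `mountWithRestart` are hypothetical names, and the sketch assumes the driver can run `Restart-Service` on the node with enough privileges. It matches on the "Windows System Error 2138" text from the log above, restarts LanmanWorkstation once, and retries the mapping a single time.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// smbMappingError is a hypothetical error type that carries the PowerShell
// output, mirroring the "NewSmbGlobalMapping failed. output: ..." log line.
type smbMappingError struct {
	Output string
	Err    error
}

func (e *smbMappingError) Error() string {
	return fmt.Sprintf("NewSmbGlobalMapping failed. output: %q, err: %v", e.Output, e.Err)
}

// isWorkstationNotStarted reports whether err looks like the failure in this
// issue (Windows System Error 2138, "The Workstation service has not been started").
func isWorkstationNotStarted(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "Windows System Error 2138") ||
		strings.Contains(msg, "The Workstation service has not been started")
}

// restartWorkstation restarts the LanmanWorkstation service through PowerShell.
// Assumption: this runs on the Windows node with sufficient privileges.
func restartWorkstation() error {
	out, err := exec.Command("powershell", "-NoProfile", "-Command",
		"Restart-Service -Name LanmanWorkstation -Force").CombinedOutput()
	if err != nil {
		return fmt.Errorf("restarting LanmanWorkstation failed: %w, output: %q", err, string(out))
	}
	return nil
}

// mountWithRestart calls mount once. If it fails with the Workstation error,
// it calls restart and retries mount exactly one more time. Any other error
// is returned unchanged, without restarting anything.
func mountWithRestart(mount func() error, restart func() error) error {
	err := mount()
	if !isWorkstationNotStarted(err) {
		return err
	}
	if rerr := restart(); rerr != nil {
		return fmt.Errorf("%v (service restart also failed: %w)", err, rerr)
	}
	return mount()
}
```

On a Windows node this would be called as `mountWithRestart(doMapping, restartWorkstation)`, where `doMapping` stands in for whatever function performs the New-SmbGlobalMapping call. Passing the restart step as a parameter keeps the retry logic testable on machines without the service.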
How to reproduce it:
Unclear, but possibly: run the Feb 2024 Windows 2022 release with the latest 1.14 csi-driver-smb. It's hard to say why this occurs.
FWIW the daemonset was failing to start on about 1 in 3 Windows nodes with 1.13 and the latest Windows patches.
Anything else we need to know?:
Environment:
- CSI Driver version:
$ kubectl get po -n kube-system -o yaml | grep registry.k8s | grep smb
image: registry.k8s.io/sig-storage/smbplugin:v1.14.0
image: registry.k8s.io/sig-storage/smbplugin:v1.14.0
imageID: registry.k8s.io/sig-storage/smbplugin@sha256:4e97e6f8c122c87253c89fce466e760f88122aa4a7b21677fad4c603144cc0dd
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2+k0s", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-03-02T19:21:48Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release): Windows 2022
- Kernel (e.g. uname -a): Windows_NT AppMana-Hostname-XXX 10.0 20348 x86_64 MS/Windows
I reverted from Feb 2024 Windows 2022 because it also breaks Calico. Perhaps these are related.
Cropped up again:
Warning FailedMount 2m (x10 over 8m21s) kubelet MountVolume.MountDevice failed for volume "pvc-365af285-dd45-4376-86a8-64fa28c78f49" : rpc error: code = Internal desc = volume(appmana-017-ds.i.appmana.com/appmana-cluster-03#pvc-365af285-dd45-4376-86a8-64fa28c78f49#) mount "//appmana-017-ds.i.appmana.com/appmana-cluster-03/pvc-365af285-dd45-4376-86a8-64fa28c78f49" on "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\smb.csi.k8s.io\\1d50cf8e540807f9541f7d297b25d00fdb89a19cc7d7ede4e23fc87f89034f5c\\globalmount" failed with NewSmbGlobalMapping(\\appmana-017-ds.i.appmana.com\appmana-cluster-03\pvc-365af285-dd45-4376-86a8-64fa28c78f49, c:\var\lib\kubelet\plugins\kubernetes.io\csi\smb.csi.k8s.io\1d50cf8e540807f9541f7d297b25d00fdb89a19cc7d7ede4e23fc87f89034f5c\globalmount) failed with error: rpc error: code = Unknown desc = NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : The Workstation service has not been started. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n pping], CimException\r\n + FullyQualifiedErrorId : Windows System Error 2138,New-SmbGlobalMapping\r\n \r\n", err: exit status 1
Warning FailedMount 110s kubelet Unable to attach or mount volumes: unmounted volumes=[comfyui-volume], unattached volumes=[kube-api-access-md2wm comfyui-volume workdir-volume]: timed out waiting for the condition