kubernetes / kubeadm

Aggregator for issues filed against kubeadm

kubeadm.exe fails on Windows Server 2022

larsskj opened this issue

BUG REPORT

Versions

kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.4", GitCommit:"55019c83b0fd51ef4ced8c29eec2c4847f896e74", GitTreeState:"clean", BuildDate:"2024-04-16T15:05:51Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"windows/amd64"}

Environment:

Existing kubeadm cluster based on four Linux nodes running Ubuntu 22.04 as VMware VMs. The cluster works fine and everything was recently updated to Kubernetes version 1.29.4.

All nodes have 16 cores and 64 GB memory. They run containerd and networking is Calico. We use an external Ubuntu host running HA Proxy as a load balancer for the API server. The four node cluster works fine.

What happened?

We need support for Windows containers as well, so I'm trying to add a Windows Server 2022 based host to the cluster. And this is where the fun starts. The new Windows host has 16 cores/64 GB as well, and it's running in the same VMware cluster.

I hope (and believe) I've done the preparation of the Win host properly - primarily based on documentation from these sources:

https://k8s-docs.netlify.app/en/docs/setup/production-environment/windows/user-guide-windows-nodes/
https://docs.tigera.io/calico/latest/getting-started/kubernetes/windows-calico/operator

But when I try to run

PS C:\Windows\system32> kubeadm join kube-cph.bogus.com:6443 --token ue62yc.ks285c6i4vr4cy1n --discovery-token-ca-cert-hash sha256:8deeb01d3a1a7fbad9d3708405c503d458f32beba5b36294a284cce8d46fbf1f
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR IsPrivilegedUser]: user is not running as administrator
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

I am running as a user with Admin permissions and in a PowerShell session with elevated permissions. Every utility we can think of confirms that this is a session with full administrative rights.

If I ignore the error above, preflight will continue and emit a whole cascade of new errors, all Linux related and all showing clear signs that kubeadm believes it's running on Linux.

But nevertheless

kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.4", GitCommit:"55019c83b0fd51ef4ced8c29eec2c4847f896e74", GitTreeState:"clean", BuildDate:"2024-04-16T15:05:51Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"windows/amd64"}

states that platform is "windows/amd64".

What you expected to happen?

I would expect the node to join the cluster.

How to reproduce it (as minimally and precisely as possible)?

Create a new Linux based kubeadm cluster. Verify that it runs fine. Set up a new Windows Server 2022 host with containerd. Try to add it to the cluster using kubeadm join.

Anything else we need to know?

I have asked for help in the Slack channel "Kubernetes/kubeadm", but except for friendly interest I've had no response.

https://k8s-docs.netlify.app/en/docs/setup/production-environment/windows/user-guide-windows-nodes/

is a guide maintained by sig-windows and not the kubeadm maintainers; it may be out of date.
if so, log a ticket in https://github.com/kubernetes-sigs/sig-windows-tools

the same group used to maintain e2e tests for kubeadm.

https://docs.tigera.io/calico/latest/getting-started/kubernetes/windows-calico/operator

if there are issues with calico also log a ticket in the above repo.
last time i tested kubeadm on Windows myself i was using Flannel, and Calico still did not support Windows. that was some time ago.

If I ignore the error above, preflight will continue and emit a whole cascade of new errors, all Linux related and all showing clear signs that kubeadm believes it's running on Linux.

please share the full log with --v 5

PS C:\Windows\system32> kubeadm join kube-cph.bogus.com:6443 --token ue62yc.ks285c6i4vr4cy1n --discovery-token-ca-cert-hash sha256:8deeb01d3a1a7fbad9d3708405c503d458f32beba5b36294a284cce8d46fbf1f
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR IsPrivilegedUser]: user is not running as administrator

this is actually kubeadm running on Windows. On Linux the error message says "user is not running as root".
the logic for priv user checks on Windows is here:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/preflight/checks_windows.go

it depends on whether your current user is part of the "S-1-5-32-544" SID (the built-in Administrators group).
if it's not, i'd say this is not a compliant "administrator" user.
https://support.microsoft.com/en-us/help/243330/well-known-security-identifiers-in-windows-operating-systems
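A minimal sketch of what a group-membership check of this kind might look like, using only the Go standard library. It is an illustration and not a copy of kubeadm's code; the names administratorsSID and hasSID are hypothetical.

```go
package main

import (
	"fmt"
	"os/user"
)

// administratorsSID is the well-known SID of the built-in Administrators group.
// The constant name is made up for this sketch.
const administratorsSID = "S-1-5-32-544"

// hasSID reports whether want appears in the list of group IDs.
func hasSID(groupIDs []string, want string) bool {
	for _, id := range groupIDs {
		if id == want {
			return true
		}
	}
	return false
}

func main() {
	u, err := user.Current()
	if err != nil {
		fmt.Println("cannot get current user:", err)
		return
	}
	// On Windows the returned IDs are SIDs; on other systems they are numeric GIDs,
	// so the check below can only match on Windows.
	ids, err := u.GroupIds()
	if err != nil {
		fmt.Println("cannot get group IDs:", err)
		return
	}
	fmt.Println("member of built-in Administrators:", hasSID(ids, administratorsSID))
}
```

A check like this can only be as good as the group list it gets back, so it says nothing about whether the current process is actually elevated.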

i suspect then kubeadm is failing with more errors because it needs admin access.

I have asked for help in the Slack channel "Kubernetes/kubeadm", but except for friendly interest I've had no response.

we don't provide support on github, but let's see if there is a potential bug here. i don't think kubeadm for windows is fully tested in open source right now.

EDIT: this project uses kubeadm.exe, but i don't know all the details:
https://github.com/kubernetes-sigs/cluster-api-provider-azure

I understand you're not doing support here - but to me this looks like a bug.

I'll look into your suggestions above and return with more information.

A progress statement: I had an idea and added myself as local admin on the Windows Server. That solved the IsPrivilegedUser problem mentioned above.

I had my administrator privileges from a group membership, but apparently this isn't sufficient for kubeadm.exe.

I would see this as a bug in kubeadm.exe - there's a bunch of well-documented ways to check for administrator privileges without requiring people to be local admins.

I would see this as a bug in kubeadm.exe - there's a bunch of well-documented ways to check for administrator privileges without requiring people to be local admins.

care to share them? fwiw, no one else has complained about this.

@jsturtevant @knabben @marosset

this code was added by a sig win member a while back:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/preflight/checks_windows.go

@larsskj please also share the logs when this user is not part of the group.

my guess is that without local admin a user cannot start the kubelet service:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/initsystem/initsystem_windows.go

i don't think kubeadm for windows is fully tested in open source right now.

It is pretty well tested at this point. We run our sig-windows tests using kubeadm via cluster-api-for-azure. An example test: https://testgrid.k8s.io/sig-windows-master-release#capz-windows-2022-master

I would see this as a bug in kubeadm.exe - there's a bunch of well-documented ways to check for administrator privileges without requiring people to be local admins.

I don't really know the history of this check or why/if we need local admin. @marosset any ideas?

It is pretty well tested at this point. We run our sig-windows tests using kubeadm via cluster-api-for-azure. An example test: https://testgrid.k8s.io/sig-windows-master-release#capz-windows-2022-master

yes, i edited my comment as i forgot that kubeadm is used in CAPZ.

I don't really know the history of this check or why/if we need local admin. @marosset any ideas?

my guess is the following:

kubeadm has some service manager code (on Linux this "init system" is systemd) to start the kubelet.
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/initsystem/initsystem_windows.go

it uses golang.org/x/sys/windows/svc/mgr which wraps the OpenSCManager/OpenService WINAPI
https://learn.microsoft.com/en-us/windows/win32/services/starting-a-service, and i think it fails if the user is not in the S-1-5-32-544 group.

i don't think we should move away from managing the kubelet as a service or stop using golang.org/x/sys/windows/svc/mgr, so this seems to be a requirement and wont-fix.

@larsskj please also share the logs when this user is not part of the group.

my guess is that without local admin a user cannot start the kubelet service: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/initsystem/initsystem_windows.go

This is not true: I have installed, started, stopped, restarted, and reconfigured the service a lot of times using my group assigned administrative privileges.

But of course your Go code might use a specific API call that requires membership as local administrator. But in general, this is not required.

I'm not sure which logs you're asking for? The immediate logs when running as a non-local admin are the logs shown in the opening of this ticket, so you should have them already; but if you want other logs, please specify which.

i don't think we should move away from managing the kubelet as a service or stop using golang.org/x/sys/windows/svc/mgr, so this seems to be a requirement and wont-fix.

Absolutely not: That's not what I'm suggesting. But you might consider using an API call that doesn't require a local admin - if this is the case?

Or otherwise at least modify the error message to something more meaningful: Being told you're not an admin (I am) or that you do not have administrative privileges (I do) is not the best communication.

An error message stating that "You must be local administrator to run kubeadm" might be more helpful?

That is, of course, only if you cannot use an API call that doesn't require local administrator membership.

And for one of the other questions: A PowerShell command like

(New-Object Security.Principal.WindowsPrincipal([Security.Principal.WindowsIdentity]::GetCurrent())).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)

will return True if you have administrative privileges, False if not. This can surely be called through the Windows API as well.

And for one of the other questions: A PowerShell command like

(New-Object Security.Principal.WindowsPrincipal([Security.Principal.WindowsIdentity]::GetCurrent())).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)

will return True if you have administrative privileges, False if not. This can surely be called through the Windows API as well.

in prior discussions i recall that using powershell from kubeadm is not preferred. yet, i'm not aware of equivalent APIs exposed to check if the user is admin in the standard library or x/windows.

Or otherwise at least modify the error message to something more meaningful: Being told you're not an admin (I am), you do not have administrative privileges (I do) is not the best communication.

yes, it seems like a PR to improve the error messaging is required.
i will be sending that once i understand more what it has to contain.

I'm not sure which logs you're asking for? The immediate logs when running as a non-local admin are the logs shown in the opening of this ticket, so you should have them already; but if you want other logs, please specify which.

you said:

If I ignore the error above, preflight will continue and emit a whole cascade of new errors, all Linux related and all showing clear signs that kubeadm believes it's running on Linux.

so exactly the first workflow you reported in the ticket description:

  • set up your user to be privileged but not part of local admins "S-1-5-32-544" SID
  • start kubeadm join with the flags --v=5 and --ignore-preflight-errors=IsPrivilegedUser
  • show the logs, i need to understand if the errors are due to privileges in the init system / service code.

will return True if you have administrative privileges, False if not. This can surely be called through the Windows API as well.

in prior discussions i recall that using powershell from kubeadm is not preferred. yet, i'm not aware of equivalent APIs exposed to check if the user is admin in the standard library or x/windows.

i tested locally on Windows 11 and the code that uses the standard library to check if user is admin no longer works for me!
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/preflight/checks_windows.go

my user is part of "S-1-5-32-544" SID, but the failure is when fetching the groups.
this is actually code i originally contributed to go, but it seems to me they did some string fixes and it might have regressed.
golang/go@2a16176
or maybe it never worked under certain conditions.

@jsturtevant @marosset

what are your thoughts on using the method of shelling to openfiles.exe to determine privileged user?
for a privileged user it would return exit code 0 and it seems to be available on Windows server * but how about nanoserver?

or use net session with exit code 0 for admin.

https://stackoverflow.com/questions/4051883/batch-script-how-to-check-for-admin-rights#11995662

or is there another hack we can use from what windows/x exposes?
https://golang.org/x/sys/windows

or should we use powershell commands like @larsskj suggested:

(New-Object Security.Principal.WindowsPrincipal([Security.Principal.WindowsIdentity]::GetCurrent())).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
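The exit-code idea above (openfiles.exe or net session returning 0 for an admin) could be sketched in Go roughly as follows. This is only an illustration under the assumption that exit code 0 means "privileged", as the linked answer suggests; the helper name exitedZero is hypothetical and nothing here is taken from kubeadm.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// exitedZero runs the command and reports whether it exited with code 0.
// An unsuccessful exit is reported as (false, nil); failure to start the
// command at all (for example, binary not found) is returned as an error.
func exitedZero(name string, args ...string) (bool, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return true, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The command ran but did not succeed.
		return false, nil
	}
	return false, err
}

func main() {
	// "net session" is the command suggested in the thread; it is a Windows tool,
	// so on other systems this is expected to fail or report an error.
	ok, err := exitedZero("net", "session")
	fmt.Println(ok, err)
}
```

Keeping "could not run the tool" separate from "ran and exited non-zero" matters here, since the thread already raises the question of whether such tools exist on every Windows flavor.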

so exactly the first workflow you reported in the ticket description:

* set up your user to be privileged but not part of local admins "S-1-5-32-544" SID
* start kubeadm join with the flags --v=5 and --ignore-preflight-errors=IsPrivilegedUser
* show the logs, i need to understand if the errors are due to privileges in the init system / service code.

Unfortunately my environment has changed quite a bit since I wrote that - and I cannot reproduce it with the same results.

It still doesn't work, but the errors are quite different now.

Currently, I'm fighting a situation where the kubelet service keeps restarting: That should not be part of this ticket, I assume.

BTW, I see a security concern here as well: Maintaining nodes in a Kubernetes cluster is rarely a one-man operation. If using kubeadm on Windows means that everybody doing maintenance on the host(s) in the cluster must be local admins, this could cause security problems when a person leaves the maintenance team.

Usually, you would remove that person from the group granting access - but for local admins, you need to log in to all affected nodes and remove that person from the local admin as well. That's potentially a lot of work - and error prone.

or is there another hack we can use from what windows/x exposes?
https://golang.org/x/sys/windows

this works and it's the canonical UAC way:

package main

import (
	"fmt"

	"golang.org/x/sys/windows"
)

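// isPrivProcess reports whether the current process runs with an elevated (UAC) token.
func isPrivProcess() (bool, error) {
	hProcess := windows.CurrentProcess()
	var hProcessToken windows.Token
	// Open the process token with query access only.
	err := windows.OpenProcessToken(hProcess, windows.TOKEN_QUERY, &hProcessToken)
	if err != nil {
		return false, err
	}
	// Release the token handle once the elevation state has been read.
	defer hProcessToken.Close()
	return hProcessToken.IsElevated(), nil
}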

func main() {
	val, err := isPrivProcess()
	fmt.Println(val, err)
}

IsElevated() calls GetTokenInformation with TOKEN_ELEVATION which is valid;
https://cs.opensource.google/go/x/sys/+/refs/tags/v0.19.0:windows/security_windows.go;l=759

i will change the kubeadm preflight check to this in 1.31.

Unfortunately my environment has changed quite a bit since I wrote that - and I cannot reproduce it with the same results.

+

BTW, I see a security concern here as well: Maintaining nodes in a Kubernetes cluster is rarely a one-man operation. If using kubeadm on Windows means that everybody doing maintenance on the host(s) in the cluster must be local admins, this could cause security problems when a person leaves the maintenance team.

Usually, you would remove that person from the group granting access - but for local admins, you need to log in to all affected nodes and remove that person from the local admin as well. That's potentially a lot of work - and error prone.

now i suspect the errors you got were not privilege errors.
the fix for the problem discussed here is --ignore-preflight-errors=IsPrivilegedUser for kubeadm versions < 1.31 under certain conditions.

It still doesn't work, but the errors are quite different now.
Currently, I'm fighting a situation where the kubelet service keeps restarting: That should not be part of this ticket, I assume.

could be anything, but this repo is for kubeadm.exe issues specifically as per your use case.

note, windows kubelet tickets should be logged in the kubernetes/kubernetes repo.
anything else around wrappers and windows tooling should go in https://github.com/kubernetes-sigs/sig-windows-tools

Eh - yes - I very much agree on that.

My comment was about trying to emphasize the importance of this problem: Even though it can be solved by adding people as local admins, this is an insecure and error prone workaround.

But that doesn't matter now: You seem to have found an excellent solution.

My comment was about trying to emphasize the importance of this problem: Even though it can be solved by adding people as local admins, this is an insecure and error prone workaround.

or by skipping the erroneous preflight check

here is the fix PR
kubernetes/kubernetes#124665
thanks for the report @larsskj !