Onyx-Protocol / Onyx

Home Page:https://Onyx.org

desktop/windows: Windows app cannot run under administrator privileges

jeffomatic opened this issue · comments

Some of our customers (including our enterprise audience) attempt to run our Windows app under an administrator account.

On Windows, having administrator privileges causes startup to fail. ChainMgr.exe uses postgres.exe to launch a non-daemonized postgres server, but the latter must be run without administrator privileges.

Investigation notes

The recommended workaround to the administrator privilege issue is to use pg_ctl.exe instead of postgres.exe.

Applying this change to ChainMgr.exe superficially appears to work, but it breaks the semantics of ChainMgr.exe in subtle ways. Unlike postgres.exe, pg_ctl.exe spins up a background Postgres server and terminates immediately. ChainMgr.exe has a trap to monitor the Postgres launch command, which attempts to terminate cored.exe when the Postgres launch command terminates. When the launch command is pg_ctl.exe (which runs to termination very quickly), cored.exe should, in theory, be killed almost immediately after it starts. However, Chain Core appears to continue to function normally, and is responsive to dashboard interaction, etc.

What's happening here is that there are actually two cored.exes being run on Windows. The Windows cored.exe runs initially in "monitor mode". The monitor then spawns a second cored.exe running in "child mode". The monitor is responsible for respawning the child in the event of a reset API call; the child is responsible for providing the actual API server. When ChainMgr.exe attempts to kill cored.exe in response to pg_ctl.exe's termination, it kills the first cored.exe (the one running as monitor) but not the second cored.exe (the one running as the child).

This conforms quite well to observed behavior after changing the Postgres launch command to pg_ctl.exe: if you reset the core via the dashboard, the server appears to crash with no recovery, and the dashboard hangs. This is because the monitor process was killed shortly after startup, and there is nothing to restart the child after a reset.

In summary, there are three problems:

First, postgres.exe is not the appropriate command to launch Postgres, since it does not work for administrators, and we cannot easily control how users choose their login privileges.

Second, if we want to terminate Chain Core when Postgres terminates, and we resolve the first problem by switching to pg_ctl.exe, we need a different mechanism for monitoring the Postgres process. It would be simple to poll Postgres's availability in a goroutine using the pg_isready.exe utility; in fact, ChainMgr.exe already employs a similar polling mechanism (only in reverse) to determine when the Postgres server is ready to be bootstrapped.

Third, we unwittingly broke ChainMgr.exe's termination semantics when we moved to the monitor/child model in our Windows executable. In any circumstance where the Postgres launcher terminates, ChainMgr.exe will kill the monitor but not the child. The solution to this problem is not immediately obvious to me; traditionally, the problem of killing child processes when a parent is terminated unconditionally is a bit gnarly.

One possible solution to the third problem above is for ChainMgr.exe to use SIGINT, SIGQUIT, or SIGTERM to terminate cored.exe. The monitor should intercept the signal and terminate the child before exiting.

commented

What does pg_ctl do that lets it run? Ultimately pg_ctl has to execute postgres somehow. Presumably it drops privileges with some system interface. Maybe we could do the same thing. It is good to monitor the postgres process directly, since it will give us better latency to react when it exits, and has no CPU overhead while things are running.

Maybe we should take this opportunity to merge all the ChainMgr functionality into the cored monitor process (which is a Windows-specific thing). This would solve the third problem, and I think we should do it anyway. It will reduce the number of processes running (from 3 to 2) and mean fewer files we need to ship.

Presumably it drops privileges with some system interface. Maybe we could do the same thing.

Yeah that was my first thought too. Worth the research. I started thinking in other directions because I had this instinct that pg_ctl would have lots of other Business Logic that we'd either have to sift through or replicate (and thus maintain), but if it's just a flag or something, it'll end up as the simplest possible solution.

Maybe we should take this opportunity to merge all the ChainMgr functionality into the cored monitor process

Yeah I might need to chew on that more to feel great about it, because it sounds like it would take cored from 1) assuming the presence of an active Postgres instance to 2) actually having to bootstrap the Postgres instance. It could be NBD but I get a spidey-sense tingle here. Maybe there's a developer-edition-only piece of code that worries about Postgres startup and bootstrapping? Would it possibly be worth expanding that idea to our other platforms?

commented

I had this instinct that pg_ctl would have lots of other Business Logic

That is a good instinct, but I have it on high authority (@fdr) that (at least in unix) it is easy and good to run the postgres binary directly, and that technique is, if anything, under-utilized.

commented

As for having cored run postgres rather than assuming it is running, yeah that's true and a big conceptual change. But since we're talking about Windows, in practice that's basically always what we want.

If we need to preserve the ability to run cored on Windows from the command line and explicitly connect it to an external postgres, we can gate this new behavior on having an empty DATABASE_URL. If the user sets that env var, then they get the traditional cored behavior, and if they don't set it, then it'll run a postgres automatically (to get the ChainMgr behavior).
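Roughly, the gate could look like the sketch below. DATABASE_URL is the variable named above; everything else (dbMode, chooseDBMode, modeFromEnv) is a hypothetical name for illustration, and treating a set-but-empty variable the same as an unset one is an assumption.

```go
package main

import "os"

// dbMode says where cored's Postgres comes from. (Hypothetical type.)
type dbMode int

const (
	externalPostgres dbMode = iota // user supplied DATABASE_URL: traditional cored behavior
	managedPostgres                // no DATABASE_URL: launch Postgres ourselves, as ChainMgr does
)

// chooseDBMode picks the mode from an environment lookup function.
// A missing or empty DATABASE_URL selects managedPostgres.
// The lookup is injected so the decision can be exercised without
// touching the real process environment.
func chooseDBMode(lookup func(string) (string, bool)) (dbMode, string) {
	url, ok := lookup("DATABASE_URL")
	if ok && url != "" {
		return externalPostgres, url
	}
	return managedPostgres, ""
}

// modeFromEnv applies chooseDBMode to the real process environment.
func modeFromEnv() (dbMode, string) {
	return chooseDBMode(os.LookupEnv)
}
```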

So this is what pg_ctl does to spawn the server in Windows. At first glance it looks really gnarly, although I don't know how much of that is dedicated to service registration as opposed to simply changing privileges.

commented

Ugh, yeah that does look gnarly. I could be mistaken, but I think the key thing in that function is its call to CreateRestrictedToken instead of simply CreateProcess (which is what Go uses to start a child process).

According to the internet, we can use AdjustTokenPrivileges to get the same effect on the currently-running process, which we might want anyway. If we run AdjustTokenPrivileges in the monitor process, then presumably both child processes (postgres and cored) would run without admin privs.

I'll test this out and see if it looks promising.

My understanding is pg_ctl does stuff to handle Postgres as a Windows service. Running postgres -D ... in the foreground should work. It would seem bizarre to me that you could not exec Postgres on Windows. If people have started developing Postgres via pg_ctl rather than relying on foreground execution, that'd be news to me.

(On Macintosh, we encountered problems with dynamic linking + relocatable directories on account of security measures that prevent messing with the dynamic linker via inherited environment variables, but it's not clear to me that precisely the same dynamic would show up on Windows.)

commented

The original report says:

[Postgres] must be run without administrator privileges.

However, I was going to test out potential fixes, and I first ran ChainMgr in an Administrator account without any special code to modify privileges, and postgres ran successfully. Are there other conditions that cause it to fail?

The plot thickens! I can replicate the admin privilege issue on a Windows Server 2016 EC2 instance, and we've also received reports of that error message from enterprise customers running Windows 7. There are also reports of similar things in the wild, which is what led me down the pg_ctl.exe path in the first place.

I guess we can start by comparing host OS versions. Also, what happens if you try manually running postgres.exe -D dataDir?

commented

Unfortunately I was on Windows 10. Running postgres directly from the command line works fine as well. I'll try a couple of other versions of Windows and track it down.