desktop/windows: Windows app cannot run under administrator privileges

Question

jeffomatic opened this issue 7 years ago

Some of our customers (including our enterprise audience) attempt to run our Windows app under an administrator account.

On Windows, having administrator privileges causes startup to fail. ChainMgr.exe uses postgres.exe to launch a non-daemonized postgres server, but the latter must be run without administrator privileges.

Investigation notes

The recommended workaround to the administrator privilege issue is to use pg_ctl.exe instead of postgres.exe.

Applying this change to ChainMgr.exe superficially appears to work, but it breaks the semantics of ChainMgr.exe in subtle ways. Unlike postgres.exe, pg_ctl.exe spins up a background Postgres server and terminates immediately. ChainMgr.exe has a trap to monitor the Postgres launch command, which attempts to terminate cored.exe when the Postgres launch command terminates. When the launch command is pg_ctl.exe (which runs to termination very quickly), cored.exe should, in theory, be killed almost immediately after it starts. However, Chain Core appears to continue to function normally, and is responsive to dashboard interaction, etc.

What's happening here is that there are actually two cored.exes being run on Windows. The Windows cored.exe runs initially in "monitor mode". The monitor then spawns a second cored.exe running in "child mode". The monitor is responsible for respawning the child in the event of a reset API call; the child is responsible for providing the actual API server. When ChainMgr.exe attempts to kill cored.exe in response to pg_ctl.exe's termination, it kills the first cored.exe (the one running as monitor) but not the second cored.exe (the one running as the child).

This conforms quite well to observed behavior after changing the Postgres launch command to pg_ctl.exe: if you reset the core via the dashboard, the server appears to crash with no recovery, and the dashboard hangs. This is because the monitor process was killed shortly after startup, and there is nothing to restart the child after a reset.

In summary, there are three problems:

First, postgres.exe is not the appropriate command to launch Postgres, since it does not work for administrators, and we cannot easily control how users choose their login privileges.

Second, if we want to terminate Chain Core when Postgres terminates, and we resolve the first problem by switching to pg_ctl.exe, we need a different mechanism for monitoring the Postgres process. It would be simple to poll Postgres's availability in a goroutine using the pg_isready.exe utility; in fact, ChainMgr.exe already employs a similar polling mechanism (only in reverse) to determine when the Postgres server is ready to be bootstrapped.

Third, we unwittingly broke ChainMgr.exe's termination semantics when we moved to the monitor/child model in our Windows executable. In any circumstance where the Postgres launcher terminates, ChainMgr.exe will kill the monitor but not the child. The solution to this problem is not immediately obvious to me; traditionally, the problem of killing child processes when a parent is terminated unconditionally is a bit gnarly.

Jeff Lee · Answer 1 · Fri Jun 23 2017 07:44:16 GMT+0800 (China Standard Time)

One possible solution to the third problem above is for ChainMgr.exe to use SIGINT, SIGQUIT, or SIGTERM to terminate cored.exe. The monitor should intercept the signal and terminate the child before exiting.

kr · Answer 2 · Fri Jun 23 2017 07:59:53 GMT+0800 (China Standard Time)

What does pg_ctl do that lets it run? Ultimately pg_ctl has to execute postgres somehow. Presumably it drops privileges with some system interface. Maybe we could do the same thing. It is good to monitor the postgres process directly, since it will give us better latency to react when it exits, and has no CPU overhead while things are running.

Maybe we should take this opportunity to merge all the ChainMgr functionality into the cored monitor process (which is a windows-specific thing). This would solve the third problem, and I think we should do it anyway. It will reduce the number of processes running (from 3 to 2) and mean fewer files we need to ship.

Jeff Lee · Answer 3 · Fri Jun 23 2017 08:21:52 GMT+0800 (China Standard Time)

> Presumably it drops privileges with some system interface. Maybe we could do the same thing.

Yeah that was my first thought too. Worth the research. I started thinking in other directions because I had this instinct that pg_ctl would have lots of other Business Logic that we'd either have to sift through or replicate (and thus maintain), but if it's just a flag or something, it'll end up as the simplest possible solution.

> Maybe we should take this opportunity to merge all the ChainMgr functionality into the cored monitor process

Yeah I might need to chew on that more to feel great about it, because it sounds like it would take cored from 1) assuming the presence of an active Postgres instance to 2) actually having to bootstrap the Postgres instance. It could be NBD but I get a spidey-sense tingle here. Maybe there's a developer-edition-only piece of code that worries about Postgres startup and bootstrapping? Would it possibly be worth expanding that idea to our other platforms?

kr · Answer 4 · Fri Jun 23 2017 08:25:09 GMT+0800 (China Standard Time)

> I had this instinct that pg_ctl would have lots of other Business Logic

That is a good instinct, but I have it on high authority (@fdr) that (at least in unix) it is easy and good to run the postgres binary directly, and that technique is, if anything, under-utilized.

kr · Answer 5 · Fri Jun 23 2017 08:28:52 GMT+0800 (China Standard Time)

As for having cored run postgres rather than assuming it is running, yeah that's true and a big conceptual change. But since we're talking about Windows, in practice that's basically always what we want.

If we need to preserve the ability to run cored on windows from the command line and explicitly connect it to an external postgres, we can gate this new behavior on having an empty DATABASE_URL. If the user sets that env var, then they get the traditional cored behavior, and if they don't set it, then it'll run a postgres automatically (to get the ChainMgr behavior).
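The gating described here could be as small as the following sketch. The function name shouldLaunchPostgres is hypothetical, and the printed messages only mark where the two code paths would diverge.

```go
package main

import (
	"fmt"
	"os"
)

// shouldLaunchPostgres reports whether cored should start its own Postgres.
// An empty DATABASE_URL selects the ChainMgr-style behavior; any other value
// means the user is pointing cored at an external server.
// (Hypothetical helper name.)
func shouldLaunchPostgres(databaseURL string) bool {
	return databaseURL == ""
}

func main() {
	dbURL := os.Getenv("DATABASE_URL")
	if shouldLaunchPostgres(dbURL) {
		fmt.Println("DATABASE_URL is empty: run a postgres automatically")
		return
	}
	fmt.Println("DATABASE_URL is set: connect to the external postgres")
}
```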

Jeff Lee · Answer 6 · Fri Jun 23 2017 08:43:41 GMT+0800 (China Standard Time)

So this is what pg_ctl does to spawn the server in Windows. At first glance it looks really gnarly, although I don't know how much of that is dedicated to service registration as opposed to simply changing privileges.

kr · Answer 7 · Fri Jun 23 2017 09:25:18 GMT+0800 (China Standard Time)

Ugh, yeah that does look gnarly. I could be mistaken, but I think the key thing in that function is its call to CreateRestrictedToken instead of simply CreateProcess (which is what Go uses to start a child process).

According to the internet, we can use AdjustTokenPrivileges to get the same effect on the currently-running process, which we might want anyway. If we run AdjustTokenPrivileges in the monitor process, then presumably both child processes (postgres and cored) would run without admin privs.

I'll test this out and see if it looks promising.

Daniel Farina · Answer 8 · Fri Jun 23 2017 10:24:25 GMT+0800 (China Standard Time)

My understanding is pg_ctl does stuff to handle Postgres as a Windows service. Running postgres -D ... in the foreground should work. It would seem bizarre to me that you could not exec Postgres on Windows. If people have started developing Postgres via pg_ctl rather than relying on foreground execution, that'd be news to me.

Daniel Farina · Answer 9 · Fri Jun 23 2017 10:27:46 GMT+0800 (China Standard Time)

(On macintosh, we encountered problems with dynamic linking + re-locatable directories on account of security things to prevent messing with the dynamic linker via inherited environment variable, but it's not clear to me that precisely the same dynamic would show up on windows)

kr · Answer 10 · Fri Jun 23 2017 17:09:56 GMT+0800 (China Standard Time)

The original report says:

> [Postgres] must be run without administrator privileges.

However, I was going to test out potential fixes, and I first ran ChainMgr in an Administrator account without any special code to modify privileges, and postgres ran successfully. Are there other conditions that cause it to fail?

Jeff Lee · Answer 11 · Sat Jun 24 2017 00:46:07 GMT+0800 (China Standard Time)

The plot thickens! I can replicate the admin privilege issue on a Windows Server 2016 EC2 instance, and we've also received reports of that error message from enterprise customers running Windows 7. There are also reports of similar things in the wild, which is what led me down the pg_ctl.exe path in the first place.

I guess we can start by comparing host OS versions. Also, what happens if you try manually running postgres.exe -D dataDir?

kr · Answer 12 · Sat Jun 24 2017 03:35:19 GMT+0800 (China Standard Time)

Unfortunately I was on Windows 10. Running postgres directly from the command line works fine as well. I'll try a couple of other versions of Windows and track it down.