nomad: deep dive into permissions & networking
noahehall opened this issue · comments
C
- see perm issues: #48
- we need to fully understand nomad permissions before moving on
- i only see this getting more frustrating as we move into more complex stacks
- while this was a good (and necessary) dive into the docs
- the whole issue was around `su-exec`: it must be run as root
- setting `user: root` on the nomad task (docker driver) starts the container as root, which then drops privs to the docker img user
- this is a similar setup to haproxy, which needs to start as root but can run as anyone
- i'm sure there are ways around this, but this seems to be the most straight forward
- check ps logs to confirm that the consul agent is indeed running as the consul user
- not quite sure wtf is going on, but it seems to be working now with user = consul
- lol check ps logs below
- i'm going to chalk this up as a caching/config issue/just a long day: the first time we set user it failed, now it's working
- an interesting idea is to remove all images/cache/etc and start from scratch
- it could be that when we switched to root, it was able to execute the scripts and cached the image,
- then when we switched back to consul the scripts didn't need to run
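
For reference, a minimal sketch of the "start as root, let the entrypoint drop privs" setup described above. The job, group, and task names, the datacenter, and the image are placeholders, not the real stack config; the only part taken from these notes is `user = "root"` on a docker task.

```hcl
# Sketch only: names, datacenter and image are hypothetical placeholders.
job "core-consul" {
  datacenters = ["dc1"] # assumption

  group "consul" {
    task "consul" {
      driver = "docker"

      # Start the container as root so the entrypoint can call su-exec.
      # The entrypoint is then expected to drop to the image's consul user.
      user = "root"

      config {
        image = "example/core-consul:latest" # hypothetical image
      }
    }
  }
}
```

Running `ps` inside the container should then show PID 1 as root and the consul agent as consul, as in the logs further down.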
T
- spike: ensure data/volumes are placed within the task working dir or one of the 3 `NOMAD_*_DIR` dirs
- figure out wtf is the difference between the working dir and the three dirs below, it's not surfaced in the docs
- lol the working dir is just the work dir like in docker, silly me
- NOMAD_ALLOC_DIR
- NOMAD_TASK_DIR
- NOMAD_SECRETS_DIR
- spike: lifecycle prestart task to setup user and chown runtime dirs
- this is related to not being able to specify `USER consul` in the docker image, but that may be related to how we forced the gid/uid on the consul user to match the host
- spike: review the csi_plugins for something appropriate for validation
- most seem relevant for cloud stores, e.g. aws ebs
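
A rough sketch of what the lifecycle prestart spike could look like, assuming a throwaway busybox task that chowns the shared alloc data dir before the main task starts. The task names, the image, and the `100:1000` uid:gid are hypothetical and would need to match whatever the consul user actually maps to.

```hcl
# Sketch only: a group fragment, to be placed inside a job block.
group "consul" {
  # Hypothetical prestart task: runs to completion before the main task starts.
  task "init-perms" {
    driver = "docker"
    user   = "root"

    lifecycle {
      hook    = "prestart"
      sidecar = false
    }

    config {
      image   = "busybox:1.36"
      command = "sh"
      # 100:1000 is a placeholder uid:gid for the consul user.
      args = ["-c", "chown -R 100:1000 ${NOMAD_ALLOC_DIR}/data"]
    }
  }

  # Main task can then run unprivileged against the already-chowned dir.
  task "consul" {
    driver = "docker"
    user   = "consul"

    config {
      image = "example/core-consul:latest" # hypothetical image
    }
  }
}
```

This only covers dirs under the shared alloc dir; anything the image itself owns (e.g. /consul/config) would still have to be handled in the image or the entrypoint.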
A
- see hashicorp/nomad#15540
- see zadam/trilium#2907
- see hashicorp/nomad#2800
- https://developer.hashicorp.com/nomad/docs/job-specification/lifecycle
- https://kubernetes-csi.github.io/docs/drivers.html
- core-consul perm issues
# docker compose: everything good
/consul $ ps
PID USER TIME COMMAND
1 consul 0:00 /sbin/docker-init -- ./consul.compose.boots
7 consul 0:00 {consul.compose.} /bin/sh ./consul.compose.
9 consul 0:00 consul agent -config-dir=/consul/config -da
33 consul 0:00 sh
40 consul 0:00 ps
# nomad > task > user = "root", sans volumes
# runtime drops privs to user consul, but must be run as root cuz su-exec must be run as root
/consul # ps
PID USER TIME COMMAND
1 root 0:00 /sbin/docker-init -- docker-entrypoint.sh a
7 root 0:00 {docker-entrypoi} /usr/bin/dumb-init /bin/s
8 consul 0:00 consul agent -data-dir=/consul/data -config
33 root 0:00 sh
39 root 0:00 ps
# with user = consul: dunno maybe i fixed something in the configs
$ script.exec.cunt.sh consul
OCI runtime exec failed: exec failed: unable to start container process: exec: "bash": executable file not found in $PATH: unknown
/consul $ ps
PID USER TIME COMMAND
1 consul 0:00 {consul.compose.} /bin/sh ./consul.compos
8 consul 0:00 consul agent -auto-reload-config -config-
32 consul 0:00 sh
38 consul 0:00 ps
/consul $
# with volumes
## ==> Failed to load cert/key pair: open /run/secrets/consul_server.pem: no such file or directory
# with secrets as volumes: w00p w00p
# ^ docker secrets need to be translated to nomad secrets
#### networking
## issue 1: cert is valid for localhost, not ...
# likely just need to set the extra_hosts in the container
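
A hedged sketch of the two follow-ups noted above for the docker driver: mounting the cert where the container expects it, and adding an `extra_hosts` entry. The host path, hostname, and IP are placeholders, and whether `extra_hosts` actually resolves the cert mismatch is untested.

```hcl
# Sketch only: a task fragment, to be placed inside a group block.
task "consul" {
  driver = "docker"

  config {
    image = "example/core-consul:latest" # hypothetical image

    # hostname:ip pair is a placeholder; use a name the cert is valid for.
    extra_hosts = ["core-consul.example:127.0.0.1"]

    # Bind the cert to the path from the error message above.
    mount {
      type     = "bind"
      source   = "/srv/secrets/consul_server.pem" # hypothetical host path
      target   = "/run/secrets/consul_server.pem"
      readonly = true
    }
  }
}
```

Binding a host path outside the alloc dir likely also requires volumes to be enabled in the client's docker plugin config.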
- core-proxy perm issues
# on initial execution when all env vars are transposed from docker > nomad
# nomad > task > user = haproxy
/consul/consul.compose.bootstrap.sh: 11: cannot create /consul/config/env.token.hcl: Permission denied
/consul/consul.compose.bootstrap.sh: 31: cannot create /consul/pid.envoy: Permission denied
su: only root can specify alternative groups
su: only root can specify alternative groups
[NOTICE] (14) : haproxy version is 2.7.1-3e4af0e
[NOTICE] (14) : path to executable is /usr/local/sbin/haproxy
[WARNING] (14) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:19] : 'server lb-vault/core-vault-c-dns1' : could not resolve address 'core-vault.service.search', disabling server.
[WARNING] (14) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:20] : 'server lb-vault/core-vault-d-dns1' : could not resolve address 'core-vault', disabling server.
[ALERT] (14) : Binding [/var/lib/haproxy/configs/000-000-global.cfg:37] for frontend GLOBAL: cannot bind UNIX socket (Permission denied) [/var/run/api.sock]
[ALERT] (14) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.
# as with consul, switching user to "root" fixed it, which makes sense
# haproxy is different than consul anyway, as haproxy recommends starting as root, but running as X
root@9ffca265061c:/usr/local/etc/haproxy# ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1136 4 ? Ss 02:12 0:00 /sbin/docker-init -- ./haproxy.compose.boo
root 7 0.0 0.0 2616 524 ? S 02:12 0:00 /bin/sh ./haproxy.compose.bootstrap.sh
root 11 0.0 0.0 2616 96 ? S 02:12 0:00 /bin/sh /consul/consul.compose.bootstrap.s
root 12 0.0 0.0 4524 2672 ? S 02:12 0:00 su -g consul - consul sh -c cd /consul/env
root 13 0.0 0.0 2616 96 ? S 02:12 0:00 /bin/sh /consul/consul.compose.bootstrap.s
root 14 0.0 0.0 4524 2680 ? S 02:12 0:00 su -g consul - consul sh -c consul agent -
root 15 0.0 0.0 90584 9876 ? S 02:12 0:00 haproxy -W -db -f /var/lib/haproxy/configs
consul 17 0.0 0.0 2616 592 ? Ss 02:12 0:00 -sh -c cd /consul/envoy && envoy -c envoy.
consul 18 0.0 0.0 2616 592 ? Ss 02:12 0:00 -sh -c consul agent -node=core-proxy-9ffca
consul 23 0.4 0.2 811016 76212 ? Sl 02:12 0:00 consul agent -node=core-proxy-9ffca265061c
consul 24 0.5 0.1 2420640 45916 ? Sl 02:12 0:00 envoy -c envoy.yaml
haproxy 71 0.0 0.0 846364 13700 ? Sl 02:12 0:00 haproxy -W -db -f /var/lib/haproxy/configs
root 93 0.0 0.0 4248 3404 pts/0 Ss 02:13 0:00 bash
root 103 0.0 0.0 5904 2792 pts/0 R+ 02:14 0:00 ps -aux