nomad: deep dive into permissions & networking

noahehall opened this issue


  • see perm issues: #48
  • we need to fully understand nomad permissions before moving on
  • i only see this getting more frustrating as we move into more complex stacks
  • while this was a good (and necessary) dive into the docs
    • the whole issue came down to su-exec: it must be run as root
      • setting user = "root" on the nomad task (docker driver) starts the container as root, which then drops privs to the docker img user
      • this is similar to haproxy, which needs to start as root but can run as anyone
      • i'm sure there are ways around this, but this seems to be the most straightforward
      • check ps logs to confirm that the consul agent is indeed running as the consul user
  • not quite sure wtf is going on, but it seems to be working now with user = consul
    • lol check ps logs below
    • i'm going to chalk this up to a caching/config issue/just a long day: the first time we set user it failed, now it's working
    • an interesting idea is to remove all images/cache/etc and start from scratch
      • it could be that when we switched to root, it was able to execute the scripts and cached the image,
      • then when we switched back to consul the scripts didn't need to run
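
For reference, a minimal sketch of the "start as root, drop privs later" setup described above. The job/group/task names and the image are placeholders (not taken from this repo); the only part that comes from the notes is `user = "root"` at the task level.

```hcl
# Sketch only: names and image below are hypothetical placeholders.
job "core-consul" {
  datacenters = ["dc1"]

  group "consul" {
    task "consul" {
      driver = "docker"

      # Start the container as root so the entrypoint is allowed to call su-exec.
      # The entrypoint is then expected to drop to the image's consul user.
      user = "root"

      config {
        image = "example/core-consul:latest" # placeholder image
      }
    }
  }
}
```

Running `ps` inside the container (as in the logs below) is the quickest way to confirm which user the consul agent actually ended up as.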


  • spike: ensure data/volumes are placed within the task working dir or one of the 3 NOMAD_*_DIR dirs (NOMAD_ALLOC_DIR, NOMAD_TASK_DIR, NOMAD_SECRETS_DIR)
    • figure out wtf the difference is between the working dir and those three dirs; it's not surfaced in the docs
    • lol the working dir is just the work dir like in docker, silly me
  • spike: lifecycle prestart task to setup user and chown runtime dirs
    • this is related to not being able to specify USER consul in the docker image
    • but that may be related to how we forced the gid/uid on the consul user to match the host
  • spike: review the csi_plugins for something appropriate for validation
    • most seem relevant for cloud stores, e.g. aws ebs
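
A rough sketch of what the prestart spike could look like, assuming a throwaway busybox task is acceptable. The task names, the images, the `data` subdirectory, and the `1000:1000` uid:gid are all placeholders; swap in whatever uid/gid the consul user was forced to on the host.

```hcl
# Sketch only: names, images, path and uid:gid are hypothetical placeholders.
job "core-consul" {
  datacenters = ["dc1"]

  group "consul" {
    # Runs to completion before the main task starts.
    task "prep-dirs" {
      driver = "docker"
      user   = "root"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        image   = "busybox:1.36"
        command = "sh"
        # Create a data dir in the shared alloc dir and hand it to the runtime user.
        args = [
          "-c",
          "mkdir -p ${NOMAD_ALLOC_DIR}/data && chown -R 1000:1000 ${NOMAD_ALLOC_DIR}/data",
        ]
      }
    }

    # Main task can then run unprivileged and write to ${NOMAD_ALLOC_DIR}/data.
    task "consul" {
      driver = "docker"
      user   = "consul"

      config {
        image = "example/core-consul:latest"
      }
    }
  }
}
```

The shared alloc dir is used here because it is visible to every task in the group; the per-task dirs are not.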


  • core-consul perm issues
# docker compose: everything good
/consul $ ps
    1 consul    0:00 /sbin/docker-init -- ./
    7 consul    0:00 {consul.compose.} /bin/sh ./consul.compose.
    9 consul    0:00 consul agent -config-dir=/consul/config -da
   33 consul    0:00 sh
   40 consul    0:00 ps

# nomad > task > user = "root", sans volumes
# runtime drops privs to user consul, but must be run as root cuz su-exec must be run as root
/consul # ps
    1 root      0:00 /sbin/docker-init -- a
    7 root      0:00 {docker-entrypoi} /usr/bin/dumb-init /bin/s
    8 consul    0:00 consul agent -data-dir=/consul/data -config
   33 root      0:00 sh
   39 root      0:00 ps

# with user = consul: dunno maybe i fixed something in the configs
$ consul
OCI runtime exec failed: exec failed: unable to start container process: exec: "bash": executable file not found in $PATH: unknown

/consul $ ps
    1 consul    0:00 {consul.compose.} /bin/sh ./consul.compos
    8 consul    0:00 consul agent -auto-reload-config -config-
   32 consul    0:00 sh
   38 consul    0:00 ps
/consul $ 

# with volumes
## ==> Failed to load cert/key pair: open /run/secrets/consul_server.pem: no such file or directory
# with secrets as volumes: w00p w00p
# ^ docker secrets need to be translated to nomad secrets 
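
A sketch of the "secrets as volumes" workaround, assuming the pem lives somewhere on the nomad client host. The source path and image are placeholders; the target path is the one from the error above.

```hcl
# Sketch only: image and host path are hypothetical placeholders.
task "consul" {
  driver = "docker"

  config {
    image = "example/core-consul:latest"

    # Bind mount the cert where the compose setup expected the docker secret.
    mount {
      type     = "bind"
      source   = "/path/on/host/consul_server.pem" # placeholder host path
      target   = "/run/secrets/consul_server.pem"
      readonly = true
    }
  }
}
```

Bind mounting host paths outside the alloc dir generally needs volumes enabled in the client's docker plugin config. Longer term, rendering the secret into `${NOMAD_SECRETS_DIR}` is probably the closer equivalent of a docker secret.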

#### networking
## issue 1:  cert is valid for localhost, not ...
# likely just need to set the extra_hosts in the container
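
If extra_hosts turns out to be the fix, a sketch of the docker driver setting. The hostname and IP are placeholders, and this only helps if the mapped name is actually one the cert is valid for.

```hcl
# Sketch only: image, hostname and IP are hypothetical placeholders.
task "consul" {
  driver = "docker"

  config {
    image = "example/core-consul:latest"

    # "host:IP" pairs added to the container's /etc/hosts.
    extra_hosts = ["core-consul:192.0.2.10"]
  }
}
```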

  • core-proxy perm issues
# on initial execution when all env vars are transposed from docker > nomad
# nomad > task > user = haproxy
/consul/ 11: cannot create /consul/config/env.token.hcl: Permission denied
/consul/ 31: cannot create /consul/pid.envoy: Permission denied
su: only root can specify alternative groups
su: only root can specify alternative groups
[NOTICE]   (14) : haproxy version is 2.7.1-3e4af0e
[NOTICE]   (14) : path to executable is /usr/local/sbin/haproxy
[WARNING]  (14) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:19] : 'server lb-vault/core-vault-c-dns1' : could not resolve address '', disabling server.
[WARNING]  (14) : config : [/var/lib/haproxy/configs/002-001-vault.cfg:20] : 'server lb-vault/core-vault-d-dns1' : could not resolve address 'core-vault', disabling server.
[ALERT]    (14) : Binding [/var/lib/haproxy/configs/000-000-global.cfg:37] for frontend GLOBAL: cannot bind UNIX socket (Permission denied) [/var/run/api.sock]
[ALERT]    (14) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.

# as with consul, switching user to "root" fixed it, which makes sense
# haproxy is different from consul anyway: haproxy recommends starting as root, but running as X
root@9ffca265061c:/usr/local/etc/haproxy# ps -aux
root           1  0.0  0.0   1136     4 ?        Ss   02:12   0:00 /sbin/docker-init -- ./
root           7  0.0  0.0   2616   524 ?        S    02:12   0:00 /bin/sh ./
root          11  0.0  0.0   2616    96 ?        S    02:12   0:00 /bin/sh /consul/consul.compose.bootstrap.s
root          12  0.0  0.0   4524  2672 ?        S    02:12   0:00 su -g consul - consul sh -c cd /consul/env
root          13  0.0  0.0   2616    96 ?        S    02:12   0:00 /bin/sh /consul/consul.compose.bootstrap.s
root          14  0.0  0.0   4524  2680 ?        S    02:12   0:00 su -g consul - consul sh -c consul agent -
root          15  0.0  0.0  90584  9876 ?        S    02:12   0:00 haproxy -W -db -f /var/lib/haproxy/configs
consul        17  0.0  0.0   2616   592 ?        Ss   02:12   0:00 -sh -c cd /consul/envoy && envoy -c envoy.
consul        18  0.0  0.0   2616   592 ?        Ss   02:12   0:00 -sh -c consul agent -node=core-proxy-9ffca
consul        23  0.4  0.2 811016 76212 ?        Sl   02:12   0:00 consul agent -node=core-proxy-9ffca265061c
consul        24  0.5  0.1 2420640 45916 ?       Sl   02:12   0:00 envoy -c envoy.yaml
haproxy       71  0.0  0.0 846364 13700 ?        Sl   02:12   0:00 haproxy -W -db -f /var/lib/haproxy/configs
root          93  0.0  0.0   4248  3404 pts/0    Ss   02:13   0:00 bash
root         103  0.0  0.0   5904  2792 pts/0    R+   02:14   0:00 ps -aux