nirv-ai / IaC

nomad: core refactor and setup

noahehall opened this issue

C

  • complete consul ticket before continuing
  • the core stack now includes vault, haproxy and consul+envoy;
    • bff, ui and postgres are now in a separate stack
    • this separation of platform (core) and web (ui, bff, db) concerns helps drive faster iteration
  • we need to refactor and integrate nomad for orchestration in validation
  • core goals
    • take the output from dev as input to validation
    • validation: execute services on prod-like infra
    • push artifacts to nexus for downstream envs

T

  • nomad review: it's been a while
    • nomad notes
    • nomad docs
  • refactor existing nomad logic with intelligence gained from consul ticket
    • directory hierarchy: i think it should be IaC now instead of a nomad dir
    • nomad.sh: incorporate new utils dir
    • docker env file: incorporate new .env.auto logic
  • save docker images as tar files so you can use artifact + load instead of running a registry
    • push this to the nexus ticket as that will determine which route we take
  • take another swing at nomad pack; it should reduce the amount of in-house stuff we have to create
    • stay away from levant, no matter how sweet it is
  • integrate nomad with core
  • review nomad resource utilization and update defaults (we were way off in estimates)
  • update aws AMIs to include nomad binary, cni plugins, and post install files
  • think through the interoperability between envs and devise a more efficient management process
    • the initial nomad integration is tiresome; it shouldn't be this way
    • although it's mostly a whole lot of copypasta, this highlights a need for automation/better architecture
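
The tar-file idea from the task list (save images, then use artifact + load instead of running a registry) could start with a helper along these lines. This is only a sketch: `image_tar_name`, `save_image_tar`, `IMAGE_TAR_DIR` and the example image name are hypothetical and not taken from the repo; only `docker save --output` is an existing docker command.

```shell
# Sketch only: function names, IMAGE_TAR_DIR and the example image are hypothetical.

# Turn an image reference into a filesystem-safe tar name,
# e.g. "nirvai/core-vault:1.0" becomes "nirvai_core-vault_1.0.tar".
image_tar_name() {
  # "/" and ":" are awkward in file names, so map both to "_"
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Save one image as a tar file inside IMAGE_TAR_DIR (defaults to ./images).
save_image_tar() {
  tar_dir="${IMAGE_TAR_DIR:-./images}"
  mkdir -p "$tar_dir"
  docker save --output "$tar_dir/$(image_tar_name "$1")" "$1"
}
```

On the nomad side the tar would then be fetched with an `artifact` block and handed to the docker driver's `load` option; where the file is hosted depends on the outcome of the nexus ticket.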

A



issue 1: perm
chown: /consul/data: Operation not permitted

we switched the container workdir from /consul to /opt/consul to align with the consul web docs
however, the consul docker hub docs use /consul, not /opt/consul
solution: follow the docker hub docs rather than dealing with nomad perm issues at this juncture
a longer-term solution is to fix the nomad volume perm issues, which doesn't seem as straightforward

issue 2: perm
su-exec: setgroups(994): Operation not permitted

relates to issue 1
finding the root cause of nomad perm issues will likely solve this
and truly resolve issue 1
quick fix: remove `USER consul` from image
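
To help narrow down the root cause behind both errors, a small diagnostic like the one below could be run inside the container. It is a sketch under assumptions: `perm_report` is a hypothetical helper, not part of the repo, and it only reports facts; it does not fix anything.

```shell
# Hypothetical helper: print the uid we run as and the numeric owner of a directory.
# chown to another user and setgroups both need root (or CAP_CHOWN / CAP_SETGID),
# so a non-zero uid here would be consistent with the two errors above.
perm_report() {
  report_dir="$1"
  # the third column of `ls -nd` is the numeric owner uid
  owner_uid="$(ls -nd "$report_dir" | awk '{print $3}')"
  printf 'uid=%s owner=%s dir=%s\n' "$(id -u)" "$owner_uid" "$report_dir"
}
```

For example, `perm_report /consul/data` run from the task's entrypoint shows whether the process is root and whether the data dir already belongs to the expected user.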