I've been working the kubernetes environment for a few years. GitOps has introduced a new way of thinking about deployments; pull deploys vs push deploys. With the declarative nature of NixOS and the heavy use of git in the community, I thought it'd be interesting to see if pull deploys could be possible for NixOS to increase security by not requiring SSH access to the edge device.
I could see the advantage of an edge device being flashed with NixOS that contained a GitHub Deploy Key to phone home to auto update. All of the OS updates/changes would be fully auditable via the git repo. I could also see this being helpful for those of us (me) that would like updates to be automated, but in a way that could be rolled back in a programmatic way if a verification step failed.
The overall design 2.0 (limiting scope to strictly MVP for speed of idea testing) is to run a bash script
as a systemd service. This service would be deployed to a machine already running NixOS (or theoretically
a part of the initial configuration.nix
build/install).
The MVP design assumes that /etc/nixos
is a git repository with a symlink for configuraion.nix
.
Once pullnix is running, it will poll the repo every 5 minutes for a configuration update, if it is out of sync, pull the newest configuration and switch. If the switch somehow disrupts communication with git, switch back.
If this all works, it'd be an interesting idea to run vulnix
against the result
file before switching
to run some sort of logic to measure if the new build is more vulnerable than the current one.
After a few hours, I had a bash script that had the git polling working pretty well. It was accurately
alerting me to when the repo was out of sync, would update the local version, successfully build the new
version, then it would print that it would be running nixos-switch
if it was enabled and not just the
echo
command.
code snippet
# bin/pullnix
NIXOS_CONFIG_DIR=/etc/nixos
LOG_DIR=/var/log/pullnix
PULLNIX_LOG=$LOG_DIR/pullnix.log
GIT_REMOTE=origin
GIT_BRANCH=main
#POLL_TIME=300
POLL_TIME=30
# Set up logging files
echo "=====Setting up pullnix====="
if ! [ -d $LOG_DIR ]; then
echo "Creating $LOG_DIR"
mkdir $LOG_DIR
fi
if ! [ -f $PULLNIX_LOG ]; then
echo "Creating $PULLNIX_LOG"
touch $PULLNIX_LOG
fi
echo "NIX_PATH: $NIX_PATH"
echo "PATH: $PATH"
echo "============================"
runSync() {
git pull
echo "> nixos-rebuild build"
nixos-rebuild build -p $remote_head -I $NIXOS_CONFIG_DIR/configuration.nix
if [ $? -eq 0 ]; then
echo "> nixos-rebuild switch" >> $PULLNIX_LOG
#nixos-rebuild switch -p $remote_head -I $NIXOS_CONFIG_DIR/configuration.nix >> $PULLNIX_LOG 2>&1
if [ $(git ls-remote &>> /dev/null; echo $?) -eq 0 ]; then
echo "git repo connection successful" >> $PULLNIX_LOG
else
echo "git repo connection failed" >> $PULLNIX_LOG
echo "Switching back" >> $PULLNIX_LOG
#nixos-rebuild switch --rollback >> $PULLNIX_LOG 2>&1
fi
fi
}
cd $NIXOS_CONFIG_DIR
while true; do
git fetch
local_head=$(git rev-parse $GIT_BRANCH)
remote_head=$(git rev-parse $GIT_REMOTE/$GIT_BRANCH)
if [ $local_head != $remote_head ]; then
echo "Out of sync -- local:${local_head:0:7} => remote:${remote_head:0:7}"
runSync
fi
sleep $POLL_TIME
done
I then spent the next few hours running around being a NixOS noob and trying to figure out how make a
derivation of the bash script. While I could have added the contents of this repo as files in my
nixos-configs
repo under a subdirectory of ./pkgs
, I was stubborn and wanted to pull in the package
from a git repo.
I finally got them working (seen as the *.nix files
joseph-flinn/pullnix)! Looking back, it probably would
have been a lot easier to just add a derivation directly in the ./pkgs
directory. Every time I updated
pullnix
, I then had to update the fetchGitHub
sha and rev
code snippet
# nixos-configs/pkgs/pullnix/default.nix
{ stdenv, fetchFromGitHub, bash}:
stdenv.mkDerivation rec {
name = "pullnix-${version}";
version = "a77dcea124f79fe9b88e54cb52b8f12d53768370";
src = fetchFromGitHub {
owner = "joseph-flinn";
repo = "pullnix";
rev = "${version}";
sha256 = "0sgnzvynnh1z95avwrfn0r1y9zb8kpaph9hqd9fn2qvfx01mifai";
};
buildInputs = [ bash ];
preConfigure = ''
export PREFIX=$out
'';
}
# configuration.nix
{ config, pkgs, ... }:
let
pullnix = pkgs.callPackage ../pkgs/pullnix {};
in
{
#...
environment.systemPackages = with pkgs; [
"${pullnix}"
];
After getting the bash script installed as an executable in the nix store, I enabled the
nixos-rebuild switch
and tested that the logic worked altogether by manually running the script.
So far so good. On to enabling the service via systemd
.
I ran into some issues translating between systemd configuration and systemd the nix-way. Long story
short, full paths are important in systemd and bash scripts need a shebang or to be explicitly run. In
addition to that, PATH didn't have /run/current-system/sw/bin
in it for git
, and ssh
(needed for
git
). Thinking that Environment="PATH=/run/current-system/sw/bin:${PATH}"
would work, I messed around
with this format for a bit. Turns out that this doesn't work in non-NixOS environments either. I ended up
just hardcoding the paths for both PATH
and NIX_PATH
.
code snippet
# configuration.nix
{
#...
systemd.services.pullnix = {
enable = true;
description = "pullnix agent";
after = ["network.target"];
serviceConfig = {
Type = "simple";
Restart = "always";
RestartSec = 5;
Environment = "PATH=/run/current-system/sw/bin NIX_PATH=/root/.nix-defexpr/channels:nixpkgs=/nix/var/nix/profiles/per-user/root/channels/nixos:nixos-config=/etc/nixos/configuration.nix:/nix/var/nix/profiles/per-user/root/channels";
ExecStart = "${pkgs.bash}/bin/bash ${pullnix}/bin/pullnix";
StandardOutput = "append:/var/log/pullnix/pullnix.log";
StandardError = "append:/var/log/pullnix/pullnix.log";
};
wantedBy = [ "multi-user.target" ];
};
#...
Everything seemed to be humming along quite nicely. Until...until I did a full test where I was expecting
the system to switch itself. The agent did exactly what it was supposed until nixos-rebuild switch
got
to the stage where it restarts services. pullnix
was terminated in the middle of the switch. The switch
was being run in the pullnix
process so it was also terminated before it completed and brought the new
pullnix
back up. The next step was to fork the switch process and verification to outside of the process
that would be terminated on restart.
code snippet
# bin/pullnix
NIXOS_CONFIG_DIR=/etc/nixos
LOG_DIR=/var/log/pullnix
PULLNIX_LOG=$LOG_DIR/pullnix.log
GIT_REMOTE=origin
GIT_BRANCH=main
#POLL_TIME=300
POLL_TIME=30
# Set up logging files
echo "=====Setting up pullnix====="
if ! [ -d $LOG_DIR ]; then
echo "Creating $LOG_DIR"
mkdir $LOG_DIR
fi
if ! [ -f $PULLNIX_LOG ]; then
echo "Creating $PULLNIX_LOG"
touch $PULLNIX_LOG
fi
echo "NIX_PATH: $NIX_PATH"
echo "PATH: $PATH"
echo "============================"
runSync() {
git pull
echo "> nixos-rebuild build"
nixos-rebuild build -p $remote_head -I $NIXOS_CONFIG_DIR/configuration.nix
if [ $? -eq 0 ]; then
pullnix-switch $NIX_CONFIG_DIR $PULLNIX_LOG $remote_head &
fi
}
cd $NIXOS_CONFIG_DIR
while true; do
git fetch
local_head=$(git rev-parse $GIT_BRANCH)
remote_head=$(git rev-parse $GIT_REMOTE/$GIT_BRANCH)
if [ $local_head != $remote_head ]; then
echo "Out of sync -- local:${local_head:0:7} => remote:${remote_head:0:7}"
runSync
fi
sleep $POLL_TIME
done
# bin/pullnix-switch
NIX_CONFIG_DIR=$0
PULLNIX_LOG=$1
remote_head=$2
echo "====="
echo "PATH: $PATH"
echo "NIX_PATH: $NIX_PATH"
echo "NIX_CONFIG_DIR: $NIX_CONFIG_DIR"
echo "PULLNIX_LOG: $PULLNIX_LOG"
echo "remote_head: $remote_head"
echo "====="
cd $NIX_CONFIG_DIR
echo "> nixos-rebuild switch" >> $PULLNIX_LOG
nixos-rebuild switch -p $remote_head -I $NIXOS_CONFIG_DIR/configuration.nix >> $PULLNIX_LOG 2>&1
if [ $(git ls-remote &>> /dev/null; echo $?) -eq 0 ]; then
echo "git repo connection successful" >> $PULLNIX_LOG
else
echo "git repo connection failed" >> $PULLNIX_LOG
echo "Switching back" >> $PULLNIX_LOG
nixos-rebuild switch --rollback >> $PULLNIX_LOG 2>&1
fi
This seemed to work on initial testing. pullnix
was staying up on nixos-build
, which means that
it could continue working, but...it wouldn't restart itself which means that it couldn't manage itself.
While this might not have been a big issue if it was managed outside of the NixOS configuration, that
breaks the entire philosophy of Nix.
I'd like to see something like this idea in the future. However, I have realized that there is a problem with having systemd, managed by NixOS, trying to then manage NixOS where NixOS will kill it and its processes on reload. Something is needed outside of NixOS but...still on the machine?
If this is possible, more learning of NixOS and how it operates is needed. For now, I'll be using one of the other ops tools: NixOps, deploy-rs, morph, or nixus