Add some help for startup failure
danbst opened this issue · comments
When systemd unit fails to launch, would be great to show journalctl tail.
Also, when everything is OK, maybe mention there is command sudo journalctl -M container-name
to view internal container logs - activation warnings are visible only there.
Great idea!
Could you share a minimal config for a container that fails on startup?
this fails activation on NixOS, so does in container
services.postgresql.enable = true;
services.postgresql.extraConfig = ";";
extra-container add -s <<'EOF'
{
containers.faildemo = {
config = {
services.postgresql.enable = true;
services.postgresql.extraConfig = ";";
};
};
}
EOF
With the above config the postgresql service inside the container fails, but the container service itself keeps on running. (systemctl status container@faildemo
)
For the container service to fail, its nspawn process must fail, which, for example, could happen when the container init process (first the stage 2 init script, then the systemd process) fails.
This is a very unlikely outcome, that's why I asked for an example.
Did you really experience an actual container startup failure?
Or are you just looking for a way to get notified of service failures inside the container?
This is what I get when do activation with faulty configuration in NixOS:
activating the configuration...
setting up /etc...
reloading user units for danbst...
setting up tmpfiles
reloading the following units: dbus.service
restarting the following units: polkit.service
starting the following units: accounts-daemon.service
warning: the following units failed: postgresql.service
● postgresql.service - PostgreSQL Server
Loaded: loaded (/nix/store/x3jaw8c7wql7hqzfzl3lzivgprfyik56-unit-postgresql.service/postgresql.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2018-10-08 14:58:47 EEST; 20ms ago
Process: 17368 ExecStartPost=/nix/store/5hinhmldw0wvz6anppy6qddff41hmfl8-unit-script-postgresql-post-start (code=exited, status=1/FAILURE)
Process: 17367 ExecStart=/nix/store/p456h4s30czb24aj1vp2hsn6bj6q6grh-unit-script-postgresql-start (code=exited, status=1/FAILURE)
Process: 17356 ExecStartPre=/nix/store/x6pl6dp6wwgywm5y15qgz4nljxvfs7b2-unit-script-postgresql-pre-start (code=exited, status=0/SUCCESS)
Main PID: 17367 (code=exited, status=1/FAILURE)
жов 08 14:58:47 station p456h4s30czb24aj1vp2hsn6bj6q6grh-unit-script-postgresql-start[17367]: LOG: syntax error in file "/var/lib/postgresql/9.6/postgresql.conf" line 6, near token ";"
жов 08 14:58:47 station p456h4s30czb24aj1vp2hsn6bj6q6grh-unit-script-postgresql-start[17367]: FATAL: configuration file "/var/lib/postgresql/9.6/postgresql.conf" contains errors
жов 08 14:58:47 station systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
жов 08 14:58:47 station sudo[17461]: root : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/nix/store/h1hp1rankf5py63qs453bq418mww5hpw-postgresql-9.6.10/bin/psql --port=5432 -d postgres -c
жов 08 14:58:47 station sudo[17461]: pam_unix(sudo:session): session opened for user postgres by (uid=0)
жов 08 14:58:47 station sudo[17461]: pam_unix(sudo:session): session closed for user postgres
жов 08 14:58:47 station 5hinhmldw0wvz6anppy6qddff41hmfl8-unit-script-postgresql-post-start[17368]: /nix/store/5hinhmldw0wvz6anppy6qddff41hmfl8-unit-script-postgresql-post-start: line 3: kill: (17367) - No such process
жов 08 14:58:47 station systemd[1]: postgresql.service: Control process exited, code=exited status=1
жов 08 14:58:47 station systemd[1]: postgresql.service: Failed with result 'exit-code'.
жов 08 14:58:47 station systemd[1]: Failed to start PostgreSQL Server.
warning: error(s) occurred while switching to the new configuration
So, activation has actually failed and nixos-rebuil switch
detects this. I wish container manager could detect such situation and inform right back.
But if you are talking about container failuer, then here it is:
{
containers.db = {
bindMounts."/db" = { hostPath = "/no-such-path"; };
config = {};
};
}
and current output:
$ sudo-extra-container create test-cont.nix --start
Building containers...
Installing containers:
db
Starting containers:
db
Job for container@db.service failed because the control process exited with error code.
See "systemctl status container@db.service" and "journalctl -xe" for details.
I also take back my words to mention journalctl -M
- no need, all it's output is available in host machine journal too.