Add some help for startup failure

Question

danbst opened this issue 6 years ago · comments

When the systemd unit fails to launch, it would be great to show the tail of its journalctl output.

Also, when everything is OK, maybe mention that the command sudo journalctl -M container-name shows the container's internal logs. Activation warnings are visible only there.

erikarvstedt · Answer 1 · Mon Oct 08 2018 18:47:52 GMT+0800 (China Standard Time)

Great idea!
Could you share a minimal config for a container that fails on startup?

Danylo Hlynskyi · Answer 2 · Mon Oct 08 2018 19:59:46 GMT+0800 (China Standard Time)

This fails activation on NixOS, and it fails the same way in a container:

    services.postgresql.enable = true;
    services.postgresql.extraConfig = ";";

erikarvstedt · Answer 3 · Mon Oct 08 2018 20:41:52 GMT+0800 (China Standard Time)

extra-container add -s <<'EOF'
{
  containers.faildemo = {
    config = {
      services.postgresql.enable = true;
      services.postgresql.extraConfig = ";";
    };
  };
}
EOF

With the above config, the postgresql service inside the container fails, but the container service itself keeps running (see systemctl status container@faildemo).

For the container service to fail, its nspawn process must fail. This could happen, for example, when the container init process (first the stage 2 init script, then the systemd process) fails.
This is a very unlikely outcome, which is why I asked for an example.

Did you really experience an actual container startup failure?
Or are you just looking for a way to get notified of service failures inside the container?
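
For the second case, here is a rough sketch of how failed units inside a running container could be listed from the host. It is not part of extra-container; the function name failed_units_in_container is made up for illustration, and it assumes the container is registered with systemd-machined so that systemctl --machine can reach it (this usually needs root).

```shell
# Hypothetical helper, not part of extra-container.
# Lists failed units inside a running container, queried from the host.
# Prints one unit name per line; prints nothing if no unit has failed.
failed_units_in_container() {
  # --plain omits the bullet column, so the unit name is the first field.
  systemctl --machine="$1" list-units --state=failed --no-legend --plain \
    | awk '{ print $1 }'
}
```

Calling failed_units_in_container faildemo (as root) should then list postgresql.service for the config above.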

Danylo Hlynskyi · Answer 4 · Mon Oct 08 2018 21:12:01 GMT+0800 (China Standard Time)

This is what I get when I activate a faulty configuration in NixOS:

activating the configuration...
setting up /etc...
reloading user units for danbst...
setting up tmpfiles
reloading the following units: dbus.service
restarting the following units: polkit.service
starting the following units: accounts-daemon.service
warning: the following units failed: postgresql.service

● postgresql.service - PostgreSQL Server
   Loaded: loaded (/nix/store/x3jaw8c7wql7hqzfzl3lzivgprfyik56-unit-postgresql.service/postgresql.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2018-10-08 14:58:47 EEST; 20ms ago
  Process: 17368 ExecStartPost=/nix/store/5hinhmldw0wvz6anppy6qddff41hmfl8-unit-script-postgresql-post-start (code=exited, status=1/FAILURE)
  Process: 17367 ExecStart=/nix/store/p456h4s30czb24aj1vp2hsn6bj6q6grh-unit-script-postgresql-start (code=exited, status=1/FAILURE)
  Process: 17356 ExecStartPre=/nix/store/x6pl6dp6wwgywm5y15qgz4nljxvfs7b2-unit-script-postgresql-pre-start (code=exited, status=0/SUCCESS)
 Main PID: 17367 (code=exited, status=1/FAILURE)

жов 08 14:58:47 station p456h4s30czb24aj1vp2hsn6bj6q6grh-unit-script-postgresql-start[17367]: LOG:  syntax error in file "/var/lib/postgresql/9.6/postgresql.conf" line 6, near token ";"
жов 08 14:58:47 station p456h4s30czb24aj1vp2hsn6bj6q6grh-unit-script-postgresql-start[17367]: FATAL:  configuration file "/var/lib/postgresql/9.6/postgresql.conf" contains errors
жов 08 14:58:47 station systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
жов 08 14:58:47 station sudo[17461]:     root : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/nix/store/h1hp1rankf5py63qs453bq418mww5hpw-postgresql-9.6.10/bin/psql --port=5432 -d postgres -c
жов 08 14:58:47 station sudo[17461]: pam_unix(sudo:session): session opened for user postgres by (uid=0)
жов 08 14:58:47 station sudo[17461]: pam_unix(sudo:session): session closed for user postgres
жов 08 14:58:47 station 5hinhmldw0wvz6anppy6qddff41hmfl8-unit-script-postgresql-post-start[17368]: /nix/store/5hinhmldw0wvz6anppy6qddff41hmfl8-unit-script-postgresql-post-start: line 3: kill: (17367) - No such process
жов 08 14:58:47 station systemd[1]: postgresql.service: Control process exited, code=exited status=1
жов 08 14:58:47 station systemd[1]: postgresql.service: Failed with result 'exit-code'.
жов 08 14:58:47 station systemd[1]: Failed to start PostgreSQL Server.
warning: error(s) occurred while switching to the new configuration

So activation has actually failed, and nixos-rebuild switch detects this. I wish the container manager could detect such a situation and report it right away.

But if you are talking about a container failure, then here is one:

{
   containers.db = {
     bindMounts."/db" = { hostPath = "/no-such-path"; };
     config = {};
   };
}

and the current output:

$ sudo extra-container create test-cont.nix --start
Building containers...

Installing containers:
db

Starting containers:
db

Job for container@db.service failed because the control process exited with error code.
See "systemctl  status container@db.service" and "journalctl  -xe" for details.
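
What I have in mind could look roughly like the sketch below: start the container unit and, if that fails, print the last journal lines for it right away. This is only an illustration of the requested behavior, not how extra-container works; the function name start_container_with_logs and the default of 20 lines are my own assumptions.

```shell
# Hypothetical wrapper, not part of extra-container.
# Starts a container unit; if that fails, prints the tail of its journal.
# Usage: start_container_with_logs <container-name> [line-count]
start_container_with_logs() {
  local unit="container@$1.service"
  local lines="${2:-20}"   # assumed default: show the last 20 lines
  if systemctl start "$unit"; then
    echo "Started $unit"
  else
    echo "Failed to start $unit. Last $lines journal lines:" >&2
    journalctl --unit="$unit" --lines="$lines" --no-pager >&2
    return 1
  fi
}
```

With the db container above, sudo would be needed, and the journal tail should contain the bind mount error instead of just the generic "control process exited" message.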

Danylo Hlynskyi · Answer 5 · Mon Oct 08 2018 21:14:17 GMT+0800 (China Standard Time)

I also take back my suggestion to mention journalctl -M. There is no need, since all its output is available in the host machine's journal too.