GoogleCloudPlatform / guest-agent

nscd, unscd, cron and crond fail to restart

zbjornson opened this issue

When google-guest-agent starts, it appears to try to restart nscd, unscd, cron and crond, but those units are not present on our servers.

$ uname -a
Linux server-2 5.11.0-1020-gcp #22~20.04.1-Ubuntu SMP Tue Sep 21 10:54:26 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# Ubuntu 20.04 LTS Minimal

$ systemctl status google-guest-agent
● google-guest-agent.service - Google Compute Engine Guest Agent
     Loaded: loaded (/lib/systemd/system/google-guest-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-10-16 18:02:28 UTC; 19min ago
   Main PID: 555 (google_guest_ag)
      Tasks: 12 (limit: 9536)
     Memory: 20.8M
     CGroup: /system.slice/google-guest-agent.service
             └─555 /usr/bin/google_guest_agent

Oct 16 18:02:27 server-2 dhclient[620]: All rights reserved.
Oct 16 18:02:27 server-2 dhclient[620]: For info, please visit https://www.isc.org/software/dhcp/
Oct 16 18:02:27 server-2 dhclient[620]: 
Oct 16 18:02:27 server-2 dhclient[620]: Listening on Socket/ens4
Oct 16 18:02:27 server-2 dhclient[620]: Sending on   Socket/ens4
Oct 16 18:02:28 server-2 systemd[1]: Started Google Compute Engine Guest Agent.
Oct 16 18:02:28 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:28.9221Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart nscd.service: Unit nscd.service not found.
                                                     .
Oct 16 18:02:29 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:29.5818Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart unscd.service: Unit unscd.service not found.
                                                     .
Oct 16 18:02:29 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:29.8194Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart cron.service: Unit cron.service not found.
                                                     .
Oct 16 18:02:29 server-2 GCEGuestAgent[555]: 2021-10-16T18:02:29.8254Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart crond.service: Unit crond.service not found.
                                                     .

Are these benign? If so, can they be downgraded from Errors?

These same error lines appear in #134, but in my case, the service is active/running, not dead.

I experience this same issue with COS version 93:

# cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_CRASH_ID=Lakitu
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=435e3f6b0837d398051855e22b245142aceb1ec6
VERSION=93
VERSION_ID=93
BUILD_ID=16623.39.6
# journalctl -u google-guest-agent.service -p 3
-- Journal begins at Fri 2021-10-22 22:19:33 UTC, ends at Fri 2021-10-22 22:58:32 UTC. --
Oct 22 22:19:41 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:41.4680Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart nscd.service: Unit nscd.service not found.
                                                           .
Oct 22 22:19:41 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:41.4773Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart unscd.service: Unit unscd.service not found.
                                                           .
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.1232Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart cron.service: Unit cron.service not found.
                                                           .
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.2261Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart crond.service: Unit crond.service not found.
                                                           .
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.2421Z GCEGuestAgent Error oslogin.go:116: Error reloading service: Failed to reload-or-restart ssh.service: Unit ssh.service not found.
                                                           .

When many servers start or restart at once, we get hundreds of these errors, which end up triggering server alerts. Could someone please let us know if this is the same as #134 and thus being worked on? (Should I open a GCP Support case?)

The background for these log messages: on startup, the guest agent makes configuration changes, then restarts services so the changes take effect. It logs a warning message when a service isn't found, but that message is benign.
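To illustrate the restart-then-tolerate-missing-units behaviour described above, here is a minimal Python sketch. It is not the guest agent's actual code (the agent is written in Go), and the helper names `restart_services`, `is_missing_unit` and `systemctl_try_restart` are hypothetical. It assumes the `Unit <name>.service not found` wording seen in the logs in this thread, which may differ across systemd versions or locales.

```python
import logging
import re
import subprocess
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("restart-sketch")

# Hypothetical runner type: takes a service name, returns (exit code, stderr).
Runner = Callable[[str], Tuple[int, str]]

# Assumption: matches the systemctl message shown in the logs above.
_MISSING_UNIT = re.compile(r"Unit \S+\.service not found")


def systemctl_try_restart(service: str) -> Tuple[int, str]:
    """Run `systemctl try-restart <service>.service`; return (exit code, stderr)."""
    proc = subprocess.run(
        ["systemctl", "try-restart", service + ".service"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr.strip()


def is_missing_unit(stderr: str) -> bool:
    """True when the error text says the unit does not exist on this host."""
    return _MISSING_UNIT.search(stderr) is not None


def restart_services(
    services: Iterable[str], run: Runner = systemctl_try_restart
) -> List[str]:
    """Try to restart each service; return the names that failed for a real reason.

    A unit that is not installed is logged at debug level and skipped,
    so it does not show up as an error.
    """
    failed = []
    for name in services:
        code, stderr = run(name)
        if code == 0:
            continue
        if is_missing_unit(stderr):
            log.debug("skipping %s: unit not installed", name)
            continue
        log.error("error restarting %s: %s", name, stderr)
        failed.append(name)
    return failed
```

Calling `restart_services(["nscd", "unscd", "cron", "crond"])` on a host where none of those units exist would return an empty list and log nothing at error level. The `run` parameter is injectable so the logic can be exercised without systemd.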

We already reduced this extraneous logging in #122, so if you use an updated version of the guest agent, these logs should go away. I think some of our partner distributions, e.g. Ubuntu or COS, have not yet received this change.

Thanks @hopkiw. Indeed the latest version available from/for Ubuntu is 20210629.00. Do you know if there's a way to accelerate the release of a new version? (Is that done by Canonical or Google?)

Canonical takes updates on a regular cadence, except for critical vulnerabilities, where we will ask them to prioritize an update or patch. I don't know if end users can influence the process, but you might try filing a bug with them.