GoogleCloudPlatform / guest-agent

Package google-guest-agent install/re-install/upgrade triggers the shutdown-scripts to be executed on a running instance.

prahaladramji opened this issue · comments

Steps to reproduce (tested on Ubuntu 18.04):

  1. Create a GCE VM with the base Ubuntu 18.04 image.
  2. Add startup-script and shutdown-script metadata as documented.
  3. Once the VM has booted and the startup scripts have finished, run sudo apt-get install --reinstall google-guest-agent
  4. Watch the shutdown scripts get triggered and shut down services.

Problem

In our case the VM is running our master MySQL database. We do not want to stop the MySQL service unless we are shutting down for maintenance or a failover. I would not expect updating a Google-managed package to trigger the shutdown sequence. (This seems to be a side effect of shipping the google-shutdown-scripts service as part of the same google-guest-agent package.)

The reason we ended up here is that google-guest-agent v20201217.02-0ubuntu1~18.04.0 has an issue where it does not add the required route local 172.31.6.35 dev ens4 proto 66 scope host, which was causing the ILB health checks to fail. When we contacted GCP support, they suggested updating that package, and doing so triggered the shutdown of MySQL, causing an outage. This behavior is not documented, and I believe it should be documented in the above docs on writing shutdown scripts so that people are aware of it, since most shutdown scripts are intended for graceful shutdowns and closing connections (not for a service restart).

The issue could also be fixed by splitting the packages, so that it is clear that installing or updating the script runner package will cause the shutdown script to run. Alternatively, the current package's install steps should not restart the script runner services after an install or upgrade.
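Until the packaging is changed, a shutdown script can protect itself against being run while the machine is staying up. The following is a minimal sketch of such a guard, not something taken from the package or the GCE docs. It assumes that systemctl is-system-running prints "stopping" while a real shutdown or reboot is in progress, which is worth verifying on your own image before relying on it. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical guard for a shutdown-script: only do the disruptive work when
# systemd reports that the whole system is going down.

# Print the systemd manager state. The command exits non-zero for any state
# other than "running", hence the `|| true`.
system_state() {
    systemctl is-system-running 2>/dev/null || true
}

# Succeed only when the system as a whole is shutting down (assumption: the
# manager state is "stopping" at that point).
is_real_shutdown() {
    [ "$(system_state)" = "stopping" ]
}

main() {
    if is_real_shutdown; then
        echo "system is stopping: running graceful shutdown steps"
        # Placeholder: put the real work here, e.g. draining connections.
    else
        echo "system is not stopping: skipping shutdown steps"
    fi
}

main
```

With a guard like this, a package reinstall that stops the shutdown-scripts unit would still invoke the script, but the script would skip the steps that take services down.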

Jul 28 16:46:10 instance-1 sudo[3025]: pramji_company_com : TTY=pts/0 ; PWD=/home/pramji_company_com ; USER=root ; COMMAND=/usr/bin/apt install --reinstall google-guest-agent
Jul 28 16:46:10 instance-1 sudo[3025]: pam_unix(sudo:session): session opened for user root by pramji_company_com(uid=0)
Jul 28 16:46:12 instance-1 sshd[3066]: Did not receive identification string from 35.191.1.32 port 34808
Jul 28 16:46:13 instance-1 systemd[1]: Reloading.
Jul 28 16:46:13 instance-1 systemd[1]: Starting Message of the Day...
Jul 28 16:46:13 instance-1 systemd[1]: Reloading.
Jul 28 16:46:13 instance-1 GCEGuestAgent[2561]: 2021-07-28T16:46:13.9101+10:00 GCEGuestAgent Info: GCE Agent Stopped
Jul 28 16:46:13 instance-1 systemd[1]: Stopping Google Compute Engine Guest Agent...
Jul 28 16:46:13 instance-1 systemd[1]: Stopping Google Compute Engine Shutdown Scripts...
Jul 28 16:46:13 instance-1 GCEMetadataScripts[3209]: 2021/07/28 16:46:13 GCEMetadataScripts: Starting shutdown scripts (version 20210414.00-0ubuntu1~18.04.0).
Jul 28 16:46:13 instance-1 systemd[1]: Stopped Google Compute Engine Guest Agent.
Jul 28 16:46:13 instance-1 systemd[1]: Started Google Compute Engine Guest Agent.
Jul 28 16:46:13 instance-1 systemd[1]: Starting Google Compute Engine Startup Scripts...
Jul 28 16:46:13 instance-1 GCEMetadataScripts[3209]: 2021/07/28 16:46:13 GCEMetadataScripts: Found shutdown-script in metadata.
Jul 28 16:46:13 instance-1 GCEMetadataScripts[3234]: 2021/07/28 16:46:13 GCEMetadataScripts: Starting startup scripts (version 20210414.00-0ubuntu1~18.04.0).
Jul 28 16:46:13 instance-1 test_shutdown_script[3266]: starting shutdown sequence
Jul 28 16:46:13 instance-1 GCEGuestAgent[3232]: 2021-07-28T16:46:13.9657+10:00 GCEGuestAgent Info: GCE Agent Started (version 20210414.00-0ubuntu1~18.04.0)
Jul 28 16:46:13 instance-1 GCEMetadataScripts[3234]: 2021/07/28 16:46:13 GCEMetadataScripts: Found startup-script in metadata.

This was a bug in the Ubuntu package which has since been fixed: https://git.launchpad.net/ubuntu/+source/google-guest-agent/tree/debian/rules

If you CAN still reproduce this on current Ubuntu images please let us know (we cannot).

We actually found that Ubuntu 22.04 has this bug, while Ubuntu 20.04 and 18.04 do not. The logic in the rules file is not working the same on 22.04. We're raising this with Canonical as they maintain their own package rules.
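To see what a particular installed build of the package does to the unit during an upgrade, one option is to search its maintainer scripts. This is a rough sketch that assumes dpkg's usual layout (/var/lib/dpkg/info/<package>.<script>); the DPKG_INFO_DIR variable and the function name are made up here so the path can be overridden. Lines that call deb-systemd-invoke with stop or restart on the unit would be the ones to look at, but the exact contents depend on how the package was built.

```shell
#!/bin/sh
# List maintainer-script lines that mention a given systemd unit.
# Arguments: <dpkg info dir> <package name> <unit name>
find_unit_actions() {
    info_dir=$1
    pkg=$2
    unit=$3
    for script in preinst prerm postinst postrm; do
        file="$info_dir/$pkg.$script"
        [ -f "$file" ] || continue
        # -H prints the file name, -n the line number of each match.
        grep -Hn -- "$unit" "$file" || true
    done
}

find_unit_actions "${DPKG_INFO_DIR:-/var/lib/dpkg/info}" \
    google-guest-agent google-shutdown-scripts
```

No output means none of the maintainer scripts reference the unit at all.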

I'm afraid I'm having problems reproducing this issue. To attempt to reproduce it, I've created a couple of simple one-line scripts and started a test001 instance:

$ cat shutdown.sh 
echo 'Shutdown' >> /home/ubuntu/hello
$ cat start.sh 
echo 'Start' >> /home/ubuntu/hello
$ gcloud compute instances create test001 --image-project=ubuntu-os-cloud --image-family=ubuntu-2204-lts --zone=europe-north1-a --metadata-from-file shutdown-script=shutdown.sh --metadata-from-file startup-script=start.sh

Once in the instance:

ubuntu@test001:~$ cat hello 
Start
ubuntu@test001:~$ sudo apt-get install --reinstall google-guest-agent
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnuma1
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded.
Need to get 6457 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://europe-north1.gce.archive.ubuntu.com/ubuntu jammy-updates/main amd64 google-guest-agent amd64 20220622.00-0ubuntu2~22.04.1 [6457 kB]
Fetched 6457 kB in 0s (14.6 MB/s)       
(Reading database ... 64969 files and directories currently installed.)
Preparing to unpack .../google-guest-agent_20220622.00-0ubuntu2~22.04.1_amd64.deb ...
Unpacking google-guest-agent (20220622.00-0ubuntu2~22.04.1) over (20220622.00-0ubuntu2~22.04.1) ...
Setting up google-guest-agent (20220622.00-0ubuntu2~22.04.1) ...
Scanning processes...
Scanning linux images...

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.
ubuntu@test001:~$ cat hello
Start
ubuntu@test001:~$ 

So, I don't believe I'm seeing either the shutdown script being called during the guest agent re-install, or the instance rebooting.

Zach, could you share your reproducer?

Many thanks, Andy.

I can reproduce this with what appears to be a very similar model to yours. It only happens once - and won't happen if the service is not running already.

Create an instance with a shutdown script

gcloud compute instances create ubu2204-test --image-project ubuntu-os-cloud --image-family ubuntu-2204-lts --metadata=shutdown-script="date >> /home/ubuntu/shutdown"

Verify the state of the google-shutdown-scripts systemd unit

gcloud compute ssh ubu2204-test --command="sudo journalctl -u google-shutdown-scripts"

May 05 16:06:40 ubu2204-test systemd[1]: Starting Google Compute Engine Shutdown Scripts...
May 05 16:06:40 ubu2204-test systemd[1]: Finished Google Compute Engine Shutdown Scripts.

Verify that no shutdown file is present yet

gcloud compute ssh ubu2204-test --command="sudo cat /home/ubuntu/shutdown"

cat: /home/ubuntu/shutdown: No such file or directory

Reinstall the agent

gcloud compute ssh ubu2204-test --command="sudo apt install --reinstall google-guest-agent"

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
  libnuma1
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded.
Need to get 6457 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://us-west1.gce.archive.ubuntu.com/ubuntu jammy-updates/main amd64 google-guest-agent amd64 20220622.00-0ubuntu2~22.04.1 [6457 kB]
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Fetched 6457 kB in 0s (34.8 MB/s)
(Reading database ... 64969 files and directories currently installed.)
Preparing to unpack .../google-guest-agent_20220622.00-0ubuntu2~22.04.1_amd64.deb ...
Unpacking google-guest-agent (20220622.00-0ubuntu2~22.04.1) over (20220622.00-0ubuntu2~22.04.1) ...
Setting up google-guest-agent (20220622.00-0ubuntu2~22.04.1) ...

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.

Verify the shutdown file again

gcloud compute ssh ubu2204-test --command="sudo cat /home/ubuntu/shutdown"

Fri May  5 16:07:52 UTC 2023

Verify the state of the google-shutdown-scripts systemd unit

gcloud compute ssh ubu2204-test --command="sudo journalctl -u google-shutdown-scripts"

May 05 16:06:40 ubu2204-test systemd[1]: Starting Google Compute Engine Shutdown Scripts...
May 05 16:06:40 ubu2204-test systemd[1]: Finished Google Compute Engine Shutdown Scripts.
May 05 16:07:52 ubu2204-test systemd[1]: Stopping Google Compute Engine Shutdown Scripts...
May 05 16:07:52 ubu2204-test google_metadata_script_runner[1666]: Starting shutdown scripts (version 20220622.00-0ubuntu2~22.04.1).
May 05 16:07:52 ubu2204-test google_metadata_script_runner[1666]: Found shutdown-script in metadata.
May 05 16:07:52 ubu2204-test google_metadata_script_runner[1666]: shutdown-script exit status 0
May 05 16:07:52 ubu2204-test google_metadata_script_runner[1666]: Finished running shutdown scripts.
May 05 16:07:52 ubu2204-test systemd[1]: google-shutdown-scripts.service: Deactivated successfully.
May 05 16:07:52 ubu2204-test systemd[1]: Stopped Google Compute Engine Shutdown Scripts.

From this point on, until you reboot, reinstalling the agent won't run the shutdown script again (because the unit is no longer active). There is actually a second issue at this point: your shutdown script won't run on the next real shutdown either, because it has already run.
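If the unit has ended up inactive like this, it may be possible to re-arm it without a reboot by starting it again. This is an untested sketch, not a documented procedure. It assumes that starting google-shutdown-scripts.service runs no scripts and only re-arms the stop action, which is what the journal at boot suggests (Starting, then Finished, with no script output). The function name is made up for illustration.

```shell
#!/bin/sh
UNIT="google-shutdown-scripts.service"

# Start the unit again if it is inactive, so that its stop action (running the
# shutdown scripts) fires on the next real shutdown.
rearm_shutdown_unit() {
    if systemctl is-active --quiet "$UNIT"; then
        echo "$UNIT is active; nothing to do"
    else
        echo "$UNIT is inactive; starting it"
        systemctl start "$UNIT"
    fi
}

# Usage (as root, after sourcing this file): rearm_shutdown_unit
```

Checking journalctl -u google-shutdown-scripts afterwards should show whether the unit went back to the started state.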

Reboot the VM

gcloud compute ssh ubu2204-test --command="sudo reboot"

Verify the shutdown file again

gcloud compute ssh ubu2204-test --command="sudo cat /home/ubuntu/shutdown"

Fri May  5 16:07:52 UTC 2023

Note this is the same entry as before (script didn't actually run).

Reboot the VM again

gcloud compute ssh ubu2204-test --command="sudo reboot"

Verify the shutdown file again

gcloud compute ssh ubu2204-test --command="sudo cat /home/ubuntu/shutdown"

Fri May  5 16:07:52 UTC 2023
Fri May  5 16:13:08 UTC 2023

Now that the service is up, it will run the script on shutdown as it should. And from this point you can repro again by reinstalling the agent etc.
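The manual check above (does the marker file grow when the agent is reinstalled?) can be wrapped in a small helper. This is only a sketch of the idea and is not the test case from any Google or Canonical repository; the function names are hypothetical, and the marker path is the one used in the reproducer.

```shell
#!/bin/sh
# Count lines in a file, treating a missing file as zero lines.
line_count() {
    if [ -f "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Run a command and fail if it caused the marker file to grow.
# Usage: assert_marker_unchanged MARKER_FILE COMMAND [ARGS...]
assert_marker_unchanged() {
    marker=$1
    shift
    before=$(line_count "$marker")
    # Only the marker file matters here, not the command's own exit status.
    "$@" >/dev/null 2>&1 || true
    after=$(line_count "$marker")
    if [ "$before" -eq "$after" ]; then
        echo "OK: marker unchanged ($after lines)"
    else
        echo "FAIL: marker grew from $before to $after lines"
        return 1
    fi
}
```

On the test VM this could be called as assert_marker_unchanged /home/ubuntu/shutdown sudo apt-get install --reinstall -y google-guest-agent, once the VM has been rebooted so that the unit is active again.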

Launchpad bug https://bugs.launchpad.net/ubuntu/+source/google-guest-agent/+bug/2019089 was created to track the Ubuntu package update work.

I believe this issue is fixed in the Ubuntu package, and we will now have a test case for it in GoogleCloudPlatform/guest-test-infra#990.
I'm going to close this out.