widdix / aws-ec2-ssh

Manage AWS EC2 SSH access with IAM

Home Page: https://cloudonaut.io/manage-aws-ec2-ssh-access-with-iam/


install.sh does not always restart sshd on Ubuntu

KusabiSensei opened this issue · comments

Operating System: Ubuntu
AWS CLI Version: aws-cli/1.11.13 Python/3.5.2 Linux/4.4.0-1044-aws botocore/1.4.70

When the CloudFormation template is used on Ubuntu, install.sh does not restart sshd via the systemd service restart.

Manual installation using install.sh in an interactive console session works as expected on Ubuntu.

The issue is a missing else block:

if [[ "$retval" -eq "0" ]]; then
  if [[ (`systemctl is-system-running` =~ running) || (`systemctl is-system-running` =~ degraded) ]]; then
    if [ -f "/usr/lib/systemd/system/sshd.service" ] || [ -f "/lib/systemd/system/sshd.service" ]; then
      systemctl restart sshd.service
    else
      systemctl restart ssh.service
    fi
  fi  # <-- missing else branch: `systemctl is-system-running` returns "starting" here, so nothing runs
elif [[ `/sbin/init --version` =~ upstart ]]; then
[...]
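The simplest shape of a fix is to fold all three acceptable states into one test so that no empty else branch is needed. A minimal sketch (the helper name is hypothetical, for illustration only):

```shell
#!/bin/bash
# Hypothetical helper: decide whether the installer may restart sshd,
# given the output of `systemctl is-system-running`. Accepting running,
# degraded, and starting in one regex removes the need for an else branch.
should_restart_sshd() {
  local state="$1"
  [[ "$state" =~ ^(running|degraded|starting)$ ]]
}

# Intended use in install.sh (sketch):
#   if should_restart_sshd "$(systemctl is-system-running)"; then
#     systemctl restart sshd.service   # or ssh.service on Ubuntu
#   fi
```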

When I look through the commands executed on Ubuntu, I see:

+ which systemctl
+ [[ 0 -eq 0 ]]
++ systemctl is-system-running
+ [[ starting =~ running ]]
++ systemctl is-system-running
+ [[ starting =~ degraded ]]

Here's my take: if a system reports the "starting" state via systemd, we shouldn't be arbitrarily recycling system services until the system has fully started. On a properly functioning system, the starting state should only be set during the boot phase, and it transitions to running or degraded depending on the end state of the enabled unit files.

This seems to be a special case when the script is run by CloudFormation, which executes the installer before the system is fully online. Staging changes to be reloaded later is fine, but I have concerns about kicking a system service into a restart while the rest of the system is still coming up, especially where other systemd unit files explicitly depend on sshd. My first thought was an old terminal-style CRM application whose daemon is set as a user's shell: that daemon would be in a race condition with sshd restarting itself before systemd tried to start the daemon.

Since this is specific to CloudFormation (and given that the system is in the running state by the time it starts accepting console logins), my feeling is that the proper fix lies in the CloudFormation YAML document: inform CloudFormation that part of the scripted setup touches the SSH configuration, so that it restarts the service at the end of its execution.
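The CloudFormation-side approach could look roughly like this cfn-init metadata fragment (a sketch, not the project's actual template; the keys follow the AWS::CloudFormation::Init `services` syntax, which restarts a service when a listed file changes):

```yaml
# Illustrative only: let cfn-init (re)start sshd after setup touches its config.
Metadata:
  AWS::CloudFormation::Init:
    config:
      services:
        sysvinit:
          sshd:
            enabled: true
            ensureRunning: true
            files:
              - /etc/ssh/sshd_config   # restart sshd when this file changes
```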

From what I have read from you, it seems your preferred fix is to invert the logic so that the systemctl is-system-running check selects its if branch in the absence of an explicit failure, since that covers the scenario where CloudFormation is running the installer.

My suggestion in that case is that we add a condition to the existing if test to catch this desired state. It doesn't need a separate else block, since we want to take the same action in any of the three states (running, degraded, or starting). I'm pushing a patch for this to my fork right now, and if you want, I can re-open the PR for that branch once it's pushed out from my development machine.

Just for the record: during my testing on Ubuntu AMIs, if you use a console session after boot completes, systemctl is-system-running returns running. CentOS returns degraded due to an issue in their AMI with one of the enabled services; the if block checks for either state for that reason. We can simply tack the starting state onto that check.

would a condrestart help in those cases?

I just merged the PR. Let's see if this fixes the bug and then we can continue to make it better :)

@michaelwittig condrestart restarts the service only if it is already running, so in this case it would have the same effect as a restart.

A reload would be better (a SIGHUP asking the running daemon to re-read its configuration), but OpenSSH re-execs and daemonizes itself on SIGHUP (which is what a reload sends). So that wouldn't work either: the way sshd treats SIGHUP is equivalent to a restart.
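For reference, the verbs discussed here map roughly as follows under systemd (a sketch; `try-restart` is systemd's equivalent of the old condrestart, and whether `reload` sends SIGHUP depends on the unit's ExecReload, which for OpenSSH typically is `kill -HUP`):

```shell
# Illustrative only; these act on the live service manager.
systemctl restart sshd.service      # stop + start; listener briefly down
systemctl try-restart sshd.service  # restart only if already running (condrestart)
systemctl reload sshd.service       # runs ExecReload, typically SIGHUP; for
                                    # OpenSSH this re-execs the daemon, so it
                                    # is effectively a restart
```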

This should cover the CloudFormation use case.

I'm also going to package together a .deb since I'll need to use that for my company (our setup scripts for AWS pull down deb packages from a private repo to install and configure our apps). Out of curiosity, are you manually packaging the RPM or using an authoring tool like FPM (https://github.com/jordansissel/fpm)?

The rpm is created on my company's Jenkins server from aws-ec2-ssh.spec by running three commands:

rpmdev-setuptree
spectool --define="jenkins_version ${VERSION}" --define="jenkins_release 1" --define="jenkins_archive v${VERSION}" --define="jenkins_suffix ${VERSION}" -g -R aws-ec2-ssh.spec
rpmbuild --define="jenkins_version ${VERSION}" --define="jenkins_release 1" --define="jenkins_archive v${VERSION}" --define="jenkins_suffix ${VERSION}" -bb aws-ec2-ssh.spec

If you know of a better way to create the package / packages in the future I'm happy to improve the process.
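For comparison, the FPM route mentioned above could look roughly like this (a sketch; the package name echoes the repo, but the version and the staging directory `./pkgroot` are placeholders, not the project's actual packaging):

```shell
# Hypothetical: stage the installed file tree under ./pkgroot first, then:
fpm -s dir -t deb \
    -n aws-ec2-ssh -v 1.0.0 \
    --description "Manage AWS EC2 SSH access with IAM" \
    ./pkgroot/=/
```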

I will close this issue now since Ubuntu is working again. Thanks for your help!