AWS IoT Jobs and Mender integration demo
This demo show show to integrate AWS IoT Device Management Jobs with the mender client in order to perform safe system OTA upgrades.
Architecture
The following diagram depict the architecture of the solution we are going to build.
Process
The process that the system we are building implements is the following:
The steps in the blue boxes are performed in the cloud, the red boxes on the device.
Prerequisites
In order to build and run this demo, you need the following:
- A Linux enviroment
- A Raspberry Pi board. This demo has been tested with a Raspberry Pi 3 B+ board, but any other board supported by the mender tool would do. If using other boards, you might need to adapt the cross compiling option for Go and change some settings in the
mender-env.sh
file - An SD card with at least 8Gb of space
- A computer with an SD Card reader
- The Etcher tool to write the image to the SD Card. Any other tool you are familiar with would also do, including
dd
. - The Golang tools for the platform on which you will develop. Ensure you have go version 1.12 or above (
go version
).
In this document I am assuming you'll be using an AWS Cloud9 environment. If nevertheless you are running this in another environemnt I'll assume you know what you are doing and will be able to adapt the commands as needed.
Create an AWS Cloud9 instance
For this demo, I recommend to use an AWS Cloud9 instance to install and run the tools, especially if you have a Windows laptop.
Create a new AWS Cloud9 instance using the console.
- Click on this link to open the console
- Enter a name and click
Next step
- Leave all settings as-is but change Platform to Ubuntu
- Click
Next step
- Click
Create Environment
Wait for the instance to be initialized.
Resize the EBS volume
Cloud9 comes with a default 8Gib volume, which is too small for building a full Raspbberry Pi image.
Once the instance is up and running, select the top folder in the explorer on the left and then File | New File. Name the file resize.sh
and copy the following script:
#!/bin/bash
# Specify the desired volume size in GiB as a command-line argument. If not specified, default to 20 GiB.
SIZE=${1:-20}
# Install the jq command-line JSON processor.
sudo apt install -y jq
# Get the ID of the envrionment host Amazon EC2 instance.
INSTANCEID=$(curl http://169.254.169.254/latest/meta-data//instance-id)
# Get the ID of the Amazon EBS volume associated with the instance.
VOLUMEID=$(aws ec2 describe-instances --instance-id $INSTANCEID | jq -r .Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId)
# Resize the EBS volume.
aws ec2 modify-volume --volume-id $VOLUMEID --size $SIZE
# Wait for the resize to finish.
while [ "$(aws ec2 describe-volumes-modifications --volume-id $VOLUMEID --filters Name=modification-state,Values="optimizing","completed" | jq '.VolumesModifications | length')" != "1" ]; do
sleep 1
done
# Rewrite the partition table so that the partition takes up all the space that it can.
sudo growpart /dev/xvda 1
# Expand the size of the file system.
sudo resize2fs /dev/xvda1
Save the file. In the terminal window at the bottom of the IDE, execute the following:
chmod +x resize.sh
sudo ./resize.sh
Build the image and the mender artifact
Install mender-convert
mender-convert
is a tool provided by the mender.io project.
You can read more about mender convert tool https://github.com/mendersoftware/mender-convert, and you can get an overview on how to use it by following this blog post https://hub.mender.io/t/raspberry-pi-3-model-b-b-raspbian/140.
This project has been built for v2.2.0.
To install mender do the following:
cd ~/environment
git clone -b 2.2.0 https://github.com/mendersoftware/mender-convert.git
cd mender-convert
./docker-build
NOTE: if you get a warning about Could not get lock...
, wait another 60sec os so. There are some background scripts finalizing the configuration of the AWS Cloud9 instance.
Obtaining and building the goagent
To build the goagent
run:
cd ~/environment
git clone https://github.com/aws-samples/aws-iot-jobs-full-system-update
cd aws-iot-jobs-full-system-update/files
env GOOS=linux GOARCH=arm GOARM=7 go build ../goagent.go
mkdir -p overlay_root_fs/usr/sbin
install -m 755 goagent overlay_root_fs/usr/sbin/goagent
Raspbian
Download the Raspbian image and extract it:
cd ..
mkdir -p input
cd input
wget http://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2019-04-09/2019-04-08-raspbian-stretch-lite.zip
unzip raspbian_lite-2019-04-09/2019-04-08-raspbian-stretch-lite.zip
Modify the Wifi configuration
Open files/overlay_root_fs/etc/wpa_supplicant/wpa_supplicant.conf
and provide the values for <SSID>
and <SECRET>
.
Enable the connection to AWS IoT
The goagent connects to AWS IoT over MQTT in order to receive the commands sent by the AWS IoT Jobs service. MQTT connections are encrypted via TLS and secured via mutual TLS authentication. For this we need to have a private key and a device certificate to identify and authenticate the device. In a real production environment the private key would be generated in a secure way on the actual device and be accessible to the agent, for example using a secure element. The device certificate would then be obtained by issuing a CSR (certificate signing request) that would be signed by a CA recognized by AWS IoT. You can find an implementation of this process in the following repo https://github.com/aws-samples/iot-provisioning-secretfree.
For this demonstration, we are going to generate the private key and the certificate using the AWS IoT console and then transfer them to the image via the mender-convert
tool.
- Go to the AWS IoT Console
- Click on Create a single thing
- Enter a Name, eg "rpi-mender-demo" and click Next at the bottom of the page
- Click on Create certificate
- Download the certificate for this thing and the private key.
- Click on Activate
- Click on Done
Select Secure | Policies and click on Create (or click on this quick link)
Enter a name for the policy, eg "agent-policy" and click on Advanced mode.
In the editor, delete all the text and copy paste the following:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iot:*",
"Resource": "*"
}
]
}
Click on Create
Select the Manage | Things menu, click on the thing you just created (eg "rpi-mender-demo"), and then on Security. Click on Actions|Attach Policy and select the policy you created in the previous step.
Transfer the certificates
Back in the Cloud9 instance, select the aws-iot-jobs-full-system-update/files/overlay_root_fs/etc/goagent
folder in the navigation pane on the left. Then, select File in the top menu bar and Upload local files.... Select the certificate and private key you downloaded before or drag&drop them on the dialog box.
Close the dialog box.
Using the file explorer or the terminal, rename the files to cert.pem
and private.key
respectively.
For the mutual TLS authentication to work we also need the server certificate. Run the following command in the terminal window to save the server certificate locally.
cd ~/environment/aws-iot-jobs-full-system-update/files/overlay_root_fs/etc/goagent
curl -o rootCA.pem https://www.amazontrust.com/repository/AmazonRootCA1.pem
Update the goagent configuration file
goagent
uses a configuration file to get the parameters needed to connect to AWS IoT. The file can be found in the aws-iot-jobs-full-system-update/files/overlay_root_fs/etc/goagent
folder.
Open the file in the Cloud9 editor and provide the following information:
endpoint
- it can be found herethingId
- the name of the Thing you have createdclientId
- use the same name as for the thingId
Save the modifications
Build the SD card image
Now we have all the necessary bits and pieces to build the image. Run the following to generate the image and the mender artifact
cd ~/environment/mender-convert
rm -rf overlay_root_fs
sudo cp -R ~/environment/aws-iot-jobs-full-system-update/files/overlay_root_fs overlay_root_fs
sudo chown -R root.root overlay_root_fs
MENDER_ARTIFACT_NAME=release-1 MENDER_ENABLE_SYSTEMD=n ./docker-mender-convert \
--disk-image input/2019-04-08-raspbian-stretch-lite.img \
--config configs/raspberrypi3_config \
--overlay ./overlay_root_fs
NOTE:
MENDER_ENABLE_SYSTEMD=n
disablesmender-client
service as we want to use the client in standalone mode.
Transfer the image
Once the above process is finished you'll end up with two relevant files in the mender-convert/deploy
folder:
- an
img.gz
file - this is the full image that need to be transferred to the SD card - a
mender
file - this is the mender artifact which is used by the mender client to upgrade the system
The img file must be transferred to your local machine and copied onto the SD card. To do this navigate to the deploy
folder in the explorer tab on the left, right-click on the img file and click on Download. Depending on your internet connection it might take some time.
The mender artifact needs to be copied to an S3 bucket where it can later be accessed by the mender client via a pre-signed URL.
Let's create the bucket. You can either use the Amazon S3 console or the aws cli in the Cloud9 terminal:
aws s3 mb s3://<bucket name>
Once you have created the bucket, copy the mender file to it.
aws s3 cp ~/environment/mender-convert/deploy/2019-04-08-raspbian-stretch-lite-raspberrypi3-mender.mender s3://<bucket name>
This bucket and the file are private and cannot be accessed by unauthorized parties.
Verify the system
Once the image has finished downloading, you copy it to the SD card. You can use Etcher or any other command you are familiar with on your system.
Insert the SD card in your Raspberry Pi and connect the power. It is advisable to have a monitor connected to the Raspberry Pi to check that the boot process executes correctly and to capture the IP address assigned to the Pi depending on the chosen network connection.
If everything is fine you should be able to login to the Raspberry Pi using the default username and password (pi/raspberry
).
NOTE: in a production setup you would likely disable SSH or at a minimum use stronger passwords or public keys. An more secure method would be to use AWS IoT Secure Tunneling which can be integrated with the job agent.
You can then verify that the agent is running by executing:
systemctl status goagent
The result should look like the following:
AWS IoT Jobs
It is now time to test the agent. As we are going to use URL signing, we first have to create a Role that AWS IoT Device Management can assume to create the pre-signed URL for the mender artifact.
Create a Job Document
First we have to create a Job Document and upload it to an S3 bucket.
In the Cloud9 IDE, create a new file in the top folder with the following content. Let's call it menderjob.json
. Replace <BUCKET> with the name of the bucket where you placed the mender artifact.
{
"operation": "mender_install",
"url": "${aws:iot:s3-presigned-url:https://s3.amazonaws.com/<BUCKET>/2019-04-08-raspbian-stretch-lite.mender}"
}
Copy the file to the S3 bucket with the following command. For BUCKET we are going to use the same bucket we created for storing the mender artifact:
aws s3 cp ~/environment/menderjob.json s3://<BUCKET>
Create an AWS IoT Job
It is time now to create AWS IoT Job containing the information necessary for the de Go the AWS IoT Console and select Manage|Jobs. Click on Create Job. You can also use this quick link.
Select Create a custom job, then enter a Job ID, for example "mender-update-1".
-
Under Select devices to update select the Thing you have created earlier.
-
Under Add a job file select the bucket and then the
menderjob.json
file. -
Under Pre-sign resource URLs select I want to... and select Create Role. Enter a name and then click on Create role. In URL will expire at this time select 1 hour.
Leave the rest as-is and click Next and then click Create.
Monitoring the progress
AWS IoT Jobs has a rather coarse reporting for job progress avaiable to other subscriber than the device itself.
To overcome this limitiation, the goagent publishes the output of the mender command to a dedicated topic, so that a montioring application can easily follow the progress.
In the AWS IoT Console, select Test and subscribe to mender/#
.
You should start seeing messages like the following appearing:
{
"progress": "................................ 5% 12288 KiB",
"ts": 1574763274
}
How does this work?
The goagent has received the new job, and accepted it by reporting back to the AWS Job service an IN_PROGRESS status. It also reports back the current step of the installation progress, in this case "downloading". This information is not available in the console but can be queried via the API prior knowing the jobId and the thingName.
At this stage the mender client is downloading the artifact from S3 via the pre-signed URL and copying it to the inactive partition (mender -install <s3 presigned url>
)
Once the installation is completed and the mender client exits, the goagent reports back to the AWS Jobs service a status of IN_PROGRESS with step "rebooting" and issues a reboot command.
When the system comes up again, the goagent will retrieve the current job as pending. Since the stage is "rebooting" it determines that the Raspberry has rebooted and commit the update using the mender client (mender -commit
), and reports back a successful job. If the commit command fails it means that the system has rebooted to the old partition, and the goagent issues a rollback command (mender -rollback
) and reports back a failed job.
If the network connection is interrupted during the download or the device reboots for any other reason before the update is completed, the goagent invokes the mender install
command which in turn download the firmware again.
It is also possible to add a counter to the job reported state to keep track of how many time a download has been attempted and fail the job after N attempts.
Troubleshooting
The goagent is not active
On the Raspberry Pi
sudo journalctl -u goagent
to check the logs
The job never shows completed
It might take some time for the mender client to finish the upgrade even after the logs have been showing 99% completion. Be patient. If you have an ssh connection to the Raspberry Pi open during the upgrade, it will disconnect on reboot, indicating that the mender command has successfully terminated. If after this the job still does not show completed, ssh to the Raspberry Pi and check if the goagent is running
systemctl status goagent
Improvements
If you feel brave enough feel free to take this code and:
- Add status reporting (startup time, heartbeat, local IP address) to using thing shadow
- Perform graceful shutdown of other components running on the system before rebooting
- Perform additional system verifications upon reboot before committing
License
This project is licensed under the Apache-2.0 License.