d3zd3z / zephyr_secure_inference

Confidential AI sample application for Zephyr RTOS

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

TF-M Confidential AI Project

Confidential AI Architecture Overview

Overview

This Zephyr project provides a complete secure (S) plus non-secure (NS) solution for execution of an inference engine in the secure processing environment, as well as end-to-end processing of inference outputs.

Outputs from the inference engine are encoded as CBOR payloads, with COSE used to enable optional signing and encryption of the data.

Custom secure services are included in the sample in the tfm_secure_inference_partitions folder:

  • TF-M HUK Key Derivation: UUID and key derivation from the HUK
  • TFLM Service: TensorFlow Lite Micro inference engine and model execution

These secure services are added to TF-M as part of the secure build process that takes place before the NS Zephyr application is built, and are available to the NS environment based on the access-rights specified in the service definition files.

Inference Engine(s)

This sample currently uses TensorFlow Lite Micro (TFLM) and microTVM as the inference engines, with a simple sine-wave model.

This will be extended to support more complex AI/ML models soon.

You can interact with the sine wave model from the NS side via the infer shell command.

Key management

Certain operations like signing or encrypting the COSE-encoded inference engine outputs require the use of keys, and X.509 certificates for these keys.

All keys used in this project are derived at startup from the Hardware Unique Key (HUK), meaning that they are device-bound (i.e. explicity tied to a specific instance of an SoC), storage-free (meaning they can't be retrieved by dumping flash memory or firmware analysis), and repeatable across firmware updates.

X.509 certificates generated for these keys are associated with a UUID, which is also derived from the HUK. This derived UUID allows us to uniquely and consistently identify a single SoC or embedded device.

The following EC keys are currently generated:

  • Device Client TLS key (secp256r1)
  • Device COSE SIGN/ENCRYPT (secp256r1)

The non-secure processing environment exposes a keys shell command that can be used to retrieve the public key component of the above private keys, as well as generate a certificate signing request (CSR) for a specific key.

Non-volatile counter

TF-M has a standard NV Counter API, defined here: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/platform/include/tfm_plat_nv_counters.h, but which can't be used by application ROT services as it is pre-allocated.

Created custom NV counters support based on protected storage in secure inference, currently, two NV counters of 4bytes in each size for NV counter and NV roll-over counter, avoid frequent read-write to PS, added another two RAM based static local tracker of NV tracker counter and NV rollover tracker counter.

NV counters support is enabled via project Kconfig variable NV_PS_COUNTERS_SUPPORT (default enabled) in the build and another variable NV_COUNTER_TRACKER_THRESHOLD_LIMIT for setting the threshold limit to write back NV tracker counter value to PS (by default threshold limit set as 100).

On every boot (or first use), read both counters from PS, and store them to local tracker counter variables.

Roll over NV counter is incremented whenever the current NV tracker counter value is greater than the UINT32_MAX (which is aligned with NV_COUNTER_TRACKER_THRESHOLD_LIMIT value), reset current NV tracker counter to zero, and write back both tracker counters (NV counter and NV rollover counter) to PS.

NV tracker counter value will be inserted into every COSE/CBOR payload as an additional field along with the inference value.

Required Setup

This sample assumes you have already cloned zephyr locally. You will need to use a specific commit of zephyr to be sure that certain assumptions in this sample are met:

  • 9562e3f794d7e3d4acc305e3a0dd52536a867586

Run these commands to checkout the expected commit hash, and apply a required patch to TF-M, allowing us to enable CPP support in the TF-M build system. This patch also modifies relevant target's flash layout(s) to increase flash allocation for the secure image(s), where required:

$ cd path/to/zephyrproject/zephyr
$ source zephyr-env.sh
$ git checkout 9562e3f794d7e3d4acc305e3a0dd52536a867586
$ west update
$ cd ../modules/tee/tf-m/trusted-firmware-m
$ git apply <sample-path>/patch/tfm.patch

Building and Running

This app is built as a Zephyr application, and can be built with the west command. There are a few config options that need to be set in order for it to build successfully. A sample configuration could be:

Build without networking support:

$ west build -p auto -b mps2_an521_ns -t run

Build with networking support and QEMU user mode for networking:

$ west build -p auto -b mps2_an521_ns -t run -- \
    -DOVERLAY_CONFIG="overlay-smsc911x.conf overlay-network.conf" \
    -DCONFIG_NET_QEMU_USER=y \
    -DCONFIG_BOOTSTRAP_SERVER_HOST=\"hostname.domain.com\"

Note

DCONFIG_BOOTSTRAP_SERVER_HOST should point to the domain name where the bootstrap server is located. This may be a proper domain, or the output of the hostname command, depending on how the bootstrap server was configured. See https://github.com/microbuilder/linaroca for details.

On Target

Refer to :ref:`tfm_ipc` for detailed instructions.

On QEMU:

Refer to :ref:`tfm_ipc` for detailed instructions.

Sample Output

$ west build -t run
-- west build: running target run
[0/25] Performing build step for 'tfm'
ninja: no work to do.
[1/2] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: cortex-m33
char device redirected to /dev/pts/1 (label hostS0)
qemu-system-arm: warning: nic lan9118.0 has no peer
[INF] Beginning TF-M provisioning
[WRN] TFM_DUMMY_PROVISIONING is not suitable for production! This device is NOT SECURE
[Sec Thread] Secure image initializing!
Booting TF-M v1.6.0+8cffe127
[UTVM SERVICE] UTVM initalisation completed
[TFLM SERVICE] TFLM initalisation completed
Creating an empty ITS flash layout.
Creating an empty PS flash layout.
[HUK DERIV SERV] Successfully derived the key for HUK_COSE
[NV PS COUNTERS] nv_ps_counter_tracker 0
[NV PS COUNTERS] nv_ps_counter_rollover_tracker 0
[NV PS COUNTERS] NV_PS_COUNTER_ROLLOVER_MAX 4294967200
[NV PS COUNTERS] NV_COUNTER_TRACKER_THRESHOLD_LIMIT 100
*** Booting Zephyr OS build zephyr-v3.1.0-3390-g9562e3f794d7  ***
[HUK DERIV SERV] Generated UUID: 45b51869-8132-4e15-b780-288d521a5078


<inf> app: Successfully derived the key for HUK_CLIENT_TLS

[    2.631000] <inf> app: Azure: waiting for network...
[    7.141000] <inf> app: Azure: Waiting for provisioning...

After waiting for the "Waiting for provisioning" message, the keys ca 5001 command can be used to query the bootstrap server.

uart:~$ keys ca 5001
argc: 2
[    9.288000] <inf> app: uuid: d74696ad-cb3b-4275-b74a-c346ffe71ea9

Generating X.509 CSR for 'Device Client TLS' key:
Subject: O=Linaro,CN=d74696ad-cb3b-4275-b74a-c346ffe71ea9,OU=Device Client TLS
[HUK DERIV SERV] tfm_huk_hash_sign_csr()::503 Verified ASN.1 tag and length of the payload
[HUK DERIV SERV] tfm_huk_hash_sign_csr()::511 Key id: 0x5001
cert starts at 0x2e2 into buffer
[    9.527000] <inf> app: Got DNS for linaroca
[    9.658000] <inf> app: All data received 595 bytes
[    9.658000] <inf> app: Response to req
[    9.658000] <inf> app: Status OK
[    9.659000] <inf> app: Result: 3
[    9.659000] <inf> app: cert: 460 bytes

         0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
00000000 30 82 01 C8 30 82 01 6F A0 03 02 01 02 02 08 16 0...0..o........
00000010 EB F5 18 21 87 AE 38 30 0A 06 08 2A 86 48 CE 3D ...!..80...*.H.=
...
[    9.725000] <inf> app: provisioned host: davidb-zephyr, port 8883
[    9.725000] <inf> app: our uuid: d74696ad-cb3b-4275-b74a-c346ffe71ea9
[    9.726000] <inf> app: Device Topic: devices/d74696ad-cb3b-4275-b74a-c346ffe71ea9/messages/devicebound/#
[    9.727000] <inf> app: Event Topic: devices/d74696ad-cb3b-4275-b74a-c346ffe71ea9/messages/events/
[    9.727000] <inf> app: Azure hostname: davidb-zephyr.azure-devices.net
[    9.728000] <inf> app: Azure port: 8883
[    9.728000] <inf> app: Azure user: davidb-zephyr.azure-devices.net/d74696ad-cb3b-4275-b74a-c346ffe71ea9
[    9.729000] <inf> app: Azure: Provisioning available

         0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
00000000 30 82 01 C8 30 82 01 6F A0 03 02 01 02 02 08 16 0...0..o........
00000010 EB F5 18 21 87 AE 38 30 0A 06 08 2A 86 48 CE 3D ...!..80...*.H.=
...

Common Problems

Why are my derived keys values and UUID always the same?

TF-M defines a hard-coded HUK value for the mps2 and mps3 platforms, meaning that every instance of this sample run on these platforms will derive the same key values.

This project defines an optional HUK_DERIV_LABEL_EXTRA value in the secure parition that can be used to provide an additional label component for key derivation, enabling key diversity when testing on emulated platforms.

A KConfig wrapper for this variable is also added via the DCONFIG_SECURE_INFER_HUK_DERIV_LABEL_EXTRA config flag to facilitate passing the label from Zephyr's build system up to the TF-M build system.

The label value must be less than 16 characters in size!

It can be defined at compile time with west via:

$ west build -p -b mps2_an521_ns -t run -- \
  -DCONFIG_SECURE_INFER_HUK_DERIV_LABEL_EXTRA=\"123456789012345\"

Compilation fails with ca_crt.txt: No such file or directory

If you are building with networking support, some files from the LITE Bootstrap Server (https://github.com/microbuilder/linaroca) are required to be copied into your sample application so that it can generate X.509 certificates, and communicate with the MQTT Broker that the bootstrap server describes.

Make sure you've run the following scripts in the bootstrap server:

  • setup-ca.sh
  • setup-bootstrap.sh

And then copy the following files:

<bootstrap>/certs/bootstrap_crt.txt -> src/bootstrap_crt.txt
<bootstrap>/certs/bootstrap_key.txt -> src/bootstrap_key.txt
<bootstrap>/certs/ca_crt.txt        -> src/ca_crt.txt

Before running this sample, be sure that you also execute the run-server.sh script to start the LITE bootstrap server.

If everything is configured correctly you can run the keys ca 5001 shell command to get an X.509 certificate for the client TLS key:

uart:~$ keys ca 5001
argc: 2
[00:00:25.904,000] <inf> app: uuid: d74696ad-cb3b-4275-b74a-c346ffe71ea9

Generating X.509 CSR for 'Device Client TLS' key:
Subject: O=Linaro,CN=d74696ad-cb3b-4275-b74a-c346ffe71ea9,OU=Device Client TLS
[HUK DERIV SERV] Verified ASN.1 tag and length of the payload
[HUK DERIV SERV] Key id: 0x5001
cert starts at 0x2e2 into buffer
[00:00:26.787,000] <inf> app: Got DNS for linaroca
[00:00:27.346,000] <inf> app: All data received 591 bytes
[00:00:27.346,000] <inf> app: Response to req
[00:00:27.347,000] <inf> app: Status OK
[00:00:27.348,000] <inf> app: Result: 3
[00:00:27.349,000] <inf> app: cert: 461 bytes
...
[00:00:27.403,000] <inf> app: Request result: 390
[00:00:27.408,000] <inf> app: Close: 0

And you should see the following log message for the bootstrap server:

$ ./run-server.sh
Using config file: /Users/xyz/linaroca/.linaroca.toml
Starting mTLS TCP server on MBP2021.lan:8443
Starting CA server on https://MBP2021.lan:1443
2022/05/23 12:47:07 Received CSR: CN=d74696ad-cb3b-4275-b74a-c346ffe71ea9,OU=Device Client TLS,O=Linaro

How to disable TrustZone on the B-U585I-IOT02A?

If you have flashed a sample to the B-U585I-IOT02A board that enables TrustZone, you will need to disable it before you can flash and run a new non-TrustZone sample on the board.

To disable TrustZone on the B-U585I-IOT02A board, i.e. set TZEN bit from 1 to 0 in the User Configuration register, it's necessary to change AT THE SAME TIME the TZEN and the RDP bits.

Hence, TZEN needs to get set from 1 to 0 and RDP, AT THE SAME TIME, needs to get set from DC to AA (step 3 below).

This is docummented in the AN5347, in section 9, "TrustZone deactivation".

However it happens that the RDP bit is probably not set to DC yet, so first you need to set it to DC (step 2).

Finally you need to set the "Write Protection 1 & 2" bytes properly, otherwise some memory regions won't be erasable and mass erase will fail (step 4).

The following command sequence will fully deactivate TZ:

Step 1:

Ensure U23 BOOT0 switch is set to 1 (switch is on the left, assuming you read "BOOT0" silkscreen label from left to right). You need to press "Reset" (B2 RST switch) after changing the switch to make the change effective.

Step 2:

$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob rdp=0xDC

Step 3:

$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -tzenreg

Step 4:

$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp1a_pstrt=0x7f
$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp1a_pend=0x0
$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp1b_pstrt=0x7f
$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp1b_pend=0x0
$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp2a_pstrt=0x7f
$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp2a_pend=0x0
$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp2b_pstrt=0x7f
$ ./STM32_Programmer_CLI -c port=/dev/ttyACM0 -ob wrp2b_pend=0x0

Integration test using ZTest and twister

You can find the integration tests under path/to/zephyr_secure_inference/tests and follows below directory structure:

tests

└───test_service
└───tfm_huk_deriv_srv
    │─── src
    │─── CMakeLists.tx
    │─── prj.conf
    └─── testcase.yaml

Building and Running the test on QEMU

Run all the tests using the command:

$ cd path/to/zephyr
$ source zephyr-env.sh
$ twister -p mps2_an521_ns -N --inline-logs \
   -T path/to/modules/outoftree/zephyr_secure_inference/tests

For example to run a specific HUK key derivation service test using the command:

$ twister -p mps2_an521_ns -N --inline-logs \
  -T modules/outoftree/zephyr_secure_inference/tests/tfm_sp/tfm_huk_deriv_srv/

Sample test execution logs

Install the anytree module to use the --test-tree option
Renaming output directory to /home/arm/projects/zephyrproject/zephyr/twister-out.1
INFO    - Using Ninja..
INFO    - Zephyr version: zephyr-v3.1.0-2265-g62f19cc6b3d4
INFO    - Using 'zephyr' toolchain.
INFO    - Selecting default platforms per test case
INFO    - Building initial testsuite list...
INFO    - Writing JSON report /home/arm/projects/zephyrproject/zephyr/twister-out/testplan.json
INFO    - JOBS: 8
INFO    - 1 test scenarios (1 configurations) selected, 0 configurations discarded due to filters.
INFO    - Adding tasks to the queue...
INFO    - Added initial list of jobs to queue
INFO    - Total complete:    1/   1  100%  skipped:    0, failed:    0
INFO    - 1 of 1 test configurations passed (100.00%), 0 failed, 0 skipped with 0 warnings in 75.06 seconds
INFO    - In total 2 test cases were executed, 0 skipped on 1 out of total 473 platforms (0.21%)
INFO    - 1 test configurations executed on platforms, 0 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing JSON report /home/arm/projects/zephyrproject/zephyr/twister-out/twister.json
INFO    - Writing xunit report /home/arm/projects/zephyrproject/zephyr/twister-out/twister.xml...
INFO    - Writing xunit report /home/arm/projects/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO    - Run completed

About

Confidential AI sample application for Zephyr RTOS

License:Apache License 2.0


Languages

Language:C++ 83.1%Language:C 15.7%Language:CMake 0.6%Language:Python 0.3%Language:Makefile 0.3%