This is an unfinished attempt at implementing a PUF (Physically Unclonable Function) for the TwPM project.
This repository contains code integrating the RO (Ring Oscillator) PUF from stnolting/fpga_puf and a custom arbiter PUF implementation.
The implementation was tested on a Lattice ECP5 (OrangeCrab) using both open-source tools (Yosys+nextpnr) and proprietary tools (Lattice Diamond).
The RO PUF did not work as we expected: raw readings from the PUF were very unstable, making it impossible to reliably generate stable and unique data. This issue could be worked around by using placement constraints in Lattice Diamond to separate the PUF from the rest of the design, which gave much better results; however, this could not be reproduced with Yosys and nextpnr.
Without placement constraints the PUF is surrounded by other components, which could interfere with the ring oscillators.
This hypothesis is supported by the fact that adding placement constraints:
UGROUP "PUF" BBOX 10 10 BLKNAME cpu/neorv32_top_inst/io_system.neorv32_cfs_inst_true.neorv32_cfs_inst/fpga_puf_inst;
LOCATE UGROUP "PUF" SITE "R2C2D" ;
so that the PUF is separated from the rest of the design, gives much better results, and a stable ID can be generated:
Select test mode:
a - run quick test (execute 8 runs with 4096 ID samples each).
b - compute hamming distance of raw IDs
starting test...
Run 0 ID: 0x1000000080148642254A0000
Run 1 ID: 0x1000000080148642254A0000
Run 2 ID: 0x1000000080148642254A0000
Run 3 ID: 0x1000000080148642254A0000
Run 4 ID: 0x1000000080148642254A0000
Run 5 ID: 0x1000000080148642254A0000
Run 6 ID: 0x1000000080148642254A0000
Run 7 ID: 0x1000000080148642254A0000
The ID is mostly reproducible. Running for a longer time causes some small errors, probably caused by rising temperature; however, those errors could be corrected by ECC.
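As a rough illustration of how such errors could be corrected, below is a minimal sketch of a code-offset construction with a repetition code, one common ECC approach for PUFs. It is not part of this repository: the repetition factor REP and the enroll/reconstruct helpers are hypothetical, and PUF responses are modelled as Python lists of bits.

```python
"""Code-offset construction with a repetition code (hypothetical sketch).

Assumption: the PUF response is a list of 0/1 bits and each key bit is
protected by REP response bits; majority voting corrects up to
(REP - 1) // 2 flipped bits per group.
"""

REP = 3  # hypothetical repetition factor; must be odd for majority voting


def enroll(response: list[int], key: list[int]) -> list[int]:
    """Return public helper data: response XOR (each key bit repeated REP times)."""
    if len(response) != len(key) * REP:
        raise ValueError("response length must be len(key) * REP")
    codeword = [bit for bit in key for _ in range(REP)]
    return [r ^ c for r, c in zip(response, codeword)]


def reconstruct(noisy_response: list[int], helper: list[int]) -> list[int]:
    """Recover the key from a noisy re-reading of the PUF and the helper data."""
    noisy_codeword = [r ^ h for r, h in zip(noisy_response, helper)]
    key = []
    for i in range(0, len(noisy_codeword), REP):
        group = noisy_codeword[i:i + REP]
        key.append(1 if sum(group) > REP // 2 else 0)  # majority vote
    return key


if __name__ == "__main__":
    response = [1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]  # example enrollment reading
    key = [1, 0, 1, 0]
    helper = enroll(response, key)
    noisy = response.copy()
    noisy[0] ^= 1  # one bit flip in the first group
    noisy[7] ^= 1  # one bit flip in the third group
    print(reconstruct(noisy, helper))
```

A repetition code only tolerates a few flips per group; with the large number of noisy bits reported below, a stronger code (or masking out known-noisy bits first) would likely be needed.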
However, most of the bits are still unstable: out of 96 bits only 15 are stable, and the number slowly decreases over time, reaching 13 after 2.5k runs:
Run 2531: I=0x200004f04aa1280d0828a806, A=0x1a40010d111487724028aa12, B=0x649804c04820480801286026 - F=0xfff9e5ffdbfdef7fefbfff7f (83) - H(A,B)=44, H_max(A,B)=57 - H(I,A)=42, H_max(I,A)=55
Run 2532: I=0x200004f04aa1280d0828a806, A=0x1a40a0404aa0280d0828a806, B=0x6000a0400000405a43a25564 - F=0xfff9e5ffdbfdef7fefbfff7f (83) - H(A,B)=36, H_max(A,B)=57 - H(I,A)=12, H_max(I,A)=55
Run 2533: I=0x200004f04aa1280d0828a806, A=0x2000a0011014877243821568, B=0x1a40a0404aa0280d0828a806 - F=0xfff9e5ffdbfdef7fefbfff7f (83) - H(A,B)=47, H_max(A,B)=57 - H(I,A)=49, H_max(I,A)=55
Run 2534: I=0x200004f04aa1280d0828a806, A=0x649804c04020480a01286126, B=0x3000a0011014877247821560 - F=0xfff9e5ffdbfdef7fefbfff7f (83) - H(A,B)=41, H_max(A,B)=57 - H(I,A)=23, H_max(I,A)=55
Run 2535: I=0x200004f04aa1280d0828a806, A=0x300084d04aa0280d0028a806, B=0x6080a0410004465a03a25564 - F=0xfff9e5ffdbfdef7fefbfff7f (83) - H(A,B)=39, H_max(A,B)=57 - H(I,A)=5, H_max(I,A)=55
Run 2536: I=0x200004f04aa1280d0828a806, A=0x649884c00820480a01224124, B=0x380020051014876247821564 - F=0xfff9e5ffdbfdef7fefbfff7f (83) - H(A,B)=37, H_max(A,B)=57 - H(I,A)=28, H_max(I,A)=55
Run 2537: I=0x200004f04aa1280d0828a806, A=0x449804c04820480801286026, B=0x2000a0011014877247821548 - F=0xfff9e5ffdbfdef7fefbfff7f (83) - H(A,B)=46, H_max(A,B)=57 - H(I,A)=21, H_max(I,A)=55
Note: the parameters shown in each line are:
- I - initial reading from PUF
- A and B - two consecutive readings from PUF
- F - mask of noisy bits (with the count of noisy bits in parentheses)
- H(A,B) - hamming distance between A and B
- H(I,A) - hamming distance between I and A
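The H and F fields can be recomputed offline with a few lines of Python. This is a sketch assuming F is the bitwise OR of all observed differences between readings (the exact rule used by stnolting/fpga_puf may differ); hamming, update_noise_mask and stable_bits are hypothetical helper names. The example values are copied from run 2535 in the log above.

```python
"""Helpers for interpreting the fpga_puf test log fields (sketch).

Assumption: F accumulates every bit position that has ever differed between
two readings, i.e. F |= A ^ B for each pair of readings.
"""

ID_BITS = 96  # width of the raw PUF ID in the log above


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def update_noise_mask(mask: int, a: int, b: int) -> int:
    """Mark every bit that differs between readings a and b as noisy."""
    return mask | (a ^ b)


def stable_bits(mask: int, width: int = ID_BITS) -> int:
    """Count bits that were never observed to flip."""
    return width - bin(mask).count("1")


if __name__ == "__main__":
    # Values copied from run 2535 of the log above.
    i = 0x200004F04AA1280D0828A806
    a = 0x300084D04AA0280D0028A806
    f = 0xFFF9E5FFDBFDEF7FEFBFFF7F
    print(hamming(i, a))          # 5, matching H(I,A)=5
    print(bin(f).count("1"))      # 83, matching the noisy-bit count
    print(stable_bits(f))         # 13 stable bits
```

With F from the log this gives 96 - 83 = 13 stable bits, matching the count quoted above.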
Also, please refer to https://github.com/stnolting/fpga_puf for a description.
Overall results are still bad compared to the results described in the stnolting/fpga_puf README.md; however, they are much better than without constraints, when there were no stable bits at all.
I tried to reproduce the current results with the open-source toolchain before going further.
Nextpnr doesn't support the UGROUP directive, but it does support the BEL attribute.
There is also a largely undocumented feature which allows constraining cells to a specific region without giving absolute constraints for each cell. This can be achieved by passing a custom Python script (see constrain_puf.py) as the --pre-pack argument.
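For illustration only, a region-based pre-pack script could look roughly like the sketch below. It is not the repository's constrain_puf.py. It relies on nextpnr's Python region API (ctx.createRectangularRegion and ctx.constrainCellToRegion) and assumes that nextpnr exposes cell names as strings and that PUF cells can be matched by a name substring; PUF_PREFIX and the region coordinates are hypothetical placeholders that would have to be chosen for the actual design and device.

```python
"""Hypothetical nextpnr --pre-pack script confining PUF cells to a region.

Assumptions: nextpnr injects a `ctx` object into the script's globals, cell
names are exposed as Python strings, and PUF cells contain PUF_PREFIX in
their names. The coordinates are placeholders, not tuned for any device.
"""

PUF_PREFIX = "fpga_puf_inst"   # hypothetical substring of PUF cell names
REGION = "puf_region"          # hypothetical region name
BOX = (2, 2, 12, 12)           # hypothetical (x0, y0, x1, y1) grid coordinates


def constrain_puf(ctx, prefix=PUF_PREFIX, region=REGION, box=BOX):
    """Create a rectangular region and constrain every matching cell to it."""
    ctx.createRectangularRegion(region, *box)
    constrained = []
    for name, _cell in ctx.cells:
        if prefix in name:
            ctx.constrainCellToRegion(name, region)
            constrained.append(name)
    return constrained


if "ctx" in globals():  # only true when the script is run by nextpnr
    print("Constrained %d PUF cells" % len(constrain_puf(ctx)))  # noqa: F821
```

Such a script would be passed with --pre-pack; note that confining the PUF alone does not keep other logic out of the same region.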
But despite using both script-based and BEL attribute constraints, we saw no improvement with nextpnr and all PUF bits were noisy. By manually analysing the output of nextpnr (.json file), I saw that the PUF was tightly surrounded by other components. Unfortunately, I couldn't get nextpnr's floorplan view to work properly. I tried to constrain other components in order to create some distance between the PUF and the rest of the system, but I kept running into issues: routing/placement failures, crashes and freezes.
Other attempts involved tuning the PUF by changing the sampling frequency and by increasing the number of NOT gates, which affects the oscillator frequency. The second approach turned out to be problematic, as Yosys seems to ignore (* dont_touch = "yes" *), so the additional gates were removed. The problem can be worked around by instantiating device-specific primitives directly. Tuning these parameters had a minimal effect on PUF stability.
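Why extra NOT gates change the oscillator frequency can be seen from the usual first-order ring oscillator model, f ≈ 1 / (2 · N · t_pd) for N inverting stages with an average per-stage delay t_pd. The sketch below is illustrative only; the 0.5 ns delay is a made-up placeholder, not a measured ECP5 value.

```python
"""First-order ring oscillator frequency model (illustrative sketch).

Assumption: a ring of N inverting stages (N odd) with an average per-stage
delay t_pd, covering both logic and routing, oscillates at roughly
f = 1 / (2 * N * t_pd): an edge travels around the ring twice per period.
"""

T_PD_S = 0.5e-9  # hypothetical 0.5 ns per stage, NOT a measured ECP5 value


def ro_frequency_hz(stages: int, t_pd_s: float = T_PD_S) -> float:
    """Approximate oscillation frequency of a ring with `stages` inverters."""
    if stages < 1 or stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 1.0 / (2 * stages * t_pd_s)


if __name__ == "__main__":
    for n in (3, 5, 7, 9):
        print(f"{n} stages: {ro_frequency_hz(n) / 1e6:.1f} MHz")
```

In an FPGA every added stage also brings its own routing delay, so placement and routing influence the resulting frequency as well as the stage count.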
Eventually, I gave up on stnolting's design and tried a custom arbiter PUF implementation.
The arbiter PUF is mostly untested. Nextpnr freezes when routing the design with the arbiter PUF enabled, and if left running for hours it consumes all available memory. Diamond can still build the design, with no noticeable increase in build time.
Nextpnr freezes after printing the device utilisation:
Info: Device utilisation:
Info: TRELLIS_IO: 7/ 197 3%
Info: DCCA: 1/ 56 1%
Info: DP16KD: 34/ 56 60%
Info: MULT18X18D: 0/ 28 0%
Info: ALU54B: 0/ 14 0%
Info: EHXPLLL: 0/ 2 0%
Info: EXTREFB: 0/ 1 0%
Info: DCUA: 0/ 1 0%
Info: PCSCLKDIV: 0/ 2 0%
Info: IOLOGIC: 0/ 128 0%
Info: SIOLOGIC: 0/ 69 0%
Info: GSR: 0/ 1 0%
Info: JTAGG: 0/ 1 0%
Info: OSCG: 0/ 1 0%
Info: SEDGA: 0/ 1 0%
Info: DTR: 0/ 1 0%
Info: USRMCLK: 0/ 1 0%
Info: CLKDIVF: 0/ 4 0%
Info: ECLKSYNCB: 0/ 10 0%
Info: DLLDELD: 0/ 8 0%
Info: DDRDLL: 0/ 4 0%
Info: DQSBUFM: 0/ 8 0%
Info: TRELLIS_ECLKBUF: 0/ 8 0%
Info: ECLKBRIDGECS: 0/ 2 0%
Info: DCSC: 0/ 2 0%
Info: TRELLIS_FF: 1656/24288 6%
Info: TRELLIS_COMB: 7963/24288 32%
Info: TRELLIS_RAMW: 58/ 3036 1%
Clock '$glbnet$clk_i$TRELLIS_IO_IN' can be driven by:
clk_i$tr_io.O delay 0.000ns
Clock '$glbnet$clk_i$TRELLIS_IO_IN' can be driven by:
clk_i$tr_io.O delay 0.000ns
Little testing has been done with the Diamond-built design. There is no test application; some basic tests were done from the Neorv32 BootROM. The upstream BootROM does not have memory read/write commands, but those can be added by cherry-picking commit 8ad745efde1545e5f4241f9173e601e3e021717a from https://github.com/stnolting/neorv32.
After hardware power-on, the PUF is disabled. To generate a response, first write the challenge and then trigger the PUF. The PUF should be disabled again before writing another challenge.
When built with the arbiter PUF, the registers are available at 0xf0001800 - 0xf000180c.
The first register is the control register:
CMD:> R
Address (8 hex): f0001800
0xf0001800 0x00000002
The 32-bit challenge is written to the register at 0xf0001804:
CMD:> W
Address (8 hex): f0001804
Value (8 hex): 1465dda1
Writing 0x1465dda1 to address 0xf0001804
The PUF is triggered by setting bit 0 in 0xf0001800:
CMD:> W
Address (8 hex): f0001800
Value (8 hex): 00000001
Writing 0x00000001 to address 0xf0001800
The response is available in 0xf0001808:
CMD:> R
Address (8 hex): f0001808
0xf0001808 0xfffffffe
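The same sequence can be scripted from a host. The sketch below is hypothetical: read32 and write32 are placeholder callables (for example wrapping a debugger or the BootROM R/W commands) that are not part of this repository. Because the meaning of the other control bits (such as bit 1, which reads as 1 above) is not described, the sketch only changes bit 0 using read-modify-write, and since no ready flag is documented it reads the response immediately after triggering.

```python
"""Hypothetical host-side model of the arbiter PUF register sequence.

Addresses come from the text above; read32/write32 are placeholder callables.
"""

PUF_CTRL = 0xF0001800       # control register; bit 0 enables/triggers the PUF
PUF_CHALLENGE = 0xF0001804  # 32-bit challenge
PUF_RESPONSE = 0xF0001808   # 32-bit response
CTRL_EN = 1 << 0


def puf_query(read32, write32, challenge: int) -> int:
    """Write a challenge, trigger the PUF, read the response, then disable it."""
    write32(PUF_CTRL, read32(PUF_CTRL) & ~CTRL_EN)      # ensure the PUF is disabled
    write32(PUF_CHALLENGE, challenge & 0xFFFFFFFF)
    write32(PUF_CTRL, read32(PUF_CTRL) | CTRL_EN)       # trigger: set bit 0
    response = read32(PUF_RESPONSE)
    write32(PUF_CTRL, read32(PUF_CTRL) & ~CTRL_EN)      # disable before next challenge
    return response
```

For example, with a simple dictionary standing in for the bus, puf_query(mem.get, mem.__setitem__, 0x1465DDA1) reproduces the manual steps shown above.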
The PUF typically responds with one of these values:
0xfffffffe
0xffffffff
0x00000001
0x00000000
which are not valid. The issue is probably caused by some optimizations done by Diamond. I tried the keep_hierarchy, keep and dont_touch attributes, but they didn't provide the results I expected. I also tried using ECP5 primitives directly for the switches and the SR latch; in fact, I had to build the SR latch from primitives to prevent Diamond from completely optimizing away the PUF.
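Responses like the ones listed above could be flagged automatically on the host. The check below is a hypothetical heuristic, not part of the project: a 32-bit response with at most one or at least 31 bits set is treated as suspicious, and uniformity reports the overall fraction of ones across many responses, which should be close to 0.5 for a well-behaved PUF.

```python
"""Hypothetical sanity checks for arbiter PUF responses."""


def looks_degenerate(response: int, width: int = 32, margin: int = 1) -> bool:
    """True if the response is (almost) all zeros or (almost) all ones."""
    ones = bin(response & ((1 << width) - 1)).count("1")
    return ones <= margin or ones >= width - margin


def uniformity(responses: list[int], width: int = 32) -> float:
    """Fraction of bits set to 1 over all responses (ideally about 0.5)."""
    if not responses:
        raise ValueError("need at least one response")
    mask = (1 << width) - 1
    ones = sum(bin(r & mask).count("1") for r in responses)
    return ones / (len(responses) * width)


if __name__ == "__main__":
    for value in (0xFFFFFFFE, 0xFFFFFFFF, 0x00000001, 0x00000000, 0x1465DDA1):
        print(hex(value), looks_degenerate(value))
```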
The arbiter PUF was meant to be a simpler, temporary replacement for the RO PUF. The long-term plan was to replace it with something more secure: an RO PUF or even something else. But since it turned out to be problematic, due to Diamond's issues and even more so due to nextpnr's, we abandoned the plan and the implementation is left in a non-working state.
Nix can be used to quickly initialize the build environment. The simplest way is to run nix develop in the repository root directory. From the Nix shell, run make to build the project.
By default, the open-source toolchain is used and the arbiter PUF is disabled. The toolchain can be changed by setting TOOLCHAIN=diamond in the make invocation. To enable the arbiter PUF, set ARB_PUF=yes.
Note: Diamond is not included in the default Nix shell; use nix develop .#with-diamond instead.
See the README at https://github.com/Dasharo/TwPM_toplevel for details on how to configure the environment.
Note: the application must be loaded over UART. See the TwPM README for details.
If using Diamond, you can instead set INT_BOOTLOADER_EN to false in neorv32_wrapper.vhd so that the CPU boots directly into the PUF test application.