Boot firmware modifies/writes to FAT filesystem on SD card, breaking dm-integrity
jplarocque opened this issue
Describe the Bug
Hi,
The Raspberry Pi firmware seems to write to the SD card at bootup, modifying the boot FAT filesystem (the one typically mounted at /boot/firmware), which contains the firmware being booted, the kernel image, device tree blobs, initrd image, and configuration files like config.txt and cmdline.txt.
To Reproduce
Clone the latest Raspberry Pi firmware (commit 5c83250 in my case):
$ git clone --depth 1 https://github.com/raspberrypi/firmware.git
Run a script I wrote (attached) to generate a minimal SD card image. For convenience, I'm also attaching a compressed output image. The image size is 256 MiB, containing a single partition with just the firmware, and not containing a Linux kernel, initrd, or any root filesystem. Then write the image to an SD or microSD card:
$ sudo ./make_image.sh
[various verbose output omitted]
# I'm in the `disk` group; you may need to wrap with `sudo sh -c '...'`; and
# adjust the destination path as appropriate:
$ cat image > /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1
Let's be absolutely certain that the image was written without any corruption:
$ cmp image /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1
cmp: EOF on image after byte 268435456, in line 94408
Move the card to a Raspberry Pi and try booting from it. Wait just a few seconds after the activity lights stop blinking; it won't take long, since there's no kernel to load or boot.
Unpower the Pi, and move the card back to your computer. Take an image of the card (limited to just 256 MiB, matching the original image; you may need sudo):
$ head -c 256MiB /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1 > after_rpi3.img
Compare the images:
$ cmp image after_rpi3.img
image after_rpi3.img differ: byte 1049581, line 3
$ diff -u <(hd image) <(hd after_rpi3.img) | dwdiff -u
--- /dev/fd/63 2024-05-21 19:51:57.006203947 -0700
+++ /dev/fd/62 2024-05-21 19:51:57.006203947 -0700
@@ -27,7 +27,7 @@
00100200 52 52 61 41 00 00 00 00 00 00 00 00 00 00 00 00 |RRaA............|
00100210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
001003e0 00 00 00 00 72 72 41 61 76 24 07 00 [-09-] {+0a+} b4 00 00 |....rrAav$......|
001003f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00100400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
$ sudo losetup -o 1MiB --show -rf image
/dev/loop1
$ sudo losetup -o 1MiB --show -rf after_rpi3.img
/dev/loop2
$ diff -u <(minfo -i /dev/loop1 ::) <(minfo -i /dev/loop2 ::)
Could not get geometry of device (Inappropriate ioctl for device)
Could not get geometry of device (Inappropriate ioctl for device)
--- /dev/fd/63 2024-05-21 19:54:40.478537134 -0700
+++ /dev/fd/62 2024-05-21 19:54:40.478537134 -0700
@@ -1,6 +1,6 @@
device information:
===================
-filename="/dev/loop1"
+filename="/dev/loop2"
sectors per track: 32
heads: 16
cylinders: 1020
@@ -8,7 +8,7 @@
media byte: f8
mformat command line:
- mformat -t 1020 -h 16 -s 32 -r 0 -c 1 -m 248 -i "/dev/loop1" ::
+ mformat -t 1020 -h 16 -s 32 -r 0 -c 1 -m 248 -i "/dev/loop2" ::
bootsector information
======================
@@ -41,4 +41,4 @@
Infosector:
signature=0x41615252
free clusters=468086
-last allocated cluster=46089
+last allocated cluster=46090
$ sudo losetup -d /dev/loop1 /dev/loop2
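For reference, the one changed byte can be tied to a specific field of the FAT32 FSInfo sector with a few lines of shell. This is a minimal sketch, not part of make_image.sh. It assumes 512-byte sectors, a partition starting at 1 MiB (as in the losetup -o 1MiB calls above), and the FSInfo sector being sector 1 of the partition, which is consistent with the RRaA signature at 0x00100200 in the hexdump. The helper names read_le32 and fsinfo_fields are hypothetical. The field offsets 0x1E8 (free cluster count) and 0x1EC (next free cluster hint) come from the FAT32 on-disk layout.

```shell
#!/bin/sh
# Read an unsigned little-endian 32-bit value from FILE ($1) at byte OFFSET ($2).
read_le32() {
    # od prints the four bytes as unsigned decimals, lowest address first;
    # the command substitution is expanded before "set --" replaces $1 and $2.
    set -- $(od -An -v -tu1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Print the two hint fields of the FSInfo sector.
#   $1 = image file, $2 = partition start in bytes (assumed default: 1 MiB)
fsinfo_fields() {
    image=$1
    part_offset=${2:-1048576}
    fsinfo=$(( part_offset + 512 ))   # assumption: FSInfo lives in sector 1
    echo "free clusters=$(read_le32 "$image" $(( fsinfo + 488 )))"   # 0x1E8
    echo "next free hint=$(read_le32 "$image" $(( fsinfo + 492 )))"  # 0x1EC
}

# Example (hypothetical invocation): fsinfo_fields image; fsinfo_fields after_rpi3.img
```

Under those assumptions, 1 MiB + 512 + 0x1EC is byte offset 1049580, which is the byte that cmp (counting from 1) reports as 1049581. For the original image the two values should come out as 468086 and 46089, agreeing with what minfo calls "free clusters" and "last allocated cluster".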
Expected Behavior
After the above steps, I would expect cmp image after_rpi3.img to produce no output, and exit with code 0.
Generally, I never expected the early boot process of a Raspberry Pi to write to or modify the contents of its SD card. I thought this could/should only happen at some point after the kernel has loaded.
Actual Behavior
The Raspberry Pi firmware writes to the SD card, modifying the contents of the FAT filesystem from which it has booted. This occurs before the kernel loads.
The purpose of make_image.sh is to produce an extremely simple cut-down test image, where the buggy behavior matches the behavior I found in my real-world installation. It also demonstrates that the problem is not in the Linux kernel, initrd, or anywhere in userland, since those files are never copied into the image. The script does generate small dummy (all-zeroes) kernel.img, etc. files in lieu of copying the Linux kernel images. This is because without those files, the firmware doesn't seem to advance far enough in its boot process to reproduce this issue. Giving it kernel files with no usable code is sufficient, though.
System
- Which model of Raspberry Pi? Reproduced on a Raspberry Pi 1 Model B rev. 2, and on a Raspberry Pi 3 Model B+ rev. 1.3.
- Which OS and version? Originally found with a Debian Bookworm system (so no /etc/rpi-issue), but seems to occur independent of the OS/distribution.
- Which firmware? No vcgencmd on Debian, but I found it on a system with Debian's raspi-firmware package version 1.20220830+ds-1, as well as with commit 5c83250 in this repo.
- Which kernel version? Originally found with a system reporting Linux mosaik 6.1.0-21-rpi #1 Debian 6.1.90-1 (2024-05-03) armv6l GNU/Linux, but seems to occur independent of the kernel version.
Additional Context
Surprising and Undocumented Behavior
I think it's bad for the firmware to write to the SD card in principle. There's no functionality that I'm aware of in this early, pre-kernel boot process that suggests that writing to the card is required, or could ever happen.
I've scanned the documentation in these pages, which seem to cover the boot process the most, and couldn't find this behavior documented:
- https://www.raspberrypi.com/documentation/computers/configuration.html#the-boot-folder
- https://www.raspberrypi.com/documentation/computers/config_txt.html
Breaks dm-integrity
This behavior causes problems for me because I'm trying to use dm-integrity for protection against data corruption in /boot/firmware. I've found microSD cards and the ecosystem surrounding them to be unreliable:
- Even the good Samsung ones, in various readers. I've had to RMA a brand-new Samsung EVO Select for silent data corruption. Though I haven't tried industrial-grade cards yet.
- Even those little USB 2.0 microSD readers that CanaKit used to (or still do) ship will occasionally silently corrupt data. I should have reported it, but I left the employer who was buying those kits, and don't have any CanaKit-branded readers to test with again in my personal time.
- And yes, I experience this with good power supplies, and without unclean shutdowns or unexpected power loss.
While microSD just isn't dependable in my experience, ultimately any storage device of any medium/type/format will some day fail.
For Raspberry Pis I run where availability is important to me, I try to use RAID1-equivalent protection using btrfs or ZFS for the root filesystem, mirroring data with another microSD card in an external USB card reader. Since the Raspberry Pi and its firmware require FAT for the firmware filesystem, I'm trying to use dm-integrity to catch any data corruption on that partition. On top of dm-integrity, I layer on mdraid level 1 with metadata format 1.0 (metadata stored at the end of the device), mirroring the contents onto another microSD card in a USB card reader which is formatted the same way. If I ever have trouble booting the main microSD card, I can just swap them, and then make sure that a RAID scrub fixes the problem on both disks (and also consider whether to replace the first card). And a periodic RAID scrub cronjob should make sure that any corrupted writes or bitrot will eventually be caught and corrected.
The gist of dm-integrity is that it maintains a checksum of every sector, so that it can tell when data has been silently corrupted and present the condition to higher layers as a read error rather than incorrect data. Usually it interleaves checksums with data sectors in the underlying storage medium. In this case, so that the firmware (which isn't aware of dm-integrity) can still read the filesystem, I put the checksums and other dm-integrity-specific data on a separate partition (formatting and opening with integritysetup --data-device ...).
This is why the Raspberry Pi firmware writing to the device causes me grief: it successfully reads the FAT filesystem which is layered on top of mdraid 1.0 and dm-integrity, but the act of changing the filesystem without the awareness to update dm-integrity checksums invalidates those checksums. dm-integrity then reports read errors, as intended, when the system boots up.
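To make the layering concrete, here is a rough sketch of how such a stack might be assembled. It only prints the commands and never runs them. The device paths, the mapping names fw_a and fw_b, the array name /dev/md/firmware, and the function names are all hypothetical placeholders, and the exact options of a real setup may well differ. The only details taken from the description above are the separate partition for integrity data (--data-device), RAID level 1, and metadata format 1.0.

```shell
#!/bin/sh
# Print the dm-integrity commands for one card.
#   $1 = mapping name, $2 = data partition (the one the firmware reads),
#   $3 = partition holding the dm-integrity tags and journal
plan_integrity() {
    echo "integritysetup format $3 --data-device $2"
    echo "integritysetup open $3 $1 --data-device $2"
}

# Print the full plan: FAT on mdraid 1 (metadata 1.0) on dm-integrity.
#   $1/$2 = data and tag partitions on card A, $3/$4 = the same on card B
plan_firmware_mirror() {
    plan_integrity fw_a "$1" "$2"
    plan_integrity fw_b "$3" "$4"
    # Metadata 1.0 sits at the end of each member, so the FAT filesystem
    # still begins at the first sector of the data partition.
    echo "mdadm --create /dev/md/firmware --level=1 --metadata=1.0 --raid-devices=2 /dev/mapper/fw_a /dev/mapper/fw_b"
    echo "mkfs.vfat /dev/md/firmware"
}

# Example (hypothetical devices):
#   plan_firmware_mirror /dev/sdX1 /dev/sdX2 /dev/sdY1 /dev/sdY2
```

With this arrangement the firmware sees a plain FAT partition, while Linux only ever touches it through the md and dm-integrity layers. Any write that bypasses those layers, such as the firmware's, leaves the stored checksums stale.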
Thank you for reading my report. Would you please consider updating the Raspberry Pi firmware so that it no longer writes to the SD card at boot?
Before considering the possibility that the firmware actually may write to SD card, there is a fault in your test procedure. You should repeat the test but without turning on the Raspberry Pi, i.e. remove the card from the PC and reinsert it, then look for differences. You can also repeat this test to see if it changes again or if it is a one-off.
That's a fair point. Here is my result immediately after writing the image:
$ cmp image /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1
cmp: EOF on image after byte 268435456, in line 94408
(So the initial 256 MiB of the SD card matches the entirety of the image.)
Here's the result after removing and re-inserting the card:
$ cmp image /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1
cmp: EOF on image after byte 268435456, in line 94408
And here's the result after trying to boot my Pi 1 Model B with it:
$ cmp image /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1
image /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0:1 differ: byte 1049581, line 3
$ cmp after_rpi1.img /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1
cmp: EOF on after_rpi1.img after byte 268435456, in line 94409
(Mismatch to the written image, exact match to a prior image I took after reproducing this problem.)
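The same three checks could be scripted so that a match is silent and exits with status 0, instead of printing the "EOF on image" message. A minimal sketch, assuming GNU cmp (for the -n byte limit); verify_prefix is a hypothetical helper name:

```shell
#!/bin/sh
# Compare an image against the first <image size> bytes of a device or file.
#   $1 = image file, $2 = device or file at least as large as the image
# Exit status 0 means the prefix matches; otherwise cmp reports the first
# differing byte and the function returns non-zero.
verify_prefix() {
    image=$1
    target=$2
    size=$(( $(wc -c < "$image") ))   # arithmetic strips any padding from wc
    cmp -n "$size" "$image" "$target"
}

# Example (device path as used in this report; may need sudo):
#   verify_prefix image /dev/disk/by-id/usb-TS-RDF8_SD_Transcend_000000079-0\:1
```

Run once after writing, once after re-inserting the card, and once after booting the Pi, only the last call would be expected to fail.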
Also, consider that I couldn't reproduce the problem with no kernel*.img files; I had to create bogus ones. If it was wonkiness on my computer causing this, then it would have caused it then too, but it was worthwhile to double check.
As far as repeating the test again in case it was a one-off, I've reliably repeated this test over a dozen times while preparing my report, and even checked and reproduced it on another model of Pi. Can you take a look to see if you can reproduce it on one of your Pis? It should only take a spare SD card that you don't mind overwriting, and a few minutes of work.
Thanks,
-Jean-Paul
To preempt some other concerns that may be raised:
I've reproduced this issue with three models of SD card:
- SanDisk, "SDHC Card", Class 4 mark, "8 GB"
- PNY "SD", "1 GB"
- Onefavor "TF card" "2 GB" (weird little guys I picked up from AliExpress for netbooting), via a Samsung branded "SD Adapter for microSD"
I've reproduced the issue with three models of card reader, getting the exact same results before and after (including the exact same changed byte after booting with the Pi):
- Transcend TS-RDC8K USB 3.1 multi-interface card reader, including double-checking after removing and re-inserting the card and running cmp between the image and the card.
- PNY microSD card reader, model unknown, idVendor=0bda, idProduct=0109, USB product string "USB2.0-CRW", double-checked by removing and re-inserting the card reader with a microSD card still in it, then running cmp.
- SanDisk SDDR-339 micro SD UHS-II USB 3.0 Reader, double-checked by removing and re-inserting the card reader with a microSD card still in it, then running cmp.
The Raspberry Pi 1 Model B rev. 2 that I tested with is powered by a Riden RD6006P bench supply set for 5.25 V and 5 A, through a combination of 1 m 14 AWG copper wire leads and 0.5 m 22 AWG copper wire leads. These connect to pins 4 and 6 of P1 of the Raspberry Pi. (Excessive detail edited out, because it was snarky and unproductive. I apologize for that.)
The Raspberry Pi 3 Model B+ rev. 1.3 that I tested with is powered by a Samsung Travel Adapter model EP-TA10JWE rated for 5.3 V 2.0 A output. A 1.2 m USB Type A to micro B cable is used to connect the Samsung power supply to micro USB connector J1 on the Raspberry Pi, and this cable was marketed as 20 AWG when I bought it. On the cable, the jacket is marked: "20AWG+2C".
Please let me know if there's anything else I can check, or any other information that might be helpful.
To our general surprise, it turns out that you are correct. Although the firmware makes no attempt to write anything to the card, the filesystem layer unconditionally writes back that sector, and because it has a slightly different idea of what the next cluster hint should be, you see a one-off change.
I've attached a trial version of the firmware (just start.elf and fixup.dat) that should never write anything to the card. Let me know how you get on with it.
Wonderful, I can confirm that it fixes the issue with my test image on all models I've tested with:
- Raspberry Pi 1 Model B rev. 2
- Raspberry Pi 2 Model B V1.1 (dug up for more testing)
- Raspberry Pi 3 Model B+ rev. 1.3
For the original system installation where I found the problem, I had to disable gpu_mem=16 in config.txt to get the fix to work, since (if I understand correctly) it was loading start_cd.elf and fixup_cd.dat instead of the versions of start.elf and fixup.dat that you provided.
Thank you for the fix!
The firmware patch that prevents all writes has been merged, and all future firmware releases will include it.