pftf / RPi3

Raspberry Pi 3 UEFI Firmware Images

v1.24 unable to boot from SD card

kailiu42 opened this issue · comments

When I upgraded to v1.24, I got this on serial:

Raspberry Pi Bootcode
Read File: config.txt, 399
Read File: start.elf, 2976448 (bytes)
Read File: fixup.dat, 7270 (bytes)
NOTICE:  Booting Trusted Firmware
NOTICE:  BL1: v2.3():v2.3
NOTICE:  BL1: Built : 10:40:46, Apr 21 2020
NOTICE:  rpi3: Detected: Raspberry Pi 3 Model B (1GB, Embest, China) [0x00a22082]
NOTICE:  BL1: Booting BL2
ERROR:   rpi3_sdhost: unknown err, cmd = 0xd
ERROR:   rpi3_sdhost status: 0x10
ERROR:   rpi3_sdhost: unknown err, cmd = 0xd
ERROR:   rpi3_sdhost status: 0x10
ERROR:   rpi3_sdhost: unknown err, cmd = 0xd
ERROR:   rpi3_sdhost status: 0x10
...

The rpi3_sdhost error repeats endlessly and the device is unable to boot. Downgrading to v1.23 immediately fixed it.

This would be indicative of a defective SD card.

status: 0x10 means a CRC error: the data was read, but its checksum doesn't match the expected value, which means it is corrupted.
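
For reference, the two numbers in the log can be decoded along these lines. This is only a sketch: the bit names and values are assumed to match the SDHSTS_* definitions used by the Linux bcm2835 SDHost driver, not copied from TF-A, and `decode_status` / `decode_command` are hypothetical helpers. Under that assumption, 0x10 is the CRC7 bit, which covers the command/response exchange rather than a data block. Command index 0xd (13) is SEND_STATUS in the SD specification.

```python
# Sketch: decode the SDHost status word and command index from the log.
# ASSUMPTION: bit values follow the SDHSTS_* names in the Linux bcm2835
# SDHost driver; verify them against the driver source you actually build.
SDHSTS_BITS = {
    0x01: "DATA_FLAG",
    0x08: "FIFO_ERROR",
    0x10: "CRC7_ERROR",    # CRC over the command/response
    0x20: "CRC16_ERROR",   # CRC over a data block
    0x40: "CMD_TIME_OUT",
    0x80: "REW_TIME_OUT",
}

SD_COMMANDS = {13: "SEND_STATUS"}  # only the command seen in the log


def decode_status(status):
    """Return the names of all known bits set in a status word, lowest bit first."""
    return [name for bit, name in sorted(SDHSTS_BITS.items()) if status & bit]


def decode_command(cmd):
    """Return the SD command name for a command index, or CMD<n> if not listed."""
    return SD_COMMANDS.get(cmd, "CMD%d" % cmd)


print(decode_command(0xD), decode_status(0x10))
```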

I don't think any part of the software that's executed during early platform boot can interfere much with the CRC validation. That software is the Raspberry Pi Foundation's start.elf and ARM's Trusted Firmware (which is where this error actually comes from), both of which are external components we rely on to launch the UEFI firmware. So if you do get a CRC error, chances are pretty solid that this is a pure hardware issue. SD cards, especially cheap ones, are not exactly models of reliability, and their flash memory tends to fail a lot sooner than people expect it to...

Having something fail and then work after you rewrite some file would also be typical of SD card flash failure: depending on semi-random factors, you will usually be exercising different areas of the flash memory, so you might happen to hit the failing flash cells in one test but not in the other. In other words, a small number of tests on a CRC error isn't really indicative that a regression exists.

I would strongly suggest that you run a comprehensive bad blocks check on the SD card you use, as well as try with a different SD card (preferably new), as I'm not seeing an issue here and have to conclude, due to the nature of the error code, that it most likely has to do with your hardware environment.

With a different SD card the problem indeed is gone.

But the problematic one is not bad. It's a new Samsung EVO Plus 128 GB, and it passed the badblocks non-destructive read-write test with 0 bad blocks.

When I upgraded to v1.24

If you upgraded not only RPI_EFI.fd but also the other files from the release zip, then try rolling back bootcode.bin, fixup.dat, and start.elf to the versions from the 1.23 release. For me, that solved the issues with booting the 1.24 release.
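
A rough sketch of that rollback, for anyone who wants to script it. Both directory arguments are hypothetical: pass wherever you unpacked the 1.23 release zip and wherever the SD card's boot partition is mounted. `rollback_boot_files` is an illustrative helper, not part of any release.

```python
import shutil
from pathlib import Path

# The three files named in the comment above; RPI_EFI.fd is left untouched.
BOOT_FILES = ("bootcode.bin", "fixup.dat", "start.elf")


def rollback_boot_files(old_release_dir, boot_partition):
    """Copy the Pi boot files from an extracted older release onto the card.

    Raises FileNotFoundError before copying anything if a file is missing,
    so the boot partition is never left half updated.
    """
    src, dst = Path(old_release_dir), Path(boot_partition)
    missing = [name for name in BOOT_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError("not in %s: %s" % (src, ", ".join(missing)))
    for name in BOOT_FILES:
        shutil.copy2(src / name, dst / name)
    return [dst / name for name in BOOT_FILES]
```

It would be called as `rollback_boot_files("RPi3_UEFI_Firmware_v1.23", "/mnt/boot")`, with both paths replaced by your own.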

Keep in mind these errors are for the Pi start.elf, not UEFI code. Can't do anything about these. Please report to Pi Foundation.

Actually - never mind. I am wrong. These messages are from TF-A.

It could be a bug in the TF-A SdHost driver, which we still have no (immediate) control over, but it would be interesting to narrow down the specific TF-A commit that caused a regression.
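
Narrowing it down is essentially a binary search over the TF-A history, which is what `git bisect` automates. A minimal sketch of the logic follows, assuming a hypothetical `boots_ok` callback that builds one commit, boots it, and reports whether the boot was clean. Each commit is tried several times, since a single clean boot does not prove a commit is good when the failure is not deterministic.

```python
def first_bad_commit(commits, boots_ok, attempts=5):
    """Binary-search an ordered commit list for the first failing commit.

    commits:  oldest-to-newest list; commits[0] is known good and
              commits[-1] is known bad.
    boots_ok: hypothetical callback taking a commit and returning True
              on a clean boot.
    attempts: boots per commit; one failure is enough to mark it bad.
    """
    def is_good(commit):
        return all(boots_ok(commit) for _ in range(attempts))

    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```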

When I upgraded to v1.24

If you upgraded not only RPI_EFI.fd but also the other files from the release zip, then try rolling back bootcode.bin, fixup.dat, and start.elf to the versions from the 1.23 release. For me, that solved the issues with booting the 1.24 release.

Yes, did the rollback for 1.28
edit 1: same files for 1.29

I've been getting this with 1.30 (including after applying the updates from raspberrypi/firmware#1445).

If I power down the Pi for a little while, the error USUALLY goes away and I can boot the next time. But a warm reboot almost always hits it.

If I get rid of UEFI and use u-boot (i.e., change my CONFIG.TXT appropriately to load u-boot.bin instead of RPI_EFI.fd), it warm reboots perfectly every time. So I don't think it's a bad SD card.

I spent some time comparing and debugging with both u-boot and TF-A. I've got a small patch that fixes this problem for me, though I have very limited ability to test it.

ARM-software/arm-trusted-firmware#1995

The basic idea is that some initial configuration parameters (clock rate, bus width) aren't configured into the hardware before commands start being sent. I suspect that the particular setting that matters is the "slow card" bit, but the initial clock setting also seemed wrong to me.
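
To illustrate the ordering problem, here is a toy model. It is not TF-A code and not the patch: `FakeHost` and `Driver` are hypothetical, and the rule that a command fails whenever the hardware settings differ from what the driver assumes is a simplification made for the example. The 400 kHz figure is the identification-mode clock limit from the SD specification.

```python
class FakeHost:
    """Toy stand-in for a host controller; not the real SDHost register set."""

    def __init__(self, clock_hz, bus_width):
        # Whatever an earlier boot stage left programmed in the hardware.
        self.clock_hz = clock_hz
        self.bus_width = bus_width

    def send_command(self, expected_clock_hz, expected_bus_width):
        # ASSUMPTION for illustration: a command only succeeds when the
        # hardware matches the settings the driver believes are in effect.
        actual = (self.clock_hz, self.bus_width)
        return "ok" if actual == (expected_clock_hz, expected_bus_width) else "crc error"


class Driver:
    INIT_CLOCK_HZ = 400_000   # identification-mode clock limit
    INIT_BUS_WIDTH = 1

    def __init__(self, host, apply_settings_first):
        self.host = host
        if apply_settings_first:
            # Push the initial settings into the hardware before any command.
            host.clock_hz = self.INIT_CLOCK_HZ
            host.bus_width = self.INIT_BUS_WIDTH

    def first_command(self):
        return self.host.send_command(self.INIT_CLOCK_HZ, self.INIT_BUS_WIDTH)


# Hardware left at 50 MHz, 4-bit by a previous stage (values are made up).
print(Driver(FakeHost(50_000_000, 4), apply_settings_first=False).first_command())
print(Driver(FakeHost(50_000_000, 4), apply_settings_first=True).first_command())
```

In this model the first driver fails its first command and the second succeeds, because only the second writes its initial settings to the hardware before talking to the card.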

I'm certainly interested in feedback. I don't quite know how to push for this to be included in ARM's TF-A, but I've at least put it up here so folks can see it.