linux-nvme / nvme-cli

NVMe management command line interface.

Home Page:https://nvmexpress.org

Samsung SSD 980 1TB: can't download firmware

bam80 opened this issue · comments

I'm trying to upgrade the firmware of my Samsung SSD 980 1TB:

$ sudo nvme fw-download -f 3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00000000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 0 (1/3)
fw-download: error on offset 0x00000000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 0 (2/3)
fw-download: error on offset 0x00000000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

nvme-2.8, Ubuntu 24.04

This looks like the device rejects the firmware update. Does it support it and are you sure the firmware you got is the right one?

Absolutely.

I booted the original .iso this firmware came from:
https://download.semiconductor.samsung.com/resources/software-resources/Samsung_SSD_980_3B4QFXO7.iso

and the update went flawlessly with their tool (not sure if they use nvme-cli at all).

Here are the nvme-cli steps I tried unsuccessfully:
https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Samsung

This looks like they are using a different (proprietary) API to update the firmware. The generic nvme firmware update code works for many other devices. And the error you got is basically saying, this operation is not supported by the firmware. Sorry, not much we can do.

But the link above states nvme-cli method should work for Samsung, too:

Instead of using the manufacturer's program you might prefer to use nvme-cli and upload the firmware manually as explained in the previous section:

There are many different firmwares from every manufacturer. Not all have the same features. And just because a wiki says something doesn't make it true.

Okay, I'll bite the bullet: what does the command print if you enable verbose logging with -vvv?

$ sudo nvme fw-download -f 3B4QFXO7.enc /dev/nvme0 -vvv
opcode       : 11
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00001000
metadata_len : 00000000
addr         : 78e23e600000
metadata     : 0
cdw10        : 000003ff
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 2
latency      : 9139 us
fw-download: error on offset 0x00000000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

This is the download command as specified in 5.13 Firmware Image Download command.

The chunk size (aka 'Firmware Update Granularity (FWUG)') is 4 KiB (the minimum possible). And if we look at cdw10, it is 4 KiB (a 0-based number, sic). So this matches the spec.

The only thing that seems off here is the data_len (NUMD). This field is also 0-based. This looks wrong indeed. I'd have expected 0x3ff as well.

@jk-ozlabs Do you agree with my analysis?

@igaw The NUMD field is cdw10, not data_len. I'm not sure how data_len is used in the ioctl transport, but we do expect a (1-based) number of bytes for the MI transport here.
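To make the field encoding in the -vvv dump concrete: cdw10 carries NUMD as a 0-based count of dwords, and cdw11 carries the offset, also in dwords. The following is only a minimal sketch; the function names are hypothetical and not part of nvme-cli.

```shell
# Hypothetical helpers for reading a -vvv dump; not part of nvme-cli.

# NUMD (cdw10) is a 0-based dword count: bytes = (NUMD + 1) * 4.
numd_to_bytes() {
    printf '0x%x\n' $(( ($1 + 1) * 4 ))
}

# OFST (cdw11) is an offset in dwords: bytes = OFST * 4.
ofst_to_bytes() {
    printf '0x%x\n' $(( $1 * 4 ))
}
```

For the dump above, numd_to_bytes 0x3ff prints 0x1000, which agrees with data_len : 00001000.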

In terms of what may be causing the error: it's possible that NUMD doesn't comply with the required FWUG (which is reported in the controller identify data; maybe it's requiring chunks that match a larger internal flash erase-block size...).

@bam80, can you try a:

sudo nvme id-ctrl /dev/nvme0 | grep fwug

and paste the output?

You can try increasing the --xfer size. All devices are supposed to support the default 4k, but maybe this one just wants the whole thing in a single payload.

@keithbusch if this is a chunk size issue, might be nice for us to query the fwug and set the chunk size automatically...

$ sudo nvme id-ctrl /dev/nvme0 | grep fwug
fwug      : 4

might be nice for us to query the fwug and set the chunk size automatically...

Oh, good call, that's exactly what we should do.

Meantime, could retry --xfer=0x4000 parameter?

Meantime, could retry --xfer=0x4000 parameter?

That made the difference: now I get a much longer list of (successfully downloaded?) chunks, but at the end I still get a similar error:

$ sudo nvme fw-download --xfer=0x4000 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0 -vvv
...
opcode       : 11
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00004000
metadata_len : 00000000
addr         : 74bf4db88000
metadata     : 0
cdw10        : 00000fff
cdw11        : 00062000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 0
latency      : 3403 us
opcode       : 11
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00004000
metadata_len : 00000000
addr         : 74bf4db8c000
metadata     : 0
cdw10        : 00000fff
cdw11        : 00063000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 0
latency      : 7241 us
opcode       : 11
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00003690
metadata_len : 00000000
addr         : 74bf4db90000
metadata     : 0
cdw10        : 00000da3
cdw11        : 00064000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 2
latency      : 4700 us
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (1/3)
opcode       : 11
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00003690
metadata_len : 00000000
addr         : 74bf4db90000
metadata     : 0
cdw10        : 00000da3
cdw11        : 00064000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 2
latency      : 1001 us
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (2/3)
opcode       : 11
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00003690
metadata_len : 00000000
addr         : 74bf4db90000
metadata     : 0
cdw10        : 00000da3
cdw11        : 00064000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 2
latency      : 836 us
fw-download: error on offset 0x00190000/0x00193690

Ha, your firmware file size is not aligned, so the final transfer is getting truncated. Seems strange: why would the vendor provide a firmware image for a device that doesn't align to its granularity? Try --xfer=<file-size> so that it can be attempted in a single go.

Hmm, it seems it doesn't accept such an --xfer value:
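The truncated final transfer can be reproduced with a remainder calculation. This is a rough sketch with a hypothetical helper name; the sizes are the ones from the log above.

```shell
# Hypothetical helper: size of the last transfer when an image of $1 bytes
# is sent in chunks of $2 bytes. A result of 0x0 means the image divides evenly.
tail_bytes() {
    printf '0x%x\n' $(( $1 % $2 ))
}
```

With the image size 0x193690 and --xfer=0x4000, tail_bytes 0x193690 0x4000 prints 0x3690, which is the data_len of the chunk that fails at offset 0x190000.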

$ ls -l Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc
-rwxr-xr-x 1 bam bam 1652368 Sep 19  2022 'Downloads/initrd (2)/root/fumagician/3B4QFXO7.enc'

# 1652368 == 193690₁₆

$ sudo nvme fw-download --xfer=0x193690 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00000000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 0 (1/3)
fw-download: error on offset 0x00000000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 0 (2/3)
fw-download: error on offset 0x00000000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

Here are the attempts with different --xfer values:

$ sudo nvme fw-download --xfer=0x8000 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (1/3)
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (2/3)
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

bam@i7:~$ sudo nvme fw-download --xfer=0x10000 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (1/3)
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (2/3)
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

bam@i7:~$ sudo nvme fw-download --xfer=0x14000 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (1/3)
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 190000 (2/3)
fw-download: error on offset 0x00190000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

bam@i7:~$ sudo nvme fw-download --xfer=0x18000 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00180000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 180000 (1/3)
fw-download: error on offset 0x00180000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 180000 (2/3)
fw-download: error on offset 0x00180000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

bam@i7:~$ sudo nvme fw-download --xfer=0x20000 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00180000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 180000 (1/3)
fw-download: error on offset 0x00180000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 180000 (2/3)
fw-download: error on offset 0x00180000/0x00193690
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

bam@i7:~$ sudo nvme fw-download --xfer=0x30000 -f Downloads/initrd\ \(2\)/root/fumagician/3B4QFXO7.enc /dev/nvme0
fw-download: error on offset 0x00000000/0x00193690
fw-download: Invalid argument
retrying offset 0 (1/3)
fw-download: error on offset 0x00000000/0x00193690
fw-download: Invalid argument
retrying offset 0 (2/3)
fw-download: error on offset 0x00000000/0x00193690
fw-download: Invalid argument

The maximum offset I could reach is 0x00190000.
It seems I can't set --xfer to more than 0x20000.

Hm, I'm not sure what the right thing to do here is. FWUG is a "should" in spec, so host shouldn't strictly need to align to it in order to be compliant, but this device definitely appears to require it. So if we 0-pad the image to the 16k granularity, would the device be okay with that? That'd be weird, the downloaded image may fail a checksum, but not sure what else we can do from the host side.

Here's a quick hack to try 0-padding:

dd if=/dev/zero of=padded-firmware bs=16k count=101
dd if=3B4QFXO7.enc of=padded-firmware conv=notrunc
sudo nvme fw-download --xfer=0x194000 -f padded-firmware /dev/nvme0
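The count=101 in the dd command comes from rounding the image size up to a whole number of 16k blocks. A small sketch of that arithmetic, with hypothetical helper names:

```shell
# Hypothetical helpers: round an image of $1 bytes up to blocks of $2 bytes.

# Number of blocks needed (integer ceiling division).
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Resulting padded size in bytes, printed in hex.
padded_size() {
    printf '0x%x\n' $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}
```

For the 1652368-byte image and a 16384-byte block, blocks_needed prints 101 and padded_size prints 0x194000.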

Success!

--xfer=0x194000 didn't work:

$ sudo nvme fw-download --xfer=0x194000 -f padded-firmware /dev/nvme0 -vvv
opcode       : 11
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00194000
metadata_len : 00000000
addr         : 77e505600000
metadata     : 0
cdw10        : 00064fff
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : -1
latency      : 51 us
fw-download: error on offset 0x00000000/0x00194000
fw-download: Invalid argument
retrying offset 0 (1/3)
...

but --xfer=0x4000 did:

$ sudo nvme fw-download --xfer=0x4000 -f padded-firmware /dev/nvme0 
Firmware download success

Is it safe to try to flash it?

It should be safe. nvme fw-activate -a 1 -s 0 /dev/nvme0 && nvme reset /dev/nvme0. If the zero padding makes the image invalid, then the activate command should fail with an error. If activate succeeds, then the device believes the fw image is safe to load and execute.

No luck so far:

bam@i7:~/Downloads/initrd (2)/root/fumagician$ sudo nvme fw-commit -s 2 -a 0 /dev/nvme0
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

bam@i7:~/Downloads/initrd (2)/root/fumagician$ sudo nvme fw-commit -s 1 -a 0 /dev/nvme0
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

bam@i7:~/Downloads/initrd (2)/root/fumagician$ sudo nvme fw-log /dev/nvme0
Firmware Log for device:nvme0
afi  : 0x1
frs1 : 0x374f584651344233 (3B4QFXO7)

bam@i7:~/Downloads/initrd (2)/root/fumagician$ sudo nvme fw-commit -s 0 -a 0 /dev/nvme0
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

bam@i7:~/Downloads/initrd (2)/root/fumagician$ sudo nvme fw-commit -s 0 -a 1 /dev/nvme0
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

bam@i7:~/Downloads/initrd (2)/root/fumagician$ sudo nvme fw-commit -s 1 -a 1 /dev/nvme0
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

bam@i7:~/Downloads/initrd (2)/root/fumagician$ sudo nvme fw-commit -s 2 -a 1 /dev/nvme0
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

Yeah, as predicted the image is failing a checksum. I think you have to go back to the vendor to find out what gives. The provided image is fundamentally incompatible with the existing firmware.

It looks, though, that you are trying to activate the same firmware that is already loaded (3B4QFXO7).

Yes, I flashed 3B4QFXO7 earlier with (proprietary?) Samsung tools.
Does it mean the fw itself is OK?

It is the running firmware. As long as your storage device is otherwise working successfully, I think everything is "okay".

The issue you brought up is still strange, though. We should respect FWUG if it is advertised, but I don't know how to handle if the image itself isn't aligned.

So as I can see we have 2 issues here:

  • can't download unaligned image
  • can't align the image without breaking it

Should we try to solve the first without the second?

Maybe FWUG is reported only by the newer firmware version and not by the older one. Then the unaligned newer image could be downloaded and activated on the older firmware, but the same cannot be done on the newer firmware. Future firmware images will probably be aligned as well.

Do you think we could try to get in touch with Samsung support?

I found respective resources:
https://www.samsung.com/us/support/computing/memory-storage/
Here is the form for report:
https://www.livehelpnow.net/lhn/lcv.aspx?d=34834&ms=&zzwindow=0&lhnid=29783

I'm trying to clarify the problem:

Meantime, could retry --xfer=0x4000 parameter?

So even with the padded fw image, it requires the --xfer parameter (I tried 0x4000, but bigger sizes might also work).
Is this a problem with Samsung fw?

If you help me to formulate the problem I'll fill the form.
Or, you could do it yourself.

Thanks.

I'm not sure whether the vendor supports firmware updates via nvme-cli, since it seems only the vendor tool is released for the update. But it seems okay to consult them about the issue anyway. It doesn't seem to be a drive firmware problem; I think the problems are the ones below. If needed, you could also wait for advice from the experts.

  1. The firmware image download command fails with nvme-2.8 on Ubuntu 24.04 when trying to update to firmware 3B4QFXO7 on a drive that has already been updated to and activated 3B4QFXO7. (The update to 3B4QFXO7 from the older version succeeded once using the vendor tool.)
  2. Firmware 3B4QFXO7 reports FWUG, but the firmware image size itself is not aligned to the FWUG size, so the firmware image download for 3B4QFXO7 fails on a drive running 3B4QFXO7.

By the way, can the firmware update with the vendor tool still be done on the drive without any error?

Just FYI: the firmware image download command is already implemented to use the FWUG value, as shown below. I am also thinking of adding a warning message if the firmware image size is not aligned with the FWUG value.

static int fw_download(int argc, char **argv, struct command *cmd, struct plugin *plugin)
{
...
	if (cfg.xfer == 0) {
		err = nvme_cli_identify_ctrl(dev, &ctrl);
		if (err) {
			nvme_show_error("identify-ctrl: %s", nvme_strerror(errno));
			return err;
		}
		if (ctrl.fwug == 0 || ctrl.fwug == 0xff)
			cfg.xfer = 4096;
		else
			cfg.xfer = ctrl.fwug * 4096;
	} else if (cfg.xfer % 4096)
		cfg.xfer = 4096;
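The selection logic in the C excerpt above can be restated as a small shell sketch for experimenting with different inputs. It only mirrors the branches shown in the excerpt; choose_xfer is a hypothetical name, not an nvme-cli command.

```shell
# Sketch mirroring the quoted C branches; not nvme-cli code.
# $1 = requested --xfer in bytes (0 means "ask the controller"),
# $2 = FWUG value from identify controller (units of 4096 bytes).
# Variables are global here; call it via $(...) to keep them contained.
choose_xfer() {
    xfer=$(( $1 ))
    fwug=$(( $2 ))
    if [ "$xfer" -eq 0 ]; then
        # FWUG of 0 or 0xff is treated as "use the 4096-byte default".
        if [ "$fwug" -eq 0 ] || [ "$fwug" -eq 255 ]; then
            xfer=4096
        else
            xfer=$(( fwug * 4096 ))
        fi
    elif [ $(( xfer % 4096 )) -ne 0 ]; then
        # A requested size that is not a multiple of 4096 falls back to 4096.
        xfer=4096
    fi
    echo "$xfer"
}
```

For example, choose_xfer 0 4 prints 16384, while choose_xfer 4096 4 prints 4096 because an explicit size is used as given.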

By the way, can the firmware update with the vendor tool still be done on the drive without any error?

After the recent 3B4QFXO7 update, I suppose, the vendor tool no longer sees the disk as a candidate for updating, at least not when run directly from the host system:

# extract initrd from vendor .ISO image and cd into it:
$ cd Downloads/initrd/root/fumagician/
$ ./fumagician
##############################################################################
#                            Samsung Electronics                             #
#                Samsung SSD Firmware Update Utility Ver. 3.1                #
#                   Samsung Electronics Co., Ltd. (c) 2022                   #
##############################################################################

SCANNING -> -> -> -> -> -> -> -> -> ->

 ______________________________DISK(s) DETECTED______________________________ 
|#|               Drive Model               |    Serial Number    | Firmware |
|-|-----------------------------------------|---------------------|----------|
| |            No supported SSD detected for Firmware Update!!!   |          |
|_|_________________________________________|_____________________|__________|

Press any key to EXIT...

Thanks for the confirmation. Noted. It seems the drive is not detected as needing an update because it is already on 3B4QFXO7.

But seems okay to consult about the issue anyway.

So I did.
Feel free to add more info on the link below, if needed:

We have received your inquiry. A team member will address your request as soon as possible.

Question/Issue: 1. The firmware image download command fails with nvme-2.8 on Ubuntu 24.04 when trying to update to firmware 3B4QFXO7 on a drive that has already been updated to and activated 3B4QFXO7. (The update to 3B4QFXO7 from the older version succeeded once using the vendor tool.)
2. Firmware 3B4QFXO7 reports FWUG (Firmware Update Granularity), but the firmware image size itself is not aligned to the FWUG size, so the firmware image download for 3B4QFXO7 fails on a drive running 3B4QFXO7.

The issue was discussed with the nvme-cli community and after the research, we decided to try to get some expertise from official Samsung support:
#2298

You'll be notified promptly by email as soon as your inquiry is addressed. Simply reply to this email to add a comment or edit your ticket online.

Best Regards,
Samsung Memory Services

Just FYI: the firmware image download command is already implemented to use the FWUG value, as shown below. I am also thinking of adding a warning message if the firmware image size is not aligned with the FWUG value.

Hm, the earlier test with default parameters showed "data_len : 00001000".

But then I see that the default "xfer" is already 4096, so the user needs to set it to 0 to get the fwug behavior. I don't think that's correct; we should default to 0 so that we always check identify with default/unset parameters.

What's not clear to me:
if we say the FWUG is 4K (0x1000), why does it still not work even for an aligned fw image?

$ sudo nvme id-ctrl /dev/nvme0 | grep fwug
fwug      : 4

$ sudo nvme fw-download --xfer=0x1000 -f padded-firmware /dev/nvme0 
[sudo] password for bam: 
fw-download: error on offset 0x00000000/0x00194000
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 0 (1/3)
fw-download: error on offset 0x00000000/0x00194000
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)
retrying offset 0 (2/3)
fw-download: error on offset 0x00000000/0x00194000
NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x2)

the default "xfer" is already 4096, so the user needs to set it to 0 to get the fwug behavior.

That worked:

$ sudo nvme fw-download --xfer=0 -f padded-firmware /dev/nvme0 
Firmware download success

extract initrd from vendor .ISO image and cd into it:

$ cd Downloads/initrd/root/fumagician/
$ ./fumagician

It might have been enlightening to turn on admin command tracing when you ran this vendor specific tool. Now I'm curious what it's doing, but I suppose you'd have to somehow down-rev your firmware in order to run the tool again.

What's not clear to me: if we say the FWUG is 4K (0x1000), why does it still not work even for an aligned fw image?

$ sudo nvme id-ctrl /dev/nvme0 | grep fwug
fwug      : 4

On your device, fwug is 16k. It's reported as "4", and the units are in "pages", which are 4k, so 4*4k = 16k
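As a quick way to get that number directly, the id-ctrl text output can be piped through a small filter. This is only a sketch: fwug_bytes is a hypothetical helper, it assumes the "fwug      : N" line format shown in this thread, and it does not special-case the values 0 and 0xff.

```shell
# Hypothetical helper: read `nvme id-ctrl` text output on stdin and print
# the firmware update granularity in bytes (FWUG is counted in 4096-byte units).
fwug_bytes() {
    awk -F: '$1 ~ /^fwug[[:space:]]*$/ { print $2 * 4096 }'
}
```

It would be used as: sudo nvme id-ctrl /dev/nvme0 | fwug_bytes. For the "fwug      : 4" line above it prints 16384.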

Once you pad the firmware image to the alignment, the device accepts the transfer, but it fails the activation because the extra padding fails the checksum when we really just wanted the padding ignored.

but I suppose you'd have to somehow down-rev your firmware in order to run the tool again.

Unfortunately, I don't see the older fw versions available:
https://semiconductor.samsung.com/consumer-storage/support/tools/

The default "xfer" of 4096 has been fixed by PR #2308.

It might have been enlightening to turn on admin command tracing when you ran this vendor specific tool. Now I'm curious what it's doing, but I suppose you'd have to somehow down-rev your firmware in order to run the tool again.

I also have a 980 Pro SSD, probably not yet updated and waiting its turn.
What debug technique do you have in mind? strace or something?

I'm not sure whether the Pro model's firmware update behaves the same way, but I think it can be checked on the drive as follows.

  1. Confirm the current firmware version and its firmware slot from the nvme fw-log output.
  2. Also confirm the fwug value from the nvme id-ctrl output.
  3. If firmware version 5B2QGXA7 on the support site is newer, use nvme-cli to try to download it and commit it to an unused firmware slot, not the slot holding the current firmware.
  4. If the firmware can be updated, check the fwug value again on the newer version.
  5. If the fwug value of the newer firmware differs from the older one, retry the firmware update to the newer firmware.

Without a trace of what the vendor-specific tool is doing, it's hard to tell what it does to get around the padding problem. The xfer size problem is fixed.

Anyway, this report went stale. Let's close it.