overwrite operation still issued even though "ndctl sanitize-dimm nmem0 --overwrite" failed
yizhanglinux opened this issue
Hello,
I ran "ndctl sanitize-dimm nmem0 --overwrite". It reports that the execution failed, but the overwrite operation was still issued to nmem0.
# uname -r
6.4.0-rc1+
# ndctl setup-passphrase "$dev" -k user:"$masterkey"
passphrase enabled for 1 nmem.
# ./ndctl list -Di
[
{
"dev":"nmem1",
"id":"8089-a2-1833-00000510",
"handle":257,
"phys_id":32,
"flag_failed_map":true,
"security":"disabled"
},
{
"dev":"nmem3",
"id":"8089-a2-1833-00000497",
"handle":4353,
"phys_id":44,
"security":"disabled"
},
{
"dev":"nmem0",
"id":"8089-a2-1833-000004a3",
"handle":1,
"phys_id":26,
"security":"unlocked"
},
{
"dev":"nmem2",
"id":"8089-a2-1833-000004a9",
"handle":4097,
"phys_id":38,
"security":"disabled"
}
]
# ls /etc/ndctl/keys/
keys.readme nvdimm_8089-a2-1833-000004a3_intel-purley-04.khw1.lab.eng.bos.redhat.com.blob nvdimm-master.blob
# ./ndctl sanitize-dimm nmem0 --overwrite
libndctl: ndctl_dimm_enable: nmem0: failed to enable
overwrite issued for 0 nmem.
# ./ndctl list -Di
[
{
"dev":"nmem1",
"id":"8089-a2-1833-00000510",
"handle":257,
"phys_id":32,
"flag_failed_map":true,
"security":"disabled"
},
{
"dev":"nmem3",
"id":"8089-a2-1833-00000497",
"handle":4353,
"phys_id":44,
"security":"disabled"
},
{
"dev":"nmem0",
"id":"8089-a2-1833-000004a3",
"handle":1,
"phys_id":26,
"state":"disabled",
"security":"overwrite"
},
{
"dev":"nmem2",
"id":"8089-a2-1833-000004a9",
"handle":4097,
"phys_id":38,
"security":"disabled"
}
]
Hmm... I don't understand why it attempts to enable the DIMM while attempting an overwrite. Can you enable verbose debugging and provide the log, please?
OK, I think I know why we are seeing this behavior. The overwrite is issued, but then we call revalidate_labels() afterwards, and that call fails. I think this is the wrong place to call it: the overwrite is still in progress at that point, so revalidation will fail. It makes sense that the overwrite itself went through, because it had already been issued before this software error occurred, and a revalidate_labels() error does not reverse or stop the overwrite operation. The "overwrite issued for 0 nmem." message is therefore misleading.
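To make the ordering problem concrete, here is a minimal model in C. It is only a sketch of the control flow described in this comment, not the actual ndctl source: struct fake_dimm and every fake_* function are hypothetical stand-ins, and the assumption that revalidation always fails while an overwrite is running is taken from the explanation above.

```c
/* Hypothetical stand-in for a DIMM; not a libndctl type. */
struct fake_dimm {
	int overwrite_in_progress;	/* set once the overwrite command is accepted */
};

/* Models issuing the overwrite: the DIMM accepts it and starts working. */
int fake_issue_overwrite(struct fake_dimm *dimm)
{
	dimm->overwrite_in_progress = 1;
	return 0;
}

/* Models label revalidation; assumed to fail while the overwrite is running. */
int fake_revalidate_labels(struct fake_dimm *dimm)
{
	return dimm->overwrite_in_progress ? -1 : 0;
}

/*
 * Models the reported ordering: overwrite first, revalidate right after.
 * The success counter is only bumped if both steps succeed, so a
 * revalidation failure hides an overwrite that is already underway.
 */
int fake_sanitize(struct fake_dimm *dimm, int *count)
{
	int rc = fake_issue_overwrite(dimm);

	if (rc < 0)
		return rc;

	rc = fake_revalidate_labels(dimm);
	if (rc < 0)
		return rc;	/* nothing here cancels the overwrite */

	(*count)++;
	return 0;
}
```

Calling fake_sanitize() on a fresh fake_dimm returns an error and leaves the counter at 0, while overwrite_in_progress stays set. That matches the report: "overwrite issued for 0 nmem." alongside "security":"overwrite" in the listing.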
@djbw, you introduced the revalidate_labels() call for overwrite, but it appears to be failing on a real DIMM. I don't think you can call that until the DIMM has completed the overwrite. So it may not be something that can be issued from user space, since ndctl is stateless?
8186ec8 ("ndctl/dimm: Flush invalidated labels after overwrite")
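For discussion, one possible ordering is to defer revalidation until the overwrite has finished. The sketch below is a toy model only: struct demo_dimm, the demo_* functions, and the poll counter that simulates completion are all hypothetical, and this is not a proposed patch. Whether ndctl can or should block like this is exactly the open question above.

```c
/* Hypothetical states for a simulated DIMM; not libndctl types. */
enum demo_state { DEMO_IDLE, DEMO_OVERWRITING };

struct demo_dimm {
	enum demo_state state;
	int polls_left;		/* polls until the simulated overwrite finishes */
	int labels_valid;	/* set by a successful revalidation */
};

/* Start a simulated overwrite that completes after 'polls' polls. */
int demo_start_overwrite(struct demo_dimm *d, int polls)
{
	d->state = DEMO_OVERWRITING;
	d->polls_left = polls;
	d->labels_valid = 0;
	return 0;
}

/* Poll once; returns 1 while the overwrite is still running, 0 once done. */
int demo_poll_overwrite(struct demo_dimm *d)
{
	if (d->state == DEMO_OVERWRITING && d->polls_left > 0)
		d->polls_left--;
	if (d->state == DEMO_OVERWRITING && d->polls_left == 0)
		d->state = DEMO_IDLE;
	return d->state == DEMO_OVERWRITING;
}

/* Revalidation is assumed to fail while the overwrite is in progress. */
int demo_revalidate(struct demo_dimm *d)
{
	if (d->state == DEMO_OVERWRITING)
		return -1;
	d->labels_valid = 1;
	return 0;
}

/* Deferred ordering: issue, wait for completion, then revalidate. */
int demo_overwrite_then_revalidate(struct demo_dimm *d, int polls)
{
	int rc = demo_start_overwrite(d, polls);

	if (rc < 0)
		return rc;
	while (demo_poll_overwrite(d))
		;	/* a real tool would sleep or block instead of spinning */
	return demo_revalidate(d);
}
```

In this model, calling demo_revalidate() right after demo_start_overwrite() fails, while demo_overwrite_then_revalidate() succeeds because it only revalidates once the simulated overwrite has gone idle. The cost is that the caller blocks for the full duration of the overwrite.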