pmem / ndctl

A "device memory" enabling project encompassing tools and libraries for CXL, NVDIMMs, DAX, memory tiering and other platform memory device topics.

Overwrite operation is still issued even though "ndctl sanitize-dimm nmem0 --overwrite" reports failure

yizhanglinux opened this issue

Hello,
I ran ndctl sanitize-dimm nmem0 --overwrite. It reports that the command failed, but the overwrite operation was still issued to nmem0.

# uname -r
6.4.0-rc1+

# ndctl setup-passphrase "$dev" -k user:"$masterkey"
passphrase enabled for 1 nmem.
# ./ndctl list -Di
[
  {
    "dev":"nmem1",
    "id":"8089-a2-1833-00000510",
    "handle":257,
    "phys_id":32,
    "flag_failed_map":true,
    "security":"disabled"
  },
  {
    "dev":"nmem3",
    "id":"8089-a2-1833-00000497",
    "handle":4353,
    "phys_id":44,
    "security":"disabled"
  },
  {
    "dev":"nmem0",
    "id":"8089-a2-1833-000004a3",
    "handle":1,
    "phys_id":26,
    "security":"unlocked"
  },
  {
    "dev":"nmem2",
    "id":"8089-a2-1833-000004a9",
    "handle":4097,
    "phys_id":38,
    "security":"disabled"
  }
]
# ls /etc/ndctl/keys/
keys.readme  nvdimm_8089-a2-1833-000004a3_intel-purley-04.khw1.lab.eng.bos.redhat.com.blob  nvdimm-master.blob

# ./ndctl sanitize-dimm nmem0 --overwrite
libndctl: ndctl_dimm_enable: nmem0: failed to enable
overwrite issued for 0 nmem.


# ./ndctl list -Di
[
  {
    "dev":"nmem1",
    "id":"8089-a2-1833-00000510",
    "handle":257,
    "phys_id":32,
    "flag_failed_map":true,
    "security":"disabled"
  },
  {
    "dev":"nmem3",
    "id":"8089-a2-1833-00000497",
    "handle":4353,
    "phys_id":44,
    "security":"disabled"
  },
  {
    "dev":"nmem0",
    "id":"8089-a2-1833-000004a3",
    "handle":1,
    "phys_id":26,
    "state":"disabled",
    "security":"overwrite"
  },
  {
    "dev":"nmem2",
    "id":"8089-a2-1833-000004a9",
    "handle":4097,
    "phys_id":38,
    "security":"disabled"
  }
]
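
For anyone comparing the two listings by hand, a minimal sketch of the same check in Python follows. It assumes the JSON shape shown in the listings above; the helper name dimms_in_state and the trimmed sample data are illustrative only and are not part of ndctl.

```python
import json


def dimms_in_state(listing_json, security_state):
    """Return the "dev" names whose "security" field equals security_state.

    listing_json is expected to look like the `ndctl list -Di` output
    quoted in this issue: a JSON array of objects with "dev" and
    "security" keys. Entries without a "security" key are skipped.
    """
    return [
        dimm["dev"]
        for dimm in json.loads(listing_json)
        if dimm.get("security") == security_state
    ]


# Trimmed copy of the second listing in this report (only two DIMMs,
# only the fields this check needs).
AFTER = """
[
  {"dev": "nmem1", "flag_failed_map": true, "security": "disabled"},
  {"dev": "nmem0", "state": "disabled", "security": "overwrite"}
]
"""

if __name__ == "__main__":
    # nmem0 is in the overwrite state even though the command
    # printed "overwrite issued for 0 nmem."
    print(dimms_in_state(AFTER, "overwrite"))
```

Run against the full second listing, the same check would single out nmem0, which is the mismatch this report is about.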

Hmm... I don't understand why it attempts to enable the DIMM while attempting an overwrite. Can you enable verbose debugging and provide the log, please?

OK, I think I know why we are seeing this behavior. The overwrite has been issued, but we then call revalidate_labels(), and that fails. I think this is the wrong place to call it: the overwrite is still in progress, so the call is bound to fail. It makes sense that the overwrite went ahead, because it was already issued before this software error, and the revalidate_labels() error does not reverse or stop the overwrite operation. The "overwrite issued for 0 nmem" message is therefore misleading.
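
To make the ordering problem concrete, here is a toy model in Python. It is not the ndctl source: ToyDimm, issue_overwrite, revalidate and sanitize are hypothetical stand-ins, and the only assumption carried over from the comment above is that a follow-up step runs right after the overwrite is issued and fails while the overwrite is in progress.

```python
class ToyDimm:
    """Stand-in for a DIMM; not a real libndctl object."""

    def __init__(self, name):
        self.name = name
        self.security = "unlocked"


def issue_overwrite(dimm):
    # The device starts overwriting. Nothing later in this model
    # reverses or stops it.
    dimm.security = "overwrite"
    return 0


def revalidate(dimm):
    # Hypothetical follow-up step: fails while overwrite is running.
    return -1 if dimm.security == "overwrite" else 0


def sanitize(dimms):
    """Return how many DIMMs are counted as 'overwrite issued'."""
    counted = 0
    for dimm in dimms:
        rc = issue_overwrite(dimm)
        if rc == 0:
            # The follow-up error replaces the successful result.
            rc = revalidate(dimm)
        if rc == 0:
            counted += 1
    return counted


if __name__ == "__main__":
    nmem0 = ToyDimm("nmem0")
    print("overwrite issued for", sanitize([nmem0]), "nmem.")
    print(nmem0.name, "security:", nmem0.security)
```

In this model the count stays at 0 while the DIMM is left in the overwrite state, which mirrors the symptom reported above: the failure of the later step is what gets reported, not the result of the overwrite request itself.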

@djbw, you introduced the revalidate_labels() call for overwrite, but it appears to be failing on a real DIMM. I don't think you can call it until the DIMM has completed the overwrite. So it may not be something that can be issued from user space, since ndctl is stateless?
8186ec8 ("ndctl/dimm: Flush invalidated labels after overwrite")
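
One way to picture "don't call it until the DIMM has completed overwrite" is a polling loop that defers the follow-up step. The sketch below is only an illustration under assumptions: wait_for_overwrite and the read_security callback are hypothetical, and the only state name taken from this thread is "overwrite"; what the field reads after completion is not assumed.

```python
import time


def wait_for_overwrite(read_security, poll_seconds=1.0, max_polls=60):
    """Poll until the security state is no longer "overwrite".

    read_security is a caller-supplied function returning the current
    "security" string for one DIMM (for example, parsed from a listing).
    Returns True once the state has changed, False if max_polls is
    exhausted first.
    """
    for _ in range(max_polls):
        if read_security() != "overwrite":
            return True
        time.sleep(poll_seconds)
    return False


if __name__ == "__main__":
    # Fake sequence of observed states, standing in for real queries.
    observed = iter(["overwrite", "overwrite", "disabled"])
    if wait_for_overwrite(lambda: next(observed), poll_seconds=0):
        print("overwrite finished; a follow-up step could run now")
```

Note that a loop like this has to keep running for as long as the overwrite takes, which is the statelessness concern raised in the comment above.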