linux-nvme / nvme-cli

NVMe management command line interface.

Home Page: https://nvmexpress.org

[bug report] blktests nvme/029 failed from v2.7 on s390x

yizhanglinux opened this issue

Hi,
I found nvme/029 failing recently on s390x starting from v2.7. I bisected and found it was introduced by commit [2]; please help check it, thanks.

[1]
$ nvme --version
nvme version 2.8 (git 2.8)
libnvme version 1.8 (git 1.8)

$ uname -a
Linux 6.8.0-rc3 #1 SMP Fri Feb 16 23:08:50 UTC 2024 s390x GNU/Linux

$ ./check nvme/029
nvme/029 (test userspace IO via nvme-cli read/write interface) [failed]
runtime 0.564s ... 0.584s
--- tests/nvme/029.out 2024-02-16 20:56:47.707529154 -0500
+++ /mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-tests-production.zip/storage/blktests/blk/blktests/results/nodev/nvme/029.out.bad 2024-02-17 23:45:28.078269334 -0500
@@ -1,2 +1,5 @@
Running nvme/029
+FAIL
+FAIL
+FAIL
Test complete

[2]
commit 51e68f7
Author: Daniel Wagner <dwagner@suse.de>
Date:   Fri Nov 17 09:01:35 2023 +0100

    nvme: replace libhugetlbfs with mmap and madvise

    Instead of depending on libhugetlbfs for large memory allocation mapping,
    just use mmap and madvise to try to allocate contiguous memory.

    While at it also introduce an auto cleanup helper.

    Signed-off-by: Daniel Wagner <dwagner@suse.de>
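
As an aside, "auto cleanup helper" in C code like this is typically built on the compiler's cleanup attribute. Here is a minimal, generic sketch of that pattern; the macro and function names below are illustrative and are not nvme-cli's actual API:

/* Generic sketch of an auto cleanup helper using GCC/Clang's cleanup
 * attribute; the names below are illustrative, not nvme-cli's API. */
#include <stdlib.h>

#define _cleanup_(fn) __attribute__((cleanup(fn)))

static void freep(void *p)
{
        /* The compiler passes a pointer to the variable leaving scope. */
        free(*(void **)p);
}

static int example(void)
{
        _cleanup_(freep) char *buf = malloc(4096);

        if (!buf)
                return -1;
        /* ... use buf; it is freed automatically on every return path. */
        return 0;
}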

Could you post the output of these commands here?

nvme id-ns /dev/nvmeXnY
cat /proc/meminfo

The test fails because the test allocates bigger chunks of linear memory, and it might be that this machine runs out of large contiguous blocks of memory.

The allocation strategy is to use posix_memalign for allocations below 512k. For larger allocations we first try hugetlb, and if that fails we fall back to posix_memalign/madvise.
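
For illustration, a minimal sketch of that strategy; the helper name, the 512k threshold constant, and the exact fallback details are assumptions based on the description above, not the actual nvme-cli code:

/* Illustrative sketch only: small buffers use posix_memalign, large
 * buffers try an explicit hugetlb mapping and fall back to
 * posix_memalign plus madvise. Not nvme-cli's real implementation. */
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define SMALL_ALLOC_LIMIT (512 * 1024)

static void *alloc_linear(size_t len)
{
        size_t align = sysconf(_SC_PAGESIZE);
        void *p;

        /* Small buffers: an ordinary page-aligned allocation is enough. */
        if (len < SMALL_ALLOC_LIMIT) {
                if (posix_memalign(&p, align, len))
                        return NULL;
                return p;
        }

        /* Large buffers: try an explicit hugetlb mapping first; this is
         * the path that has nothing to hand out when nr_hugepages is 0. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p != MAP_FAILED)
                return p;

        /* Fall back to posix_memalign plus a transparent huge page hint. */
        if (posix_memalign(&p, align, len))
                return NULL;
        madvise(p, len, MADV_HUGEPAGE);
        return p;
}

A real implementation would also have to remember whether a buffer came from mmap or posix_memalign so it can be released with munmap or free accordingly.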

One thing you could try: if /proc/sys/vm/nr_hugepages returns 0, tell the kernel to reserve a few huge pages before running the test:

echo 20 > /proc/sys/vm/nr_hugepages

Could you post the output of these commands here?

nvme id-ns /dev/nvmeXnY
cat /proc/meminfo

Here is the info:

NVME Identify Namespace 1:
nsze    : 0x200000
ncap    : 0x200000
nuse    : 0x200000
nsfeat  : 0x12
nlbaf   : 0
flbas   : 0
mc      : 0
dpc     : 0
dps     : 0
nmic    : 0x1
rescap  : 0
fpi     : 0
dlfeat  : 0
nawun   : 0
nawupf  : 0
nacwu   : 0
nabsn   : 0
nabo    : 0
nabspf  : 0
noiob   : 0
nvmcap  : 0
npwg    : 0
npwa    : 0
npdg    : 7
npda    : 7
nows    : 0
mssrl   : 0
mcl     : 0
msrc    : 0
nulbaf  : 0
anagrpid: 1
nsattr	: 0
nvmsetid: 0
endgid  : 0
nguid   : 00000000000000000000000000000000
eui64   : 0000000000000000
lbaf  0 : ms:0   lbads:9  rp:0 (in use)
MemTotal:        2020144 kB
MemFree:         1706188 kB
MemAvailable:    1820712 kB
Buffers:            5516 kB
Cached:           119724 kB
SwapCached:            0 kB
Active:           144008 kB
Inactive:          37584 kB
Active(anon):      56684 kB
Inactive(anon):        0 kB
Active(file):      87324 kB
Inactive(file):    37584 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2019324 kB
SwapFree:        2019324 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               168 kB
Writeback:             0 kB
AnonPages:         56372 kB
Mapped:            47772 kB
Shmem:               332 kB
KReclaimable:      14016 kB
Slab:              61652 kB
SReclaimable:      14016 kB
SUnreclaim:        47636 kB
KernelStack:        3456 kB
PageTables:         6904 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3029396 kB
Committed_AS:     236332 kB
VmallocTotal:   534773760 kB
VmallocUsed:       20336 kB
VmallocChunk:          0 kB
Percpu:            24064 kB
CmaTotal:           4096 kB
CmaFree:            4096 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       1024 kB
Hugetlb:               0 kB
DirectMap4k:      116736 kB
DirectMap1M:     1980416 kB
DirectMap2G:           0 kB

The test fails because the test allocates bigger chunks of linear memory, and it might be that this machine runs out of large contiguous blocks of memory.

The allocation strategy is to use posix_memalign for allocations below 512k. For larger allocations we first try hugetlb, and if that fails we fall back to posix_memalign/madvise.

One thing you could try: if /proc/sys/vm/nr_hugepages returns 0, tell the kernel to reserve a few huge pages before running the test:

echo 20 > /proc/sys/vm/nr_hugepages

Yeah, nr_hugepages was 0; it works now after changing it to 20.

Thanks, closing this issue now.