martinpitt / umockdev

Mock hardware devices for creating unit tests and bug reporting

Home Page: https://launchpad.net/umockdev

Broken links in /sys/ even though the links in $UMOCKDEV_DIR are correct

mrc0mmand opened this issue

Hello!

In my recent endeavours in systemd/systemd#21725 I've been trying to "collect" a couple of NICs to further expand on the idea, but stumbled across a weird bug(?).

I have a machine with a OneConnect 10Gb NIC:

# lshw -C network | grep 'bus info'
       bus info: pci@0000:16:00.0
       bus info: pci@0000:16:00.1
       bus info: pci@0000:16:00.4
       bus info: pci@0000:16:00.5
# ls -l /sys/class/net
total 0
lrwxrwxrwx. 1 root root 0 Jan  6 02:27 eno1 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.0/net/eno1
lrwxrwxrwx. 1 root root 0 Jan  6 02:27 enp22s0f1 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.1/net/enp22s0f1
lrwxrwxrwx. 1 root root 0 Jan  6 02:27 enp22s0f4 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.4/net/enp22s0f4
lrwxrwxrwx. 1 root root 0 Jan  6 02:27 enp22s0f5 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.5/net/enp22s0f5
lrwxrwxrwx. 1 root root 0 Jan  6 02:27 lo -> ../../devices/virtual/net/lo

umockdev-record successfully creates a text dump of the topology (tested with both umockdev-record /sys/class/net/eno1 and umockdev-record /sys/class/net/en*), but mocking the devices back on a different machine yields weird results:

$ ~/repos/umockdev/build/umockdev-run -d oneconnect_be2net.umockdev -- bash
$ ls -l /sys/class/net
ls: cannot access '/sys/class/net/enp22s0f5': No such file or directory
ls: cannot access '/sys/class/net/enp22s0f4': No such file or directory
ls: cannot access '/sys/class/net/enp22s0f1': No such file or directory
ls: cannot access '/sys/class/net/eno1': No such file or directory
total 0
l?????????? ? ?    ?    ?            ? eno1
l?????????? ? ?    ?    ?            ? enp22s0f1
l?????????? ? ?    ?    ?            ? enp22s0f4
l?????????? ? ?    ?    ?            ? enp22s0f5
lrwxrwxrwx. 1 root root 0 Oct 28 19:42 lo -> ../../devices/virtual/net/lo

However, the links are correct when looking directly in the $UMOCKDEV_DIR:

$ ls -l $UMOCKDEV_DIR/sys/class/net
total 0
lrwxrwxrwx. 1 fsumsal fsumsal 59 Jan  6 13:50 eno1 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.0/net/eno1
lrwxrwxrwx. 1 fsumsal fsumsal 64 Jan  6 13:50 enp22s0f1 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.1/net/enp22s0f1
lrwxrwxrwx. 1 fsumsal fsumsal 64 Jan  6 13:50 enp22s0f4 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.4/net/enp22s0f4
lrwxrwxrwx. 1 fsumsal fsumsal 64 Jan  6 13:50 enp22s0f5 -> ../../devices/pci0000:00/0000:00:02.2/0000:16:00.5/net/enp22s0f5
lrwxrwxrwx. 1 fsumsal fsumsal 28 Jan  6 13:50 lo -> ../../devices/virtual/net/lo
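
For comparing the two views without relying on ls, here is a minimal Python sketch that lists every symlink in a directory together with its stored target and whether that target resolves. The helper name classify_links is made up for illustration and is not part of umockdev. Python may use different libc calls than ls does, so inside the mock its results can differ from what ls reports.

```python
import os


def classify_links(directory):
    """Return {name: (target, resolves)} for every symlink in directory."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            # readlink() returns the stored target without following it;
            # exists() follows the link and is False for a dangling one.
            result[name] = (os.readlink(path), os.path.exists(path))
    return result


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python3 links.py /sys/class/net
    for directory in sys.argv[1:]:
        for name, (target, ok) in classify_links(directory).items():
            state = "ok" if ok else "DANGLING"
            print(f"{directory}/{name} -> {target} [{state}]")
```

Running it once on /sys/class/net inside umockdev-run and once on $UMOCKDEV_DIR/sys/class/net would show whether both views agree on the link targets.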

On both machines I tried the F34 umockdev version as well as the latest master (4f22f02 at the time of writing), with the same results.

Used umockdev files (renamed to *.txt, to make GH happy):

Is there something I'm missing, or is there an actual issue somewhere?

I can reproduce this on F35. It's not specific to your eno1.txt; it happens just as well with e.g. devices/input/usbkbd.umockdev from upstream git:

❱❱❱ LD_LIBRARY_PATH=. ./umockdev-run -d ../devices/input/usbkbd.umockdev -- ls -l /sys/bus/pci/devices/
ls: cannot access '/sys/bus/pci/devices/0000:00:1a.0': No such file or directory
total 0
l????????? ? ? ? ?            ? 0000:00:1a.0

However, this is somehow specific to ls. libudev or even Python are happy:

❱❱❱ LD_LIBRARY_PATH=. ./umockdev-run -d ../devices/input/usbkbd.umockdev -- python3 -c 'import os; print(os.listdir("/sys/bus/pci/devices/0000:00:1a.0"))'
['driver', 'config', 'vendor', 'uframe_periodic_max', 'subsystem_vendor', 'subsystem_device', 'resource', 'pools', 'numa_node', 'msi_bus', 'modalias', 'local_cpus', 'local_cpulist', 'irq', 'dma_mask_bits', 'device', 'd3cold_allowed', 'consistent_dma_mask_bits', 'companion', 'class', 'broken_parity_status', 'uevent', 'subsystem', 'usb1']

This smells like a new glibc introducing yet another ?stat?() variant which needs to be wrapped. I'll strace it and investigate. Thanks for your report!

Here is the culprit:

23876 statx(AT_FDCWD, "/sys/bus/pci/devices/", AT_STATX_SYNC_AS_STAT, STATX_MODE, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_blksize=4096, stx_attributes=0, stx_nlink=2, stx_uid=65534, stx_gid=65534, stx_mode=S_IFDIR|0755, stx_ino=2419, stx_size=0, stx_blocks=0, stx_attributes_mask=STATX_ATTR_AUTOMOUNT|STATX_ATTR_MOUNT_ROOT|STATX_ATTR_DAX, stx_atime={tv_sec=1641722969, tv_nsec=983000028} /* 2022-01-09T11:09:29.983000028+0100 */, stx_btime={tv_sec=0, tv_nsec=0}, stx_ctime={tv_sec=1641722969, tv_nsec=888000023} /* 2022-01-09T11:09:29.888000023+0100 */, stx_mtime={tv_sec=1641722969, tv_nsec=888000023} /* 2022-01-09T11:09:29.888000023+0100 */, stx_rdev_major=0, stx_rdev_minor=0, stx_dev_major=0, stx_dev_minor=23}) = 0
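
To issue the same call without going through ls, here is a rough Python sketch that calls statx() via ctypes with the flags and mask seen in the strace line. It assumes Linux and a libc that exports statx() (glibc has it since 2.28). The helper name statx_mode is made up for illustration. It reads stx_mode from a raw buffer at the offset defined by the kernel's struct statx layout and is not part of umockdev.

```python
import ctypes
import os
import struct

AT_FDCWD = -100              # resolve relative paths against the cwd
AT_STATX_SYNC_AS_STAT = 0x0  # same flag as in the strace output
STATX_MODE = 0x2             # same mask as in the strace output
STATX_BUF_SIZE = 256         # sizeof(struct statx) in the kernel UAPI
STX_MODE_OFFSET = 28         # byte offset of the 16-bit stx_mode field


def statx_mode(path):
    """Return stx_mode for path, or None if libc does not export statx()."""
    libc = ctypes.CDLL(None, use_errno=True)
    try:
        statx = libc.statx
    except AttributeError:
        # Assumption: a missing symbol means this libc has no statx() wrapper.
        return None
    statx.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_int,
                      ctypes.c_uint, ctypes.c_void_p]
    statx.restype = ctypes.c_int

    buf = ctypes.create_string_buffer(STATX_BUF_SIZE)
    rc = statx(AT_FDCWD, os.fsencode(path), AT_STATX_SYNC_AS_STAT,
               STATX_MODE, buf)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    (mode,) = struct.unpack_from("=H", buf, STX_MODE_OFFSET)
    return mode
```

Calling statx_mode("/sys/bus/pci/devices/0000:00:1a.0") under umockdev-run and comparing it with os.stat() on the same path is one way to check whether only the statx() path is affected.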

I have a first fix. It still fails on Alpine with musl libc as that doesn't have a statx() (and thus I can't land it yet), but it works fine with glibc now.

Thanks a lot for the fix!

@mrc0mmand: I released this in 0.17.2; it is now in updates-testing for Fedora 34 and Fedora 35.