ikwzm / udmabuf

User space mappable dma buffer device driver for Linux.

Test program failure on aarch64 system

gmegan opened this issue · comments

commented

We are attempting to use the udmabuf device on an aarch64 system. The udmabuf_test.c program succeeds the first time it is run. On the second run, the kernel reports errors and the system goes down.

$ uname -a
Linux XXXXX 4.14.0-49.el7a.aarch64 #1 SMP Wed Mar 14 10:02:34 EDT 2018 aarch64 aarch64 aarch64 GNU/Linux

$ ./udmabuf_test.x 
phys_addr=0x81b00000
size=1048576
check_buf()
sync_mode=0, O_SYNC=1, time = 2.175054 sec
$ ./udmabuf_test.x 
phys_addr=0x81b00000
size=1048576
check_buf()
sync_mode=0, O_SYNC=1, time = 2.175091 sec

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:page:ffff7fe000006c40 count:1 mapcount:-127 mapping:          (null) index:0x0

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:flags: 0xfffff0000000014(referenced|dirty)

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:page:ffff7fe000006c80 count:1 mapcount:-127 mapping:          (null) index:0x0

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:flags: 0xfffff0000000014(referenced|dirty)

There are many similar error messages.

dmesg:

[Jul 1 16:03] udmabuf udmabuf0: driver version = 1.4.1
[  +0.005047] udmabuf udmabuf0: major number   = 239
[  +0.004886] udmabuf udmabuf0: minor number   = 0
[  +0.004692] udmabuf udmabuf0: phys address   = 0x0000000081b00000
[  +0.006178] udmabuf udmabuf0: buffer size    = 1048576
[  +0.005213] udmabuf udmabuf0: dma coherent   = 0
[  +0.004700] udmabuf udmabuf.0: driver installed.
[Jul 1 16:06] BUG: Bad page map in process udmabuf_test.x  pte:e8000081b10f4f pmd:bf67ac0003
[  +0.008352] page:ffff7fe000006c40 count:1 mapcount:-127 mapping:          (null) index:0x0
[  +0.008352] flags: 0xfffff0000000014(referenced|dirty)
[  +0.005222] raw: 0fffff0000000014 0000000000000000 0000000000000000 00000001ffffff80

Message from syslogd@vulcan2 at Jul  1 16:06:17 ...
 kernel:page:ffff7fe000006c40 count:1 mapcount:-127 mapping:          (null) index:0x0

Message from syslogd@vulcan2 at Jul  1 16:06:17 ...
 kernel:flags: 0xfffff0000000014(referenced|dirty)
[  +0.007817] raw: ffff7fe000006f60 ffff7fe0000069e0 0000000000000000 0000000000000000
[  +0.007828] page dumped because: bad pte
[  +0.004009] addr:0000ffff89380000 vm_flags:000040fb anon_vma:          (null) mapping:ffff809f5141dde0 index:1
[  +0.010100] file:udmabuf0 fault:udmabuf_device_vma_fault [udmabuf] mmap:udmabuf_device_file_mmap [udmabuf] readpage:          (null)
[  +0.011998] CPU: 28 PID: 36133 Comm: udmabuf_test.x Kdump: loaded Tainted: G           OE  ------------   4.14.0-49.el7a.aarch64 #1
[  +0.011897] Hardware name: To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M., BIOS 0ACKQ214 01/12/2018
[  +0.011028] Call trace:
[  +0.002528] [<ffff000008088e6c>] dump_backtrace+0x0/0x23c
[  +0.005474] [<ffff0000080890cc>] show_stack+0x24/0x2c
[  +0.005131] [<ffff0000087f607c>] dump_stack+0x84/0xa8
[  +0.005131] [<ffff00000823a324>] print_bad_pte+0x16c/0x1f0
[  +0.005560] [<ffff00000823cd48>] unmap_page_range+0x4f0/0x708
[  +0.005820] [<ffff00000823d004>] unmap_single_vma+0xa4/0xf8
[  +0.005646] [<ffff00000823d3a8>] unmap_vmas+0x70/0xbc
[  +0.005127] [<ffff000008243740>] unmap_region+0xb0/0x120
[  +0.005385] [<ffff000008246080>] do_munmap+0x1e8/0x2f8
[  +0.005212] [<ffff0000082467f8>] vm_munmap+0x6c/0xb8
[  +0.005038] [<ffff000008246874>] SyS_munmap+0x30/0x40
[  +0.005125] Exception stack(0xffff00003020fec0 to 0xffff000030210000)
[  +0.006515] fec0: 0000ffff89370000 0000000000100000 0000000000000001 0000000000000000
[  +0.007904] fee0: 0000000000401160 0000000000000bd0 0000ffff894badd4 0000000000000000
[  +0.007902] ff00: 00000000000000d7 0000ffff895f15b0 00000000ffffffff 0000000000000000
[  +0.007903] ff20: 0000000000000018 00000003e8000000 000e6ed3c5384905 00009ea30d20eab6
[  +0.007903] ff40: 0000ffff8954ae00 0000000000420070 0000ffffc15f8b50 0000000000000000
[  +0.007903] ff60: 0000000000000000 00000000004008e0 0000000000000000 0000000000000000
[  +0.007902] ff80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  +0.007903] ffa0: 0000000000000000 0000ffffc15f8d80 0000000000400d48 0000ffffc15f8d80
[  +0.007903] ffc0: 0000ffff8954ae08 0000000080000000 0000ffff89370000 00000000000000d7
[  +0.007903] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  +0.007904] [<ffff00000808359c>] __sys_trace_return+0x0/0x4
[  +0.005658] Disabling lock debugging due to kernel taint
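
For readers following the trace: it ends in SyS_munmap and names udmabuf_device_file_mmap and udmabuf_device_vma_fault, so the bad page map is reported while a mapping of udmabuf0 is being torn down. The following is a minimal sketch of that user-space sequence (open, mmap, touch each page, munmap). It is not the code of udmabuf_test.c. The helper names are hypothetical, and the paths mentioned in the usage note are assumptions based on the device name in the log.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: read one numeric attribute file
 * (decimal or 0x-prefixed hex). Returns 0 on success, -1 on error. */
int read_attr_ulong(const char *path, unsigned long *out)
{
    char buf[64];
    char *end;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    errno = 0;
    *out = strtoul(buf, &end, 0);   /* base 0 accepts "1048576" and "0x81b00000" */
    if (end == buf || errno != 0)
        return -1;
    return 0;
}

/* Hypothetical helper: map `size` bytes of `dev_path`, write one byte per
 * page so that every page is faulted in, then unmap.
 * `extra_flags` is OR-ed into the open() flags (for example O_SYNC).
 * Returns 0 on success, -1 on error. */
int map_touch_unmap(const char *dev_path, size_t size, int extra_flags)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf;
    size_t off;
    int rc;
    int fd = open(dev_path, O_RDWR | extra_flags);

    if (fd < 0)
        return -1;
    buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }
    for (off = 0; off < size; off += (size_t)page)
        buf[off] = 0;               /* first access triggers the fault handler */

    rc = munmap(buf, size);         /* the trace above was raised on this path */
    close(fd);
    return rc;
}
```

Assuming the buffer size is exposed at /sys/class/udmabuf/udmabuf0/size and the device node is /dev/udmabuf0, a caller would read the size with read_attr_ulong() and pass it to map_touch_unmap("/dev/udmabuf0", size, O_SYNC).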

Thank you for the issue.

I would like to try to reproduce this in my environment.
How did you install udmabuf.ko? If you installed it using a device tree, please show me its contents. If you installed it with the insmod command, please tell me the arguments.

commented

Thank you for responding so quickly. I build udmabuf.ko in the repo directory and then install with insmod. Here are my commands to build the module and tests:

$ git clone https://github.com/ikwzm/udmabuf

$ cd udmabuf

$ make

$ cat /etc/udev/rules.d/99-udmabuf.rules
KERNEL=="udmabuf[0-9]*", GROUP="wheel", MODE="0660"

$ sudo insmod udmabuf.ko udmabuf0=1048576

$ gcc  udmabuf_test.c  -o udmabuf_test.x

$ ./udmabuf_test.x

Hmm...

I have an Ultra96. It is powered by Xilinx's ZynqMP. The CPU of ZynqMP is ARM64.
I put Linux-Kernel 4.14 on this board.
Please refer to this URL for details.

  • https://github.com/ikwzm/ZynqMP-FPGA-Linux

Here is the result of running udmabuf on this system.

root@debian-fpga:~/tmp# git clone https://github.com/ikwzm/udmabuf
Cloning into 'udmabuf'...
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 396 (delta 11), reused 19 (delta 6), pack-reused 371
Receiving objects: 100% (396/396), 239.23 KiB | 118.00 KiB/s, done.
Resolving deltas: 100% (220/220), done.
root@debian-fpga:~/tmp# cd udmabuf
root@debian-fpga:~/tmp/udmabuf# make
make -C /lib/modules/4.14.0-xlnx-v2018.2-zynqmp-fpga/build ARCH=arm64 CROSS_COMPILE= M=/root/tmp/udmabuf modules
make[1]: Entering directory '/usr/src/linux-headers-4.14.0-xlnx-v2018.2-zynqmp-fpga'
  CC [M]  /root/tmp/udmabuf/udmabuf.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /root/tmp/udmabuf/udmabuf.mod.o
  LD [M]  /root/tmp/udmabuf/udmabuf.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.14.0-xlnx-v2018.2-zynqmp-fpga'
root@debian-fpga:~/tmp/udmabuf# insmod udmabuf.ko udmabuf0=1048576
[  132.536923] udmabuf: loading out-of-tree module taints kernel.
[  132.547518] udmabuf udmabuf0: driver version = 1.4.1
[  132.552440] udmabuf udmabuf0: major number   = 243
[  132.557202] udmabuf udmabuf0: minor number   = 0
[  132.561788] udmabuf udmabuf0: phys address   = 0x0000000070100000
[  132.567867] udmabuf udmabuf0: buffer size    = 1048576
[  132.572983] udmabuf udmabuf0: dma coherent   = 0
[  132.577586] udmabuf udmabuf.0: driver installed.
root@debian-fpga:~/tmp/udmabuf# gcc udmabuf_test.c -o udmabuf_test.x
udmabuf_test.c: In function 'check_buf_test':
udmabuf_test.c:63:7: warning: implicit declaration of function 'write' [-Wimplicit-function-declaration]
       write(fd, attr, strlen(attr));
       ^~~~~
udmabuf_test.c:64:7: warning: implicit declaration of function 'close' [-Wimplicit-function-declaration]
       close(fd);
       ^~~~~
udmabuf_test.c:71:7: warning: implicit declaration of function 'gettimeofday' [-Wimplicit-function-declaration]
       gettimeofday(&start_time, NULL);
       ^~~~~~~~~~~~
udmabuf_test.c: In function 'main':
udmabuf_test.c:118:7: warning: implicit declaration of function 'read' [-Wimplicit-function-declaration]
       read(fd, attr, 1024);
       ^~~~
udmabuf_test.c:145:23: warning: implicit declaration of function 'lseek' [-Wimplicit-function-declaration]
       long last_pos = lseek(fd, 0, 2);
                       ^~~~~
udmabuf_test.c:148:9: warning: implicit declaration of function 'exit' [-Wimplicit-function-declaration]
         exit(-1);
         ^~~~
udmabuf_test.c:148:9: warning: incompatible implicit declaration of built-in function 'exit'
udmabuf_test.c:148:9: note: include '<stdlib.h>' or provide a declaration of 'exit'
root@debian-fpga:~/tmp/udmabuf# ./udmabuf_test.x
phys_addr=0x70100000
size=1048576
check_buf()
sync_mode=0, O_SYNC=0, time = 0.340763 sec
sync_mode=0, O_SYNC=1, time = 0.340823 sec
sync_mode=1, O_SYNC=0, time = 0.340724 sec
sync_mode=1, O_SYNC=1, time = 2.284912 sec
sync_mode=2, O_SYNC=0, time = 0.340711 sec
sync_mode=2, O_SYNC=1, time = 2.284837 sec
sync_mode=3, O_SYNC=0, time = 0.340662 sec
sync_mode=3, O_SYNC=1, time = 2.284818 sec
sync_mode=4, O_SYNC=0, time = 0.340748 sec
sync_mode=4, O_SYNC=1, time = 0.340682 sec
sync_mode=5, O_SYNC=0, time = 2.285120 sec
sync_mode=5, O_SYNC=1, time = 2.284539 sec
sync_mode=6, O_SYNC=0, time = 2.284510 sec
sync_mode=6, O_SYNC=1, time = 2.284424 sec
sync_mode=7, O_SYNC=0, time = 2.284612 sec
sync_mode=7, O_SYNC=1, time = 2.284490 sec
clear_buf()
sync_mode=0, O_SYNC=0, time = 0.025417 sec
sync_mode=0, O_SYNC=1, time = 0.025395 sec
sync_mode=1, O_SYNC=0, time = 0.025388 sec
sync_mode=1, O_SYNC=1, time = 0.026659 sec
sync_mode=2, O_SYNC=0, time = 0.025400 sec
sync_mode=2, O_SYNC=1, time = 0.026655 sec
sync_mode=3, O_SYNC=0, time = 0.025393 sec
sync_mode=3, O_SYNC=1, time = 0.026652 sec
sync_mode=4, O_SYNC=0, time = 0.025401 sec
sync_mode=4, O_SYNC=1, time = 0.025411 sec
sync_mode=5, O_SYNC=0, time = 0.026659 sec
sync_mode=5, O_SYNC=1, time = 0.026659 sec
sync_mode=6, O_SYNC=0, time = 0.026709 sec
sync_mode=6, O_SYNC=1, time = 0.026668 sec
sync_mode=7, O_SYNC=0, time = 0.026654 sec
sync_mode=7, O_SYNC=1, time = 0.026657 sec
root@debian-fpga:~/tmp/udmabuf# ./udmabuf_test.x
phys_addr=0x70100000
size=1048576
check_buf()
sync_mode=0, O_SYNC=0, time = 0.340782 sec
sync_mode=0, O_SYNC=1, time = 0.340959 sec
sync_mode=1, O_SYNC=0, time = 0.340772 sec
sync_mode=1, O_SYNC=1, time = 2.284743 sec
sync_mode=2, O_SYNC=0, time = 0.340702 sec
sync_mode=2, O_SYNC=1, time = 2.284824 sec
sync_mode=3, O_SYNC=0, time = 0.340644 sec
sync_mode=3, O_SYNC=1, time = 2.284745 sec
sync_mode=4, O_SYNC=0, time = 0.340866 sec
sync_mode=4, O_SYNC=1, time = 0.340752 sec
sync_mode=5, O_SYNC=0, time = 2.284731 sec
sync_mode=5, O_SYNC=1, time = 2.284476 sec
sync_mode=6, O_SYNC=0, time = 2.284510 sec
sync_mode=6, O_SYNC=1, time = 2.284402 sec
sync_mode=7, O_SYNC=0, time = 2.284807 sec
sync_mode=7, O_SYNC=1, time = 2.284480 sec
clear_buf()
sync_mode=0, O_SYNC=0, time = 0.026005 sec
sync_mode=0, O_SYNC=1, time = 0.025991 sec
sync_mode=1, O_SYNC=0, time = 0.025989 sec
sync_mode=1, O_SYNC=1, time = 0.026661 sec
sync_mode=2, O_SYNC=0, time = 0.025982 sec
sync_mode=2, O_SYNC=1, time = 0.026657 sec
sync_mode=3, O_SYNC=0, time = 0.025986 sec
sync_mode=3, O_SYNC=1, time = 0.026667 sec
sync_mode=4, O_SYNC=0, time = 0.026007 sec
sync_mode=4, O_SYNC=1, time = 0.025981 sec
sync_mode=5, O_SYNC=0, time = 0.026658 sec
sync_mode=5, O_SYNC=1, time = 0.026659 sec
sync_mode=6, O_SYNC=0, time = 0.026655 sec
sync_mode=6, O_SYNC=1, time = 0.026651 sec
sync_mode=7, O_SYNC=0, time = 0.026664 sec
sync_mode=7, O_SYNC=1, time = 0.026648 sec

root@debian-fpga:~/tmp/udmabuf#

As this result shows, I could not reproduce your problem on this system.
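
A side note on the build log above: the implicit-declaration warnings mean that the standard headers declaring write, close, read, lseek, gettimeofday and exit were not included. They are probably unrelated to the crash, since the program still ran to completion here. A minimal sketch of the headers that declare those functions follows. The helper timed_write is hypothetical and only exercises them.

```c
#include <stdlib.h>    /* exit */
#include <string.h>    /* strlen */
#include <sys/time.h>  /* gettimeofday, struct timeval */
#include <unistd.h>    /* read, write, close, lseek */

/* Hypothetical helper: write the string `attr` to `fd` and return the
 * elapsed wall-clock time in seconds, or -1.0 if the write fails. */
double timed_write(int fd, const char *attr)
{
    struct timeval start, end;
    size_t len = strlen(attr);

    gettimeofday(&start, NULL);
    if (write(fd, attr, len) != (ssize_t)len)
        return -1.0;
    gettimeofday(&end, NULL);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}
```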

Please tell me how the CMA area is reserved on your system. When the Linux kernel boots, messages like the following are printed first.

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.14.0-xlnx-v2018.2-zynqmp-fpga (ichiro@sphinx-vm-ubuntu) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9)) #1 SMP Fri Aug 10 11:05:02 JST 2018
[    0.000000] Boot CPU: AArch64 Processor [410fd034]
[    0.000000] Machine model: ZynqMP Ultra96
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 256 MiB at 0x0000000070000000

commented

Ah, good question. It appears that no CMA area is initialized on this system.

$ dmesg | grep cma
[    0.000000] Memory: 267434048K/268366912K available (7932K kernel code, 1836K rwdata, 3328K rodata, 1280K init, 7719K bss, 932864K reserved, 0K cma-reserved)

udmabuf uses dma_alloc_coherent() to allocate its buffer.
When there is no CMA area, dma_alloc_coherent() allocates the memory from regular kernel memory instead.
Issue #22 points out that udmabuf's fault handling has a problem when the buffer is allocated that way.
That may be the cause here.
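
Besides grepping dmesg, the size of the CMA area can also be read at run time. The following is a minimal sketch, assuming a kernel built with CONFIG_CMA so that /proc/meminfo contains a CmaTotal line. The function name read_cma_total_kb is hypothetical.

```c
#include <stdio.h>

/* Scan a meminfo-style stream for the "CmaTotal:" line.
 * Returns 0 and stores the value in kB if the line is found,
 * -1 if the stream has no such line. */
int read_cma_total_kb(FILE *fp, unsigned long *kb)
{
    char line[256];

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Lines such as "MemTotal:" or "CmaFree:" do not match the literal. */
        if (sscanf(line, "CmaTotal: %lu kB", kb) == 1)
            return 0;
    }
    return -1;
}
```

A caller would pass the stream returned by fopen("/proc/meminfo", "r"). A missing line or a value of 0 kB suggests that no CMA area is available, which matches the "0K cma-reserved" message in the log above.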

commented

That sounds likely to me at this point. For our use case, we actually need memory that is not marked VM_IO or VM_PFNMAP; otherwise we cannot do what we need to do, so the solution suggested in the other issue would cause problems for us. I will try to enable a CMA area on our system and test again.

commented

I configured our system with 512 MB reserved for cma. The test program is now working! Thank you.

I have released a provisional fix as v1.4.2-rc1.
I will keep testing it for a while to confirm that it works as is.

Thank you for your issue.

I have released udmabuf v1.4.2 which fixes this problem.
Thank you.