Kernel Oops when synching
polygon opened this issue · comments
I just started testing this project with an UltraScale+ processor running in ARM64 mode. I am trying to use it with a reserved-memory region. When I try to change the synchronization state, I get a kernel oops. My device tree looks as follows:
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;
    dma_region: dmaregion@7bf00000 {
        compatible = "shared-dma-pool";
        reg = <0x0 0x7bf00000 0x0 0x4000000>;
        no-map;
    };
};
dma: dma@7bf00000 {
    compatible = "ikwzm,udmabuf-0.10.a";
    device-name = "dma_region";
    minor-number = <0>;
    memory-region = <&dma_region>;
    size = <0x4000000>;
};
When I load the module on the target system, everything looks fine:
# modprobe udmabuf
[ 178.772612] udmabuf dma@7bf00000: assigned reserved memory node dmaregion@7bf00000
[ 178.798115] udmabuf dma_region: major number = 246
[ 178.803007] udmabuf dma_region: minor number = 0
[ 178.807784] udmabuf dma_region: phys address = 0x000000007bf00000
[ 178.814080] udmabuf dma_region: buffer size = 67108864
[ 178.819416] udmabuf dma_region: dma coherent = 0
[ 178.824215] udmabuf dma@7bf00000: driver installed.
# cd /sys/class/udmabuf/dma_region/
# ls
debug_vma power sync_for_cpu sync_owner
dev size sync_for_device sync_size
device subsystem sync_mode uevent
phys_addr sync_direction sync_offset
# cat phys_addr
0x000000007bf00000
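As a cross-check of the values above, here is a minimal Python sketch (not part of udmabuf; the helper name `decode_reg` is made up for illustration) that combines the 32-bit cells of the `reg` property into an address and a size, assuming `#address-cells = <2>` and `#size-cells = <2>` as in the device tree. The result should match the `phys address` and `buffer size` lines in the log.

```python
def decode_reg(cells, address_cells=2, size_cells=2):
    """Combine 32-bit device-tree cells into an (address, size) pair."""
    if len(cells) != address_cells + size_cells:
        raise ValueError("unexpected number of cells in reg")

    def join(parts):
        # Each cell is 32 bits wide; the most significant cell comes first.
        value = 0
        for part in parts:
            value = (value << 32) | part
        return value

    return join(cells[:address_cells]), join(cells[address_cells:])


# reg = <0x0 0x7bf00000 0x0 0x4000000> from the device tree above
address, size = decode_reg([0x0, 0x7BF00000, 0x0, 0x4000000])
print(hex(address), size)
```

The decoded size is 0x4000000 bytes, which is the 67108864 reported by the driver.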
However, when I try to write to sync_for_cpu or sync_for_device, I get an oops:
# echo 1 > sync_for_device
[ 490.115828] Unable to handle kernel paging request at virtual address ffffffc07bf00000
[ 490.123730] Mem abort info:
[ 490.126490] Exception class = DABT (current EL), IL = 32 bits
[ 490.132392] SET = 0, FnV = 0
[ 490.135432] EA = 0, S1PTW = 0
[ 490.138551] Data abort info:
[ 490.141417] ISV = 0, ISS = 0x00000147
[ 490.145235] CM = 1, WnR = 1
[ 490.148182] swapper pgtable: 4k pages, 39-bit VAs, pgd = ffffff8009c76000
[ 490.154970] [ffffffc07bf00000] *pgd=000000007fff6003, *pud=000000007fff6003, *pmd=000000007fff5003, *pte=0000000000000000
[ 490.165910] Internal error: Oops: 96000147 [#1] PREEMPT SMP
[ 490.171444] Modules linked in: udmabuf(O) zdma_uio(O)
[ 490.176482] CPU: 0 PID: 1451 Comm: sh Tainted: G O 4.14.0-ultrazed #8
[ 490.183946] Hardware name: xlnx,zynqmp (DT)
[ 490.188108] task: ffffffc07865cd00 task.stack: ffffff800c210000
[ 490.194019] PC is at __clean_dcache_area_poc+0x20/0x38
[ 490.199135] LR is at __swiotlb_sync_single_for_device+0x50/0x60
[ 490.205036] pc : [<ffffff800808e91c>] lr : [<ffffff800808d15c>] pstate: 80000145
[ 490.212416] sp : ffffff800c213cf0
[ 490.215709] x29: ffffff800c213cf0 x28: ffffffc07865cd00
[ 490.221006] x27: ffffff8008441000 x26: 0000000000000040
[ 490.226301] x25: 0000000000000124 x24: 0000000000000015
[ 490.231596] x23: ffffff800c213eb8 x22: 0000000000000000
[ 490.236891] x21: 0000000004000000 x20: ffffffc078090c10
[ 490.242185] x19: 000000007bf00000 x18: 0000000000000844
[ 490.247480] x17: 0000007fbb41a128 x16: ffffff80081450b8
[ 490.252775] x15: 0000000000000008 x14: 0000007fbb36cdc0
[ 490.258070] x13: 00000000004b5474 x12: 0101010101010101
[ 490.263365] x11: 0000000000000000 x10: 0101010101010101
[ 490.268660] x9 : ffffff800c213d58 x8 : ffffffc001573280
[ 490.273954] x7 : 0000000004000000 x6 : 000000007bf00000
[ 490.279249] x5 : ffffffc078090c10 x4 : 0000000000000001
[ 490.284544] x3 : 000000000000003f x2 : 0000000000000040
[ 490.289839] x1 : ffffffc07ff00000 x0 : ffffffc07bf00000
[ 490.295135] Process sh (pid: 1451, stack limit = 0xffffff800c210000)
[ 490.301472] Call trace:
[ 490.303899] Exception stack(0xffffff800c213bb0 to 0xffffff800c213cf0)
[ 490.310327] 3ba0: ffffffc07bf00000 ffffffc07ff00000
[ 490.318143] 3bc0: 0000000000000040 000000000000003f 0000000000000001 ffffffc078090c10
[ 490.325955] 3be0: 000000007bf00000 0000000004000000 ffffffc001573280 ffffff800c213d58
[ 490.333767] 3c00: 0101010101010101 0000000000000000 0101010101010101 00000000004b5474
[ 490.341579] 3c20: 0000007fbb36cdc0 0000000000000008 ffffff80081450b8 0000007fbb41a128
[ 490.349391] 3c40: 0000000000000844 000000007bf00000 ffffffc078090c10 0000000004000000
[ 490.357203] 3c60: 0000000000000000 ffffff800c213eb8 0000000000000015 0000000000000124
[ 490.365015] 3c80: 0000000000000040 ffffff8008441000 ffffffc07865cd00 ffffff800c213cf0
[ 490.372827] 3ca0: ffffff800808d15c ffffff800c213cf0 ffffff800808e91c 0000000080000145
[ 490.380639] 3cc0: ffffff800c213ce0 ffffff8008435668 0000008000000000 ffffff800823e998
[ 490.388450] 3ce0: ffffff800c213cf0 ffffff800808e91c
[ 490.393309] [<ffffff800808e91c>] __clean_dcache_area_poc+0x20/0x38
[ 490.399480] [<ffffff8000448890>] udmabuf_set_sync_for_device+0xa0/0xe8 [udmabuf]
[ 490.406854] [<ffffff8008294af8>] dev_attr_store+0x18/0x28
[ 490.412232] [<ffffff800819cb78>] sysfs_kf_write+0x38/0x50
[ 490.417612] [<ffffff800819bcc4>] kernfs_fop_write+0x11c/0x184
[ 490.423343] [<ffffff8008144c4c>] __vfs_write+0x1c/0xf8
[ 490.428462] [<ffffff8008144ee4>] vfs_write+0xac/0x160
[ 490.433496] [<ffffff80081450fc>] SyS_write+0x44/0x88
[ 490.438443] Exception stack(0xffffff800c213ec0 to 0xffffff800c214000)
[ 490.444869] 3ec0: 0000000000000001 00000000004f9260 0000000000000002 0000007fbb4ad000
[ 490.452684] 3ee0: 0000000000650031 0000000000000000 0080008080808080 7f7f7f7f7f7f7f7f
[ 490.460496] 3f00: 0000000000000040 fffffffffffffff0 0101010101010101 0000000000000000
[ 490.468308] 3f20: 0101010101010101 00000000004b5474 0000007fbb36cdc0 0000000000000008
[ 490.476120] 3f40: 00000000004f35d8 0000007fbb41a128 0000000000000844 0000000000000001
[ 490.483932] 3f60: 00000000004f9260 0000000000000002 00000000004f4000 00000000004f9260
[ 490.491744] 3f80: 0000000000000020 00000000004f4000 0000000000000000 00000000004f5688
[ 490.499556] 3fa0: 0000000000000000 0000007fdc3d2260 000000000040dcac 0000007fdc3d2260
[ 490.507368] 3fc0: 0000007fbb41a150 0000000080000000 0000000000000001 0000000000000040
[ 490.515180] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 490.522993] [<ffffff8008082c70>] el0_svc_naked+0x24/0x28
[ 490.528285] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7a20)
[ 490.534360] ---[ end trace 7646d5a8a18a8b56 ]---
Am I doing something fundamentally wrong?
Thank you for the issue.
I reproduced this issue with the following configurations:
- UltraZed (ARM64) + Linux 4.9.0 https://github.com/ikwzm/ZynqMP-FPGA-Linux (v0.1.6)
- ZYBO Z7 (ARM) + Linux 4.14.34 https://github.com/ikwzm/FPGA-SoC-Linux (V0.8.0)
In both cases I was able to reproduce the kernel panic.
However, there was a slight difference between ARM64 + Linux 4.9.0 and ARM + Linux 4.14.34.
On ARM64 + Linux 4.9.0, insmod udmabuf.ko fails to initialize the reserved-memory pool unless no-map is specified:
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;
    dma_region: dmaregion@7bf00000 {
        compatible = "shared-dma-pool";
        reg = <0x0 0x7bf00000 0x0 0x04000000>;
    };
};
dma: dma@7bf00000 {
    compatible = "ikwzm,udmabuf-0.10.a";
    device-name = "dma_region";
    minor-number = <0>;
    memory-region = <&dma_region>;
    size = <0x04000000>;
};
root@debian-fpga:~# insmod udmabuf
[ 215.085469] memremap attempted on ram 0x000000007bf00000 size: 0x4000000
[ 215.092123] ------------[ cut here ]------------
[ 215.096695] WARNING: CPU: 2 PID: 3530 at kernel/memremap.c:111 memremap+0x14c/0x158
[ 215.104321] Modules linked in: udmabuf(O+) fclkcfg(O) uio_pdrv_genirq
[ 215.110743]
[ 215.112224] CPU: 2 PID: 3530 Comm: insmod Tainted: G O 4.9.0-xlnx-v2017.3-zynqmp-fpga #2
[ 215.121420] Hardware name: ZynqMP UltraZed-EG IO Carrier Card (DT)
[ 215.127584] task: ffffffc066bb7000 task.stack: ffffffc068f54000
[ 215.133488] PC is at memremap+0x14c/0x158
[ 215.137480] LR is at memremap+0x14c/0x158
[ 215.141472] pc : [<ffffff800812e484>] lr : [<ffffff800812e484>] pstate: 00000145
[ 215.148849] sp : ffffffc068f57950
[ 215.152148] x29: ffffffc068f57950 x28: ffffff8009564000
[ 215.157442] x27: 0000000000000001 x26: ffffffc068f57a18
[ 215.162737] x25: 000000007bf00000 x24: 000000007bf00000
[ 215.168032] x23: ffffffc066abfb80 x22: ffffff8008d37ee8
[ 215.173327] x21: 0000000004000000 x20: 0000000004000000
[ 215.178621] x19: 0000000000000004 x18: 0000000000000010
[ 215.183916] x17: 0000000000000000 x16: 00000000ffffffff
[ 215.189211] x15: ffffff8088cf88d7 x14: 0000000000000006
[ 215.194506] x13: ffffff8008cf88e5 x12: 0000000000000007
[ 215.199801] x11: 0000000000000006 x10: 000000000000015a
[ 215.205096] x9 : 000000000000004c x8 : 3030347830203a65
[ 215.210391] x7 : 7a69732030303030 x6 : ffffff8008cf8923
[ 215.215685] x5 : 0000000000000000 x4 : 0000000000000000
[ 215.220980] x3 : 0000000000000000 x2 : ffffffc07ffa76b8
[ 215.226275] x1 : 0000004077373000 x0 : 000000000000003c
[ 215.231569]
[ 215.233046] ---[ end trace 7d3980cdbaae651f ]---
[ 215.237647] Call trace:
[ 215.240080] Exception stack(0xffffffc068f57780 to 0xffffffc068f578b0)
[ 215.246503] 7780: 0000000000000004 0000008000000000 ffffffc068f57950 ffffff800812e484
[ 215.254315] 77a0: ffffff8008cfacf8 000000000000003c ffffffc068f577d0 ffffff80080d8804
[ 215.262127] 77c0: ffffff8008cf8368 ffffff8008ab82d8 ffffffc068f57870 ffffff80080d8b18
[ 215.269939] 77e0: 0000000000000004 0000000004000000 0000000004000000 ffffff8008d37ee8
[ 215.277751] 7800: ffffffc066abfb80 000000007bf00000 000000007bf00000 ffffffc068f57a18
[ 215.285563] 7820: 000000000000003c 0000004077373000 ffffffc07ffa76b8 0000000000000000
[ 215.293375] 7840: 0000000000000000 0000000000000000 ffffff8008cf8923 7a69732030303030
[ 215.301187] 7860: 3030347830203a65 000000000000004c 000000000000015a 0000000000000006
[ 215.308999] 7880: 0000000000000007 ffffff8008cf88e5 0000000000000006 ffffff8088cf88d7
[ 215.316810] 78a0: 00000000ffffffff 0000000000000000
[ 215.321672] [<ffffff800812e484>] memremap+0x14c/0x158
[ 215.326710] [<ffffff8008502e84>] dma_init_coherent_memory+0x5c/0x190
[ 215.333043] [<ffffff800850345c>] rmem_dma_device_init+0x64/0x98
[ 215.338948] [<ffffff80086ce7dc>] of_reserved_mem_device_init_by_idx+0x104/0x1a8
[ 215.346250] [<ffffff8000959a70>] udmabuf_platform_driver_probe+0x248/0x898 [udmabuf]
[ 215.353963] [<ffffff80084ee150>] platform_drv_probe+0x50/0xb8
[ 215.359693] [<ffffff80084ec614>] driver_probe_device+0x1fc/0x2a8
[ 215.365680] [<ffffff80084ec76c>] __driver_attach+0xac/0xb0
[ 215.371150] [<ffffff80084ea630>] bus_for_each_dev+0x60/0xa0
[ 215.376704] [<ffffff80084ebe00>] driver_attach+0x20/0x28
[ 215.381999] [<ffffff80084eba00>] bus_add_driver+0x1d0/0x238
[ 215.387554] [<ffffff80084ecf40>] driver_register+0x60/0xf8
[ 215.393022] [<ffffff80084ee08c>] __platform_driver_register+0x44/0x50
[ 215.399454] [<ffffff800095f19c>] udmabuf_module_init+0x19c/0x1000 [udmabuf]
[ 215.406391] [<ffffff80080830b8>] do_one_initcall+0x38/0x128
[ 215.411945] [<ffffff800812eb5c>] do_init_module+0x5c/0x1c8
[ 215.417416] [<ffffff8008104e00>] load_module+0x1c68/0x2030
[ 215.422883] [<ffffff8008105448>] SyS_finit_module+0xd8/0xe8
[ 215.428437] [<ffffff8008082ef0>] el0_svc_naked+0x24/0x28
[ 215.433756] Reserved memory: failed to init DMA memory pool at 0x000000007bf00000, size 64 MiB
[ 215.469341] udmabuf dma_region: major number = 243
[ 215.474237] udmabuf dma_region: minor number = 0
[ 215.479017] udmabuf dma_region: phys address = 0x000000006c000000
[ 215.485256] udmabuf dma_region: buffer size = 67108864
[ 215.490641] udmabuf dma_region: dma coherent = 0
[ 215.495412] udmabuf dma@7bf00000: driver installed.
However, on ARM + Linux 4.14.34, insmod udmabuf.ko succeeded when reusable was specified instead of no-map.
reserved-memory {
    #address-cells = <0x1>;
    #size-cells = <0x1>;
    ranges;
    image_buf@0 {
        compatible = "shared-dma-pool";
        reg = <0x20000000 0x01000000>;
        reusable;
        phandle = <0x27>;
    };
};
udmabuf@0 {
    compatible = "ikwzm,udmabuf-0.10.a";
    device-name = "udmabuf0";
    minor-number = <0x0>;
    size = <0x00F00000>;
    memory-region = <0x27>;
};
root@debian-fpga:~# insmod udmabuf
[ 26.969844] udmabuf udmabuf@0: assigned reserved memory node image_buf@0
[ 27.020800] udmabuf udmabuf0: major number = 245
[ 27.025514] udmabuf udmabuf0: minor number = 0
[ 27.030114] udmabuf udmabuf0: phys address = 0x20000000
[ 27.035549] udmabuf udmabuf0: buffer size = 15728640
[ 27.040745] udmabuf udmabuf0: dma coherent = 0
[ 27.045303] udmabuf udmabuf@0: driver installed.
Writing to sync_for_device also completes normally.
root@debian-fpga:~# echo 1 > /sys/class/udmabuf/udmabuf0/sync_for_device
root@debian-fpga:~#
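For reference, the same write can be issued from a program instead of the shell. The following is a minimal Python sketch under the assumption that the sysfs layout is the one shown in this thread (`/sys/class/udmabuf/<device-name>/sync_for_device`); the function name and the `sysfs_root` parameter are made up for illustration and are not part of udmabuf.

```python
import os


def sync_for_device(name, sysfs_root="/sys/class/udmabuf"):
    """Write 1 to <sysfs_root>/<name>/sync_for_device, like `echo 1 > ...`."""
    path = os.path.join(sysfs_root, name, "sync_for_device")
    with open(path, "w") as attr:
        attr.write("1")
    return path


# Example call on a target where the udmabuf0 device exists:
#   sync_for_device("udmabuf0")
```

Keeping the sysfs root as a parameter makes it possible to try the helper against a scratch directory before running it on the target.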
I do not fully understand the reserved-memory mechanism yet, and I am not confident that I can solve this issue.
Until this issue is resolved, I recommend not using reserved-memory on ARM64.
This is a follow-up report.
When a DMA device uses the Linux kernel's reserved-memory, you need to specify either no-map or reusable in the device tree.
With no-map, the DMA memory pool mechanism in drivers/base/dma-coherent.c is used.
With reusable, the CMA memory pool mechanism in drivers/base/dma-contiguous.c is used.
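The rule above can be written down as a small lookup. This is only a Python sketch of the description in this comment, not kernel code; the function name is hypothetical, and treating the two flags as mutually exclusive is my assumption.

```python
def pool_mechanism(properties):
    """Return the reserved-memory mechanism selected by a node's flags.

    `properties` is any collection of property names found in the
    reserved-memory node, e.g. {"compatible", "reg", "no-map"}.
    """
    no_map = "no-map" in properties
    reusable = "reusable" in properties
    if no_map and not reusable:
        return "DMA memory pool (dma-coherent.c)"
    if reusable and not no_map:
        return "CMA memory pool (dma-contiguous.c)"
    # Assumption: neither flag, or both at once, is not a usable setup.
    return "unsupported: specify exactly one of no-map or reusable"


print(pool_mechanism({"compatible", "reg", "no-map"}))
print(pool_mechanism({"compatible", "reg", "reusable"}))
```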
The kernel panic reported in this issue occurs with no-map (that is, with the DMA memory pool mechanism). I do not know the cause of the kernel panic yet.
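The oops log does offer one clue, noted here as an observation and not as a confirmed root cause. The faulting address ffffffc07bf00000 (also in x0) looks like the physical address 0x7bf00000 shifted into the kernel's linear mapping, and the log shows a zero PTE for it. The Python sketch below reproduces that arithmetic, assuming a linear-map base of 0xffffffc000000000 for 39-bit VAs and DRAM starting at physical address 0; both constants are assumptions on my part.

```python
PAGE_OFFSET_39BIT = 0xFFFFFFC000000000  # assumed linear-map base for 39-bit VAs
PHYS_OFFSET = 0x0                       # assumed: DRAM starts at physical 0


def linear_map_address(phys):
    """Virtual address a physical address would have in the linear map."""
    return PAGE_OFFSET_39BIT + (phys - PHYS_OFFSET)


start = linear_map_address(0x7BF00000)    # buffer start from the device tree
end = start + 0x4000000                   # plus the buffer size
print(hex(start))  # compare with the faulting address and x0 in the oops
print(hex(end))    # compare with x1 in the oops
```

If this reading is right, the cache clean in __clean_dcache_area_poc ran over a linear-map range that has no page-table entry, which would be consistent with no-map keeping the region out of the kernel mapping.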
If you use udmabuf with reserved-memory, specify reusable and use the CMA memory pool mechanism.
In that case, set up the device tree as follows.
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;
    dma_region: dmaregion@7c000000 {
        compatible = "shared-dma-pool";
        reg = <0x0 0x7C000000 0x0 0x04000000>;
        reusable;
    };
};
dma: dma@7c000000 {
    compatible = "ikwzm,udmabuf-0.10.a";
    device-name = "dma_region";
    minor-number = <0>;
    memory-region = <&dma_region>;
    size = <0x04000000>;
};
When using the CMA memory pool mechanism, in addition to specifying reusable, take care that the address and size are properly aligned.
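A quick way to check a candidate region before editing the device tree is sketched below in Python. The 4 MiB alignment is an assumption for illustration only, since the value the kernel actually requires depends on its configuration; the function name is also made up.

```python
CMA_ALIGNMENT = 4 * 1024 * 1024  # assumed 4 MiB; depends on kernel configuration


def is_cma_aligned(address, size, alignment=CMA_ALIGNMENT):
    """True if both the base address and the size are multiples of alignment."""
    return address % alignment == 0 and size % alignment == 0


# Base address from the original report, then the one from the device tree above.
print(is_cma_aligned(0x7BF00000, 0x04000000))
print(is_cma_aligned(0x7C000000, 0x04000000))
```

Under that assumption the original base address 0x7bf00000 does not pass the check, while 0x7C000000 does.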