openbmc / linux

OpenBMC Linux kernel source tree

Panic on accessing MTD file descriptor after unbinding aspeed-smc driver

zevweiss opened this issue

The following script triggers a kernel panic:

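#!/bin/bash

mtddev=/dev/mtd/bmc
spidev=1e620000.spi
# Stage the tools and their libraries in /tmp, presumably so they can still
# run once the flash holding the root filesystem has been unbound.
cp /usr/bin/{hexdump,head} /lib/{lib[cm].so.6,ld-linux.so.3} /tmp
export LD_LIBRARY_PATH=/tmp
echo opening mtd FD... > /dev/kmsg
# Open the MTD device for reading; bash stores the new FD number in $mtdfd.
exec {mtdfd}<$mtddev
echo unbinding aspeed-smc... > /dev/kmsg
echo $spidev > /sys/bus/platform/drivers/aspeed-smc/unbind
echo reading mtd FD... > /dev/kmsg
# Read through the FD that was opened before the unbind.
/tmp/ld-linux.so.3 /tmp/hexdump -C /dev/fd/$mtdfd | /tmp/ld-linux.so.3 /tmp/head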

dmesg output:

[  125.768937] opening mtd FD...
[  125.776496] unbinding aspeed-smc...
[  125.781074] Deleting MTD partitions on "bmc":
[  125.785617] Deleting u-boot MTD partition
[  125.812654] Deleting u-boot-env MTD partition
[  125.856276] Deleting kernel MTD partition
[  125.886736] Deleting rofs MTD partition
[  125.947813] Deleting rwfs MTD partition
[  125.973831] Removing MTD device #5 (rwfs) with use count 1
[  125.979384] Error when deleting partition "rwfs" (-16)
[  126.033799] reading mtd FD...
[  126.126175] 8<--- cut here ---
[  126.129289] Unable to handle kernel paging request at virtual address a4000000
[  126.136537] pgd = 43c33ffc
[  126.139274] [a4000000] *pgd=00000000
[  126.142889] Internal error: Oops: 5 [#1] ARM
[  126.147188] CPU: 0 PID: 339 Comm: ld-linux.so.3 Not tainted 5.14.11-7ee2d5b-dirty-78027c9 #1
[  126.155647] Hardware name: Generic DT based system
[  126.160447] PC is at mmiocpy+0x48/0x330
[  126.164330] LR is at aspeed_smc_read+0x5c/0x214
[  126.168906] pc : [<807a6788>]    lr : [<8051fc74>]    psr: 20000013
[  126.175180] sp : 84affd0c  ip : 00000000  fp : 84affd64
[  126.180413] r10: 854b0280  r9 : 00000000  r8 : 8277d000
[  126.185647] r7 : 8277d000  r6 : 00001000  r5 : 854b0048  r4 : 854b0020
[  126.192183] r3 : ffffffff  r2 : 00000f80  r1 : a4000000  r0 : 8277d000
[  126.198718] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  126.205864] Control: 00c5387d  Table: 84808008  DAC: 00000051
[  126.211614] Register r0 information: slab kmalloc-4k start 8277d000 pointer offset 0 size 4096
[  126.220291] Register r1 information: non-paged memory
[  126.225368] Register r2 information: non-paged memory
[  126.230438] Register r3 information: non-paged memory
[  126.235508] Register r4 information: slab kmalloc-1k start 854b0000 pointer offset 32 size 1024
[  126.244262] Register r5 information: slab kmalloc-1k start 854b0000 pointer offset 72 size 1024
[  126.253014] Register r6 information: non-paged memory
[  126.258085] Register r7 information: slab kmalloc-4k start 8277d000 pointer offset 0 size 4096
[  126.266744] Register r8 information: slab kmalloc-4k start 8277d000 pointer offset 0 size 4096
[  126.275401] Register r9 information: NULL pointer
[  126.280123] Register r10 information: slab kmalloc-1k start 854b0000 pointer offset 640 size 1024
[  126.289041] Register r11 information: non-slab/vmalloc memory
[  126.294807] Register r12 information: NULL pointer
[  126.299615] Process ld-linux.so.3 (pid: 339, stack limit = 0xec53d594)
[  126.306171] Stack: (0x84affd0c to 0x84b00000)
[  126.310558] fd00:                            854b0048 00001000 8277d000 8277d000 8277d000
[  126.318757] fd20: 854b0020 8051fc74 802a19c0 80bc1cc8 0004f040 d1696089 84989c40 00000000
[  126.326945] fd40: 854b0048 00000000 00001000 8277d000 84affe1c 854b0280 84affd9c 84affd68
[  126.335136] fd60: 80517090 8051fc24 00001000 8277d000 84affdb8 00000000 84affe1c 854b0048
[  126.343325] fd80: 00000000 00000000 84affe14 00000000 84affdc4 84affda0 8050bdcc 80517020
[  126.351516] fda0: 00001000 84affe1c 8277d000 00000000 854b0048 00000000 84affe04 84affdc8
[  126.359705] fdc0: 8050de34 8050bd34 84affe14 00000001 0000002c 00000000 00000000 854b0048
[  126.367897] fde0: 00000000 84affe14 84affe7c 8277d000 00000000 84afe000 84affe5c 84affe08
[  126.376086] fe00: 8050df64 8050ddb0 84affe14 8025b854 00000010 00000000 00001000 00000000
[  126.384275] fe20: 00000000 00000000 00000000 8277d000 00000000 d1696089 00001000 84afff68
[  126.392465] fe40: 01ae3520 854b0048 8277d000 84ade000 84affed4 84affe60 805129c4 8050df00
[  126.400657] fe60: 00001000 84affe7c 8277d000 00000000 00000000 00000000 824eb390 01ae4000
[  126.408846] fe80: 00001000 84afffb0 84affeb4 84affe98 8010defc 8010cb88 84affeb4 85dff34f
[  126.417035] fea0: 80243988 d1696089 84afff2c 00001000 84863a00 01ae3520 84afff68 00000001
[  126.425227] fec0: 80512898 00000000 84afff64 84affed8 80293664 805128a4 00000255 84808068
[  126.433415] fee0: 84808068 00000000 00000000 00000000 824eb390 84989c7c 00000000 d1696089
[  126.441605] ff00: 84afff2c 8010a62c 01ae4524 00000817 84989c40 d1696089 84afffb0 01ae4524
[  126.449795] ff20: 84afff74 84afff30 8010a62c 8014ef88 00000010 d1696089 350c5282 84863a00
[  126.457986] ff40: 84863a00 00000000 00000000 80100224 84afe000 00000000 84afff94 84afff68
[  126.466179] ff60: 802940ac 802935b8 00000000 00000000 8010aa2c d1696089 76d974a8 000005e8
[  126.474365] ff80: 76f05120 00000003 84afffa4 84afff98 80294150 80294048 00000000 84afffa8
[  126.482555] ffa0: 80100040 80294144 76d974a8 000005e8 00000000 01ae3520 00001000 00000000
[  126.490745] ffc0: 76d974a8 000005e8 76f05120 00000003 00000010 76d97ca4 01ae34f0 01ae3190
[  126.498934] ffe0: fbad2488 7ebd1b28 76c9ebbc 76d087e4 60000010 00000000 00000000 00000000
[  126.507114] Backtrace:
[  126.509583] [<8051fc18>] (aspeed_smc_read) from [<80517090>] (spi_nor_read+0x7c/0x1ac)
[  126.517583]  r10:854b0280 r9:84affe1c r8:8277d000 r7:00001000 r6:00000000 r5:854b0048
[  126.525422]  r4:00000000
[  126.527966] [<80517014>] (spi_nor_read) from [<8050bdcc>] (mtd_read_oob_std+0xa4/0xac)
[  126.535947]  r10:00000000 r9:84affe14 r8:00000000 r7:00000000 r6:854b0048 r5:84affe1c
[  126.543784]  r4:00000000
[  126.546330] [<8050bd28>] (mtd_read_oob_std) from [<8050de34>] (mtd_read_oob+0x90/0x150)
[  126.554382]  r5:00000000 r4:854b0048
[  126.557970] [<8050dda4>] (mtd_read_oob) from [<8050df64>] (mtd_read+0x70/0xa0)
[  126.565238]  r10:84afe000 r9:00000000 r8:8277d000 r7:84affe7c r6:84affe14 r5:00000000
[  126.573078]  r4:854b0048
[  126.575624] [<8050def4>] (mtd_read) from [<805129c4>] (mtdchar_read+0x12c/0x2ec)
[  126.583074]  r9:84ade000 r8:8277d000 r7:854b0048 r6:01ae3520 r5:84afff68 r4:00001000
[  126.590816] [<80512898>] (mtdchar_read) from [<80293664>] (vfs_read+0xb8/0x2e0)
[  126.598180]  r10:00000000 r9:80512898 r8:00000001 r7:84afff68 r6:01ae3520 r5:84863a00
[  126.606013]  r4:00001000
[  126.608557] [<802935ac>] (vfs_read) from [<802940ac>] (ksys_read+0x70/0xfc)
[  126.615565]  r10:00000000 r9:84afe000 r8:80100224 r7:00000000 r6:00000000 r5:84863a00
[  126.623408]  r4:84863a00
[  126.625951] [<8029403c>] (ksys_read) from [<80294150>] (sys_read+0x18/0x1c)
[  126.632957]  r7:00000003 r6:76f05120 r5:000005e8 r4:76d974a8
[  126.638621] [<80294138>] (sys_read) from [<80100040>] (ret_fast_syscall+0x0/0x58)
[  126.646137] Exception stack(0x84afffa8 to 0x84affff0)
[  126.651215] ffa0:                   76d974a8 000005e8 00000000 01ae3520 00001000 00000000
[  126.659409] ffc0: 76d974a8 000005e8 76f05120 00000003 00000010 76d97ca4 01ae34f0 01ae3190
[  126.667592] ffe0: fbad2488 7ebd1b28 76c9ebbc 76d087e4
[  126.672670] Code: ba000002 f5d1f03c f5d1f05c f5d1f07c (e8b151f8)
[  126.678787] ---[ end trace e5ac4959ebfd2398 ]---
[  126.683423] Kernel panic - not syncing: Fatal exception

I'm looking into possible fixes, but figured I'd file it here in case anyone else has any input (and so it doesn't get lost/forgotten).

Unfortunately, the platform code ignores the returned value of the remove handler.

Yeah, there's no failure path for the unbind operation...commit e5e1c20 acknowledges that the driver core ignores the remove operation's return value, but it doesn't sound like there's any movement toward that changing, and if anything sounds like things are going the opposite direction ("The right thing to do would be to make struct platform_driver::remove() return void."). I'm not sure what the reasoning is there; there doesn't appear to have been any further discussion at the time on the list: https://lore.kernel.org/all/20210207211537.19992-1-uwe@kleine-koenig.org/.

So it seems like the intended model is "remove() operations must succeed", but that leaves us in a bit of an awkward situation.

After reading through some code, it looks like this particular problem stems (more immediately) from the aspeed-smc driver allocating the struct spi_nor via devm_kzalloc() (as do all the other spi-nor controller drivers, incidentally). Because the unbind operation can't fail, it runs all the devres cleanup hooks registered for the device and hence frees those allocations. The struct spi_nor embeds the struct mtd_info that the read/write paths in mtdchar.c access via file->private_data->mtd, so once the unbind has completed, that pointer is stale.
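
To make the lifetime mismatch concrete, here is a rough userspace sketch of the sequence described above. It is only a toy model under stated assumptions: every `toy_*` name is hypothetical and none of this is kernel code. "Freeing" is modelled with a liveness flag so the example never actually touches released memory. The remove hook returns -EBUSY (the -16 seen in the dmesg output above), the unbind path drops that error, and the open handle is left pointing at a released allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all toy_* names are hypothetical, not kernel APIs. */

#define TOY_MAX_RES 4

struct toy_mtd {
    int use_count;                      /* open handles referring to this object */
};

struct toy_device {
    struct toy_mtd slots[TOY_MAX_RES];  /* stand-in for device-managed memory */
    bool slot_live[TOY_MAX_RES];        /* false once unbind has "freed" the slot */
    size_t nr_slots;
};

struct toy_file {
    struct toy_device *dev;
    size_t slot;                        /* managed allocation the handle points at */
};

/* Plays the role of devm_kzalloc(): lifetime is tied to the driver binding. */
struct toy_mtd *toy_devm_alloc(struct toy_device *dev, size_t *slot_out)
{
    if (dev->nr_slots == TOY_MAX_RES)
        return NULL;
    size_t slot = dev->nr_slots++;
    dev->slots[slot].use_count = 0;
    dev->slot_live[slot] = true;
    *slot_out = slot;
    return &dev->slots[slot];
}

/* Opening a handle bumps the use count but does not own the allocation. */
void toy_open(struct toy_file *f, struct toy_device *dev, size_t slot)
{
    assert(dev->slot_live[slot]);
    f->dev = dev;
    f->slot = slot;
    dev->slots[slot].use_count++;
}

/* Driver remove hook: reports -EBUSY while any object is still in use. */
int toy_remove(struct toy_device *dev)
{
    for (size_t i = 0; i < dev->nr_slots; i++)
        if (dev->slot_live[i] && dev->slots[i].use_count > 0)
            return -EBUSY;
    return 0;
}

/* Unbind: the remove error is dropped and managed memory is released anyway. */
void toy_unbind(struct toy_device *dev)
{
    (void)toy_remove(dev);
    for (size_t i = 0; i < dev->nr_slots; i++)
        dev->slot_live[i] = false;
}

/* True when the handle still refers to memory that unbind already released. */
bool toy_file_is_dangling(const struct toy_file *f)
{
    return !f->dev->slot_live[f->slot];
}
```

Calling toy_devm_alloc(), toy_open(), then toy_unbind() leaves toy_file_is_dangling() returning true, which is the state the mtdchar read path is in when it crashes.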

Thinking out loud:

  • Some plumbing could probably be arranged to invalidate the struct file's pointer to the mtd_info and check it before following it in the read/write path, but that just seems to turn it into a synchronization problem instead (if a read races with an unbind you could still easily end up with the same problem). The relevant synchronization point there appears to be mtd_table_mutex (I think?), but acquiring that on every read/write operation just to check for the edge case of the driver having been unbound seems...ugly.
  • As an alternative to invalidating the pointer, some sort of refcounting to keep the object alive while the struct file still refers to it? That seems to weaken the semantics of the unbind operation though; I'd think the FD shouldn't just keep working after the driver's been unbound.
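
For illustration, the two ideas above could be combined roughly as follows: a reference count keeps the object's memory valid while a file still holds it, and a "dead" flag set at unbind makes later reads fail instead of continuing to work. This is a single-threaded userspace sketch with hypothetical `rc_*` names, not a proposed patch. It deliberately leaves out locking, so the read-versus-unbind race mentioned in the first bullet is not addressed here, and returning -ENODEV is just one plausible choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy userspace model; hypothetical names, no locking, not kernel code. */

struct rc_mtd {
    int refs;       /* one for the driver binding, plus one per open file */
    bool dead;      /* set at unbind: the hardware must not be touched again */
    int *freed;     /* hypothetical hook so callers can observe the release */
};

struct rc_mtd *rc_mtd_create(int *freed)
{
    struct rc_mtd *mtd = calloc(1, sizeof(*mtd));
    if (!mtd)
        return NULL;
    mtd->refs = 1;              /* reference held by the driver binding */
    mtd->freed = freed;
    return mtd;
}

/* Drop one reference; the memory is released only when the last one goes. */
void rc_mtd_put(struct rc_mtd *mtd)
{
    assert(mtd->refs > 0);
    if (--mtd->refs == 0) {
        if (mtd->freed)
            *mtd->freed = 1;
        free(mtd);
    }
}

/* Open takes its own reference, so the object outlives an unbind. */
struct rc_mtd *rc_open(struct rc_mtd *mtd)
{
    if (mtd->dead)
        return NULL;
    mtd->refs++;
    return mtd;
}

/* Read checks the flag first, so an unbound device yields an error. */
int rc_read(struct rc_mtd *mtd)
{
    if (mtd->dead)
        return -ENODEV;
    return 0;                   /* a real read would access the flash here */
}

/* Unbind always succeeds: mark the object dead, drop the driver's reference. */
void rc_unbind(struct rc_mtd *mtd)
{
    mtd->dead = true;
    rc_mtd_put(mtd);
}

/* Close drops the file's reference; this may be the final one. */
void rc_close(struct rc_mtd *mtd)
{
    rc_mtd_put(mtd);
}
```

With this shape the unbind still cannot fail, but the object survives until rc_close(), and any rc_read() in between returns an error rather than following a stale pointer.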

I have another experimental Aspeed SMC driver using spimem and it generates the same kind of issues.

[58555.806133] opening mtd FD...
[58555.808770] unbinding aspeed-smc...
[58555.810676] 8<--- cut here ---
[58555.810885] Unable to handle kernel NULL pointer dereference at virtual address 00000230
[58555.811257] pgd = f7697319
[58555.811422] [00000230] *pgd=00000000
[58555.811944] Internal error: Oops: 5 [#1] ARM
[58555.812225] Modules linked in:
[58555.812487] CPU: 0 PID: 1101 Comm: sh Not tainted 5.14.11-00170-gd9563a2fe5e9-dirty #309
[58555.812918] Hardware name: Generic DT based system
[58555.813205] PC is at klist_next+0x10/0xac
[58555.813420] LR is at device_for_each_child+0x3c/0x98
[58555.813665] pc : [<801f38f8>]    lr : [<802407b0>]    psr: 20000013
[58555.813953] sp : 85f03e60  ip : 80695e84  fp : 00000000
[58555.814195] r10: 00000000  r9 : 85f02000  r8 : 8499df10
[58555.814452] r7 : 00000000  r6 : 00000000  r5 : 85f03e7c  r4 : 85407520
[58555.814749] r3 : 00000000  r2 : 80277e4c  r1 : 85f03e7c  r0 : 00000200
[58555.815094] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[58555.815387] Control: 00093177  Table: 461f0000  DAC: 00000051
[58555.815673] Register r0 information: non-paged memory
[58555.816024] Register r1 information: non-slab/vmalloc memory
[58555.816362] Register r2 information: non-slab/vmalloc memory
[58555.816629] Register r3 information: NULL pointer
[58555.816884] Register r4 information: slab kmalloc-2k start 85407000 pointer offset 1312 size 2048
[58555.817656] Register r5 information: non-slab/vmalloc memory
[58555.817926] Register r6 information: NULL pointer
[58555.818216] Register r7 information: NULL pointer
[58555.818507] Register r8 information: slab kmalloc-192 start 8499df00 pointer offset 16 size 192
[58555.819059] Register r9 information: non-slab/vmalloc memory
[58555.819429] Register r10 information: NULL pointer
[58555.819665] Register r11 information: NULL pointer
[58555.819926] Register r12 information: non-slab/vmalloc memory
[58555.820288] Process sh (pid: 1101, stack limit = 0x60a248fb)
[58555.820632] Stack: (0x85f03e60 to 0x85f04000)
[58555.821108] 3e60: 85407520 00000000 80277e4c 00000000 8499df10 802407b0 85407520 00000200
[58555.821531] 3e80: 00000000 8064ba28 85407520 80daf810 00000000 802769d0 00000000 80daf810
[58555.822018] 3ea0: 80677dd0 8027b7cc 80daf410 80248bf0 80daf410 802473c0 0000000d 80daf410
[58555.822436] 3ec0: 80677dd0 806756cc 8499df10 80245608 0000000d 84ac0660 8499df00 85f03f30
[58555.822886] 3ee0: 8499df10 801681b4 00000000 00000000 00000000 845dc6e0 0000000d 85f03f78
[58555.823292] 3f00: 014bc4e8 800fab20 0000000d 000001b5 014bc4e8 0000000d 00000100 00000000
[58555.823694] 3f20: 00000000 85f03f18 00000000 00000000 845dc6e0 00000000 00000000 00000000
[58555.824137] 3f40: 00000000 00000000 00000000 00000000 00000000 00000000 845dca94 8064ba28
[58555.824584] 3f60: 845dc6e0 014bc4e8 85f03f78 85f03f84 0000000d 800facb8 00000000 00000000
[58555.825032] 3f80: 845dca40 845dc6e0 00000000 8064ba28 00000001 76f76080 76f01b40 00000004
[58555.825481] 3fa0: 80008644 80008460 00000001 76f76080 00000001 014bc4e8 0000000d 00000000
[58555.825930] 3fc0: 00000001 76f76080 76f01b40 00000004 76f020f8 76f01c60 0055a850 00000000
[58555.826326] 3fe0: 0054e720 7eb0a9d8 76e807f0 76e8080c 60000010 00000001 00000000 00000000
[58555.826813] [<801f38f8>] (klist_next) from [<802407b0>] (device_for_each_child+0x3c/0x98)
[58555.827191] [<802407b0>] (device_for_each_child) from [<802769d0>] (spi_unregister_controller+0x1c/0xf4)
[58555.827560] [<802769d0>] (spi_unregister_controller) from [<8027b7cc>] (aspeed_smc_remove+0x10/0x34)
[58555.827960] [<8027b7cc>] (aspeed_smc_remove) from [<80248bf0>] (platform_remove+0x20/0x4c)
[58555.828322] [<80248bf0>] (platform_remove) from [<802473c0>] (device_release_driver_internal+0xb8/0x1a8)
[58555.828734] [<802473c0>] (device_release_driver_internal) from [<80245608>] (unbind_store+0x44/0x68)
[58555.829130] [<80245608>] (unbind_store) from [<801681b4>] (kernfs_fop_write_iter+0xe4/0x194)
[58555.829507] [<801681b4>] (kernfs_fop_write_iter) from [<800fab20>] (vfs_write+0x14c/0x1a8)
[58555.829878] [<800fab20>] (vfs_write) from [<800facb8>] (ksys_write+0x74/0xc4)
[58555.830198] [<800facb8>] (ksys_write) from [<80008460>] (ret_fast_syscall+0x0/0x54)
[58555.830597] Exception stack(0x85f03fa8 to 0x85f03ff0)
[58555.830925] 3fa0:                   00000001 76f76080 00000001 014bc4e8 0000000d 00000000
[58555.831322] 3fc0: 00000001 76f76080 76f01b40 00000004 76f020f8 76f01c60 0055a850 00000000
[58555.831729] 3fe0: 0054e720 7eb0a9d8 76e807f0 76e8080c
[58555.832332] Code: e92d41f0 e1a05000 e5900000 e5956004 (e5907030) 
[58555.833267] ---[ end trace 1c6243d4aa66fcdc ]---

Interesting...though it looks like in that case the panic happened during the unbind, instead of on an FD access after it?