erikarn / athp

freebsd ath10k port

memory modified after free / taskq node_alloc/free{_cb}

bzfbd opened this issue

commented

There's a problem with the taskqs: node_free can run after node_alloc but before node_alloc_cb has been run from the taskq. By the time node_alloc_cb runs, the ni (ath10k_sta object) has already been freed, so the callback writes to freed memory. Based on the offsets, I believe the field being written is is_in_peer_table, which sits right after the ni.
All the other (non-node_{alloc,free}) changes to it seem to be properly covered by a STA check.
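The failure mode can be modelled in userspace. This is a minimal sketch, not athp code: a one-slot pool stands in for the allocator handing the same address back out (as 0xfffffe003d970000 is reused in the log below), and node_alloc, node_free and node_alloc_cb here are hypothetical simplifications. A deferred callback that only captured the raw pointer ends up writing into whichever node now occupies that memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node: not the real ath10k_sta layout. */
struct node {
    bool in_use;
    char mac[18];               /* "xx:xx:xx:xx:xx:xx" plus NUL */
    bool is_in_peer_table;
};

/* One-slot pool: a freed slot is handed straight back out, mimicking the
   allocator reusing the same address for the next node. */
static struct node pool[1];

static struct node *node_alloc(const char *mac)
{
    for (size_t i = 0; i < sizeof(pool) / sizeof(pool[0]); i++) {
        if (!pool[i].in_use) {
            memset(&pool[i], 0, sizeof(pool[i]));
            pool[i].in_use = true;
            strncpy(pool[i].mac, mac, sizeof(pool[i].mac) - 1);
            return &pool[i];
        }
    }
    return NULL;
}

static void node_free(struct node *n)
{
    assert(n->in_use);          /* catch a double free in the model */
    n->in_use = false;          /* slot becomes available for reuse */
}

/* Deferred work: it only holds the pointer captured at alloc time and
   cannot tell that the node behind it has been freed and reused. */
static void node_alloc_cb(struct node *n)
{
    n->is_in_peer_table = true;
}
```

Allocating A, freeing it, allocating B (which gets the same slot) and only then running A's callback sets is_in_peer_table on B. That is the same shape as the log, where node_alloc_cb for 00:80:48:3d:a3:48 runs against 0xfffffe003d970000 after that address has been handed to 00:24:21:28:8f:48.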

I added the lower 32 bits of the ni pointer to the free_cb log message, with 0xdead0000 as the upper 32 bits (a 0xdead0000 << 32 prefix), to make the entries easier to track. Don't be confused about where addresses like 0xdead00003d970000 come from; that's not the issue.
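A minimal sketch of that tagging in plain C, assuming 64-bit pointers as on amd64. free_cb_tag and log_free_cb are hypothetical helpers, not functions from the athp tree, and printf stands in for the driver's logging. Shifting a 32-bit 0xdead0000 by 32 would overflow, so the sketch shifts a 64-bit constant.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The tag layout only makes sense with 64-bit pointers. */
static_assert(sizeof(void *) == 8, "tagging assumes 64-bit pointers");

/* Upper 32 bits: the 0xdead0000 marker. Lower 32 bits: low half of the pointer. */
static uint64_t free_cb_tag(const void *p)
{
    return (UINT64_C(0xdead0000) << 32) |
           ((uint64_t)(uintptr_t)p & UINT64_C(0xffffffff));
}

/* Hypothetical log line in the style of athp_node_free_cb's output. */
static void log_free_cb(const char *mac, const void *ni)
{
    printf("athp_node_free_cb: deleted node for mac %s (0x%" PRIx64 ")\n",
           mac, free_cb_tag(ni));
}
```

For ni = 0xfffffe003d970000 this yields 0xdead00003d970000, matching the free_cb lines in the log. The marker makes it obvious which entries came from the free path, while the low half still identifies the allocation.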

I have two more follow-up observations. I changed is_in_peer_table to bool, and the modified value changed from val=1 to 0xdeadc001. That makes sense: the field is now a single byte rather than an int, so the stale store overwrites only the lowest byte of the trash pattern (0xdeadc0de) instead of the whole word.
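The arithmetic can be checked in userspace. This sketch assumes the INVARIANTS malloc trash pattern is 0xdeadc0de and that the target is little-endian (such as amd64), so the field's first byte is the word's low-order byte. word_after_stale_store is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The bool case relies on bool being a single byte. */
static_assert(sizeof(bool) == 1, "assumes a 1-byte bool");

/* Assumed 32-bit fill pattern written into freed items. */
#define TRASH_PATTERN UINT32_C(0xdeadc0de)

/* Simulate the stale "is_in_peer_table = 1" store landing on a freed,
   trash-filled word, and return the word the trash check would read. */
static uint32_t word_after_stale_store(bool field_is_bool)
{
    uint32_t word = TRASH_PATTERN;

    if (field_is_bool) {
        bool one = true;        /* 1-byte store: only the first byte changes */
        memcpy(&word, &one, sizeof(one));
    } else {
        uint32_t one = 1;       /* int-sized (4-byte) store: the whole word changes */
        memcpy(&word, &one, sizeof(one));
    }
    return word;
}
```

With an int-sized field the check reports val=1. With a bool only the 0xde byte is replaced by 0x01, giving 0xdeadc001.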

I also have a patch that keeps the node_alloc_cb task on the ath10k_sta. node_free then checks whether that task is still queued. If it is, node_free cancels it and goes directly to the net80211 node callback, which will call free.
That alone, however, does not seem to have fully solved the problem: I believe I saw another instance after the change. There might still be another race somewhere else; I need to double-check the locking and add asserts for it. It has become a lot harder to reproduce, though, and I am hitting other issues. I'll upload a pull request for this in the next few days if I don't hit it again.
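The cancel-before-free idea in that patch can be modelled in userspace. This is a single-threaded sketch, not the athp patch: task, task_enqueue, task_cancel and queue_run are hypothetical stand-ins for a taskqueue, and struct sta is a simplified node that owns its alloc-callback task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical deferred task; not the FreeBSD taskqueue(9) API. */
struct task {
    void (*fn)(void *arg);
    void *arg;
};

#define QMAX 8
static struct task *queue[QMAX];
static size_t qlen;

static void task_enqueue(struct task *t)
{
    assert(qlen < QMAX);
    queue[qlen++] = t;
}

/* Remove t if it is still queued; true means it was cancelled before it ran. */
static bool task_cancel(struct task *t)
{
    for (size_t i = 0; i < qlen; i++) {
        if (queue[i] == t) {
            for (size_t j = i + 1; j < qlen; j++)
                queue[j - 1] = queue[j];
            qlen--;
            return true;
        }
    }
    return false;
}

static void queue_run(void)
{
    for (size_t i = 0; i < qlen; i++)
        queue[i]->fn(queue[i]->arg);
    qlen = 0;
}

/* Hypothetical node that keeps its own alloc-callback task. */
struct sta {
    struct task alloc_task;
    bool is_in_peer_table;
};

static int alloc_cb_runs;

static void node_alloc_cb(void *arg)
{
    struct sta *sta = arg;
    sta->is_in_peer_table = true;
    alloc_cb_runs++;
}

static struct sta *node_alloc(void)
{
    struct sta *sta = calloc(1, sizeof(*sta));
    if (sta == NULL)
        return NULL;
    sta->alloc_task.fn = node_alloc_cb;
    sta->alloc_task.arg = sta;
    task_enqueue(&sta->alloc_task);
    return sta;
}

/* Cancel a still-pending alloc callback before freeing, so the queue never
   holds a pointer to freed memory. Returns true if a callback was cancelled. */
static bool node_free(struct sta *sta)
{
    bool cancelled = task_cancel(&sta->alloc_task);
    free(sta);
    return cancelled;
}
```

In this model a task that is no longer queued has already finished, so freeing afterwards is safe. In the kernel the corresponding call would presumably be taskqueue_cancel(9), which returns EBUSY when the task is currently running. That case needs something like taskqueue_drain(9) before freeing, and a single-threaded model cannot show it. This may be one place where the remaining race could hide.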

athp0: athp_node_alloc: called; mac=00:80:48:3d:a3:48; ni 0xfffffe003d970000
athp0: athp_node_alloc: add peer for MAC 00:80:48:3d:a3:48 (0xfffffe003d970000)
athp0: athp_node_free: called; mac=00:80:48:3d:a3:48 (0xfffffe003d970000)
athp0: athp_node_free: delete peer for MAC 00:80:48:3d:a3:48 (0xfffffe003d970000)
athp0: athp_node_alloc: called; mac=00:24:21:28:8f:48; ni 0xfffffe003d970000
athp0: athp_node_alloc: add peer for MAC 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free: called; mac=00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_alloc_cb: added node for mac 00:80:48:3d:a3:48 (0xfffffe003d970000)
athp0: athp_node_free: delete peer for MAC 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_alloc: called; mac=00:24:21:28:8f:48; ni 0xfffffe003d970000
athp0: athp_node_alloc: add peer for MAC 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free: called; mac=00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free: delete peer for MAC 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_alloc: called; mac=00:24:21:28:8f:48; ni 0xfffffe003d970000
athp0: athp_node_alloc: add peer for MAC 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free: called; mac=00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free: delete peer for MAC 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free_cb: deleted node for mac 00:80:48:3d:a3:48 (0xdead00003d970000)
athp0: athp_node_alloc_cb: added node for mac 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free_cb: deleted node for mac 00:24:21:28:8f:48 (0xdead00003d970000)
athp0: athp_node_alloc_cb: added node for mac 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free_cb: deleted node for mac 00:24:21:28:8f:48 (0xdead00003d970000)
athp0: athp_node_alloc_cb: added node for mac 00:24:21:28:8f:48 (0xfffffe003d970000)
athp0: athp_node_free_cb: deleted node for mac 00:24:21:28:8f:48 (0xdead00003d970000)
Memory modified after free 0xfffffe003d970000(16376) val=1 @ 0xfffffe003d973300
panic: Most recently used by 80211node

cpuid = 2
time = 1490135785
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe003b149530
vpanic() at vpanic+0x182/frame 0xfffffe003b149580
panic() at panic+0x43/frame 0xfffffe003b1495e0
mtrash_ctor() at mtrash_ctor+0x81/frame 0xfffffe003b149600
item_ctor() at item_ctor+0x2bb/frame 0xfffffe003b149650
malloc() at malloc+0x99/frame 0xfffffe003b1496a0
athp_node_alloc() at athp_node_alloc+0x32/frame 0xfffffe003b1496e0
ieee80211_tmp_node() at ieee80211_tmp_node+0x1f/frame 0xfffffe003b149720
ieee80211_send_error() at ieee80211_send_error+0x59/frame 0xfffffe003b149750
hostap_input() at hostap_input+0x8fa/frame 0xfffffe003b1497e0
ieee80211_input_mimo() at ieee80211_input_mimo+0x208/frame 0xfffffe003b149890
ieee80211_input_mimo_all() at ieee80211_input_mimo_all+0x48/frame 0xfffffe003b1498d0
ath10k_process_rx() at ath10k_process_rx+0x3bc/frame 0xfffffe003b1499c0
ath10k_htt_txrx_compl_task() at ath10k_htt_txrx_compl_task+0x6f6/frame 0xfffffe003b149b00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe003b149b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x94/frame 0xfffffe003b149bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe003b149bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003b149bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100127 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1091706(%rip)

We fixed this, right? I'm pretty sure I addressed the underlying issue with this.

commented

I think you mangled this into the net80211 changes. In the future it might make sense to apply the changes, commit them, and then advance them in a follow-up commit. That way the issue records the commit and we don't have to go digging 2.5 months later.