mcusim / freebsd-src

sys/dev/dpaa2 drivers work-in-progress

Home Page: https://www.FreeBSD.org/

dpaa2_ni_rx panic: dpaa2_ni_rx: unexpected frame buffer fd_addr != buf_paddr

mcbridematt opened this issue · comments

Commit: 173aa2a

I ran into this twice while running my stress test for long periods (>1 hour).

panic: dpaa2_ni_rx: unexpected frame buffer: fd_addr(0x305800008c900000) != buf_paddr(0x3058000088ccf000)
cpuid = 5
time = 1652662301
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
kdb_backtrace() at kdb_backtrace+0x38
vpanic() at vpanic+0x17c
panic() at panic+0x44
dpaa2_ni_rx() at dpaa2_ni_rx+0x26c
dpaa2_ni_poll_task() at dpaa2_ni_poll_task+0x1b0
taskqueue_run_locked() at taskqueue_run_locked+0xac
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 0 tid 100118 ]
Stopped at      kdb_enter+0x40: undefined       f902027f

First crash trace:

(kgdb) frame 3
#3  0xffff0000007d3394 in dpaa2_ni_rx (chan=0xffff0000fd6f8000, fq=<optimized out>, fd=0xffff0000fda42020) at /usr/src/freebsd-src/sys/dev/dpaa2/dpaa2_ni.c:2630
2630            KASSERT(paddr == buf->paddr, ("%s: unexpected frame buffer: "
(kgdb) info locals
released = {0, 0, 0, 8589934592, 18446462598741807642, 18446462602928396336, 18446462598741044400}
ifp = <optimized out>
sc = <optimized out>
paddr = 3483534314129326080
released_n = 0
buf = <optimized out>
buf_chan = 0xec36f06f7149058a
buf_idx = <optimized out>
m = <optimized out>
buf_len = <optimized out>
buf_data = <optimized out>
error = <optimized out>
bp_dev = <optimized out>
bpsc = <optimized out>
chan_idx = <optimized out>
(kgdb) frame 4
#4  0xffff0000007d2da8 in dpaa2_ni_consume_frames (chan=0xffff0000fd6f8000, src=<optimized out>, consumed=<optimized out>) at /usr/src/freebsd-src/sys/dev/dpaa2/dpaa2_ni.c:2568
2568                                    fq->consume(chan, fq, fd);
(kgdb) info locals
retries = <optimized out>
fq = 0x80
rc = 36
dq = 0xffff0000fda42000
fd = 0xffff0000008c5b8a
frames = <optimized out>
(kgdb) print *dq
$4 = {{common = {verb = 96 '`', _reserved = "\223\000\000\000\000\000̖", '\000' <repeats 16 times>, "\321s\375\000\000\377\377\000P\347\070\202\000\024\000\352\005\000\000\000\000\300\000\000\200\000 \000\000\200\a\000\000\000\000\000\000\000"}, fdr = {desc = {
        verb = 96 '`', stat = 147 '\223', seqnum = 0, oprid = 0, _reserved = 0 '\000', tok = 204 '\314', fqid = 150, _reserved1 = 0, fq_byte_cnt = 0, fq_frm_cnt = 0, fqd_ctx = 18446462602985066752}, fd = {addr = 5630058834644992, data_length = 1514, bpid_ivp_bmt = 0,
        offset_fmt_sl = 192, frame_ctx = 536903680, ctrl = 125829120, flow_ctx = 0}}, scn = {verb = 96 '`', stat = 147 '\223', state = 0 '\000', _reserved = 0 '\000', rid_tok = 3422552064, ctx = 150}}}
(kgdb) print *fd
$5 = {addr = 7165916604720706863, data_length = 1701996079, bpid_ivp_bmt = 25189, offset_fmt_sl = 25715, frame_ctx = 1668444973, ctrl = 1937339183, flow_ctx = 7307986971750918959}
(kgdb) print *fq
Cannot access memory at address 0x80
(kgdb) print fd
$6 = (struct dpaa2_fd *) 0
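
kgdb prints `paddr` in decimal while the panic message prints both addresses in hex, so the two are awkward to compare by eye. The following is a minimal sketch, not driver code, that checks that the `paddr` local in frame 3 is the same value as `fd_addr` in the panic message and shows which bits differ from `buf_paddr`. The helper names `addr_diff_bits` and `report_mismatch` are hypothetical; the constants are copied from the output above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the panic message and from "info locals" in frame 3. */
#define PANIC_FD_ADDR	UINT64_C(0x305800008c900000)
#define PANIC_BUF_PADDR	UINT64_C(0x3058000088ccf000)
#define KGDB_PADDR	UINT64_C(3483534314129326080)

/* The decimal value shown by kgdb is the fd_addr reported by the panic. */
static_assert(KGDB_PADDR == PANIC_FD_ADDR, "kgdb paddr != panic fd_addr");

/* Hypothetical helper: bits in which the two addresses disagree. */
static uint64_t
addr_diff_bits(uint64_t fd_addr, uint64_t buf_paddr)
{
	return (fd_addr ^ buf_paddr);
}

/* Hypothetical helper: print both addresses and their XOR in hex. */
static void
report_mismatch(uint64_t fd_addr, uint64_t buf_paddr)
{
	printf("fd_addr   = 0x%016" PRIx64 "\n", fd_addr);
	printf("buf_paddr = 0x%016" PRIx64 "\n", buf_paddr);
	printf("xor       = 0x%016" PRIx64 "\n",
	    addr_diff_bits(fd_addr, buf_paddr));
}
```

Calling `report_mismatch(PANIC_FD_ADDR, PANIC_BUF_PADDR)` shows that the upper 32 bits of the two addresses agree and only the lower 32 bits differ. What that implies about the root cause is not established by this check alone.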

Second time:

#4  0xffff0000007d2da8 in dpaa2_ni_consume_frames (chan=0xffff0000fc616000, src=<optimized out>, consumed=<optimized out>) at /usr/src/freebsd-src/sys/dev/dpaa2/dpaa2_ni.c:2568
2568                                    fq->consume(chan, fq, fd);
(kgdb) info locals
retries = <optimized out>
fq = 0x80
rc = 36
dq = 0xffff0000fcc58000
fd = 0xffff0000008c5b8a
frames = <optimized out>
(kgdb) print *dq
$1 = {{common = {verb = 96 '`', _reserved = "\022\000\000\000\000\000̵", '\000' <repeats 16 times>, "\215a\374\000\000\377\377\000\000=\214\000\000\270qB\000\000\000\000\000\300@\000\240\000 \000\000\001\000\000\000\000\000\000\000\000"}, fdr = {desc = {verb = 96 '`',
        stat = 18 '\022', seqnum = 0, oprid = 0, _reserved = 0 '\000', tok = 204 '\314', fqid = 181, _reserved1 = 0, fq_byte_cnt = 0, fq_frm_cnt = 0, fqd_ctx = 18446462602967092480}, fd = {addr = 8194299524353425408, data_length = 66, bpid_ivp_bmt = 0,
        offset_fmt_sl = 16576, frame_ctx = 536911872, ctrl = 65536, flow_ctx = 0}}, scn = {verb = 96 '`', stat = 18 '\022', state = 0 '\000', _reserved = 0 '\000', rid_tok = 3422552064, ctx = 181}}}
(kgdb) print *fd
$2 = {addr = 7165916604720706863, data_length = 1701996079, bpid_ivp_bmt = 25189, offset_fmt_sl = 25715, frame_ctx = 1668444973, ctrl = 1937339183, flow_ctx = 7307986971750918959}
commented

@mcbridematt Could you try to reproduce it with 34014de, for example? And with both GENERIC and GENERIC-NODEBUG kernel configurations?

Hit the same(?) problem, but this time in the TX path:

panic: dpaa2_ni_tx_conf: unexpected frame buffer: fd_addr(0x93a5e000) != txb_paddr(0x8cf27000)
cpuid = 5
time = 1656151823
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
dpaa2_ni_tx_conf() at dpaa2_ni_tx_conf+0x138
dpaa2_ni_poll_task() at dpaa2_ni_poll_task+0x160
taskqueue_run_locked() at taskqueue_run_locked+0x17c
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
fork_exit() at fork_exit+0x74
commented

@mcbridematt Could you test with 1a7aba9?

Unfortunately, it still happens: I saw both the RX and TX assertions trigger while testing today.

commented

@mcbridematt Could you try e95fb52? I've simplified the locking mechanism for software portals there and tested it with several task threads polling frames in dpaa2_ni_poll_task().

Looking good so far: no panics and no warnings/errors in dmesg after testing 4 ports with a debug kernel for over 9 hours.
I will try NODEBUG next.

commented

@mcbridematt btw, I noticed that you were using a network interface with several Rx queues/channels (custom DPL?). Could you try it as well?

@dsalychev It looks like NODEBUG is working fine as well :)

I'm pretty sure the multiple Rx queues come from the new DPL, which has been the default since Ten64 firmware v0.8.10. It was part of the method NXP suggested to me that allows all 10 ports to balance traffic across all CPUs:
https://forum.traverse.com.au/t/more-details-on-interrupt-balancing-dpaa2-config-dpio-splitting/114/4?u=mcbridematt

To be honest, I haven't checked whether Linux takes advantage of all Rx queues, but I might go and check.

commented

@mcbridematt I recently started using multiple threads to receive frames: https://github.com/mcusim/freebsd-src/blob/lx2160acex7-dev/sys/dev/dpaa2/dpaa2_ni.c#L646-L648 That's why I'm interested :) I'll check my Ten64 firmware and try to stress my Ten64, thanks for the info!

commented

I haven't noticed this panic on https://github.com/mcusim/freebsd-src/tree/ten64. @mcbridematt Could you confirm after your test?

commented

@mcbridematt Could you conduct the same stress test again on https://github.com/mcusim/freebsd-src/tree/dpaa2 ? The ten64 branch is stale now and almost all of the changes have found their way into the dpaa2 one.

Sorry, I should have closed this issue long ago. But it has definitely not reappeared in the latest code.