mcusim / freebsd-src

sys/dev/dpaa2 drivers work-in-progress

Home Page: https://www.FreeBSD.org/

"Failed to pull frames" when using multiple DPNIs

mcbridematt opened this issue

Hardware: Ten64
MC firmware: 10.20
Commit: 6efa7d1

When more than one interface / DPNI is transferring data, the following errors appear in the system console / dmesg:

dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: failed to pull frames: chan_id=15, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: failed to pull frames: chan_id=23, error=16

An example use case is a system acting as a router between two network interfaces.
I don't see any evidence of packet loss, which is good.

This message is printed by dpaa2_ni_poll_task around line 2367:

	error = dpaa2_swp_pull(swp, chan->id, chan->store.paddr,
	    ETH_STORE_FRAMES);
	if (error) {
		device_printf(chan->ni_dev, "failed to pull frames: "
		    "chan_id=%d, error=%d\n", chan->id, error);
		break;
	}

commented

It seems those errors appear when different DPNIs use the same DPIO (struct dpaa2_swp). The driver keeps the software portal busy executing a Volatile Dequeue command for too long, i.e.

			/* Make VDQ command available again. */
			atomic_xchg(&swp->vdq.avail, 1);

is set too late, I think.
EDIT: It's my guess though. I'll check it and prepare a patch.
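
To illustrate the guess above: error=16 is EBUSY on FreeBSD, which is what a second caller would get if the portal's VDQ flag is still claimed by the first one. The sketch below is not the driver's code. It uses C11 atomics and a hypothetical struct vdq_gate (with hypothetical helper names) only to show how a single availability flag shared by two DPNIs produces EBUSY until the flag is set back to 1.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the portal's VDQ state; not the driver's struct. */
struct vdq_gate {
	atomic_int avail;	/* 1 = a volatile dequeue may be issued */
};

static void
vdq_gate_init(struct vdq_gate *g)
{
	atomic_init(&g->avail, 1);
}

/* Try to claim the gate: 0 on success, EBUSY if a dequeue is in flight. */
static int
vdq_gate_acquire(struct vdq_gate *g)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&g->avail, &expected, 0))
		return (EBUSY);
	return (0);
}

/* Make the gate available again, e.g. once pulled frames were consumed. */
static void
vdq_gate_release(struct vdq_gate *g)
{
	atomic_exchange(&g->avail, 1);
}
```

In this model, the longer the first caller waits before calling vdq_gate_release(), the more often a second caller on the same gate sees EBUSY, which would match the log lines alternating between dpaa2_ni0 and dpaa2_ni1.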

commented

@mcbridematt Could you try the latest commit? I no longer see this error as of efe105c.

@dsalychev Yes, no more errors after updating the kernel. I'll move some devices behind this machine and see how it goes.

FYI, for a system that did 4.9TB of traffic over 14 hours I still got a few warnings in dmesg:

dmesg | grep 'failed to pull frames'  | wc -l
     590

590 / 5TB is a very small rate, but I don't know enough to judge how important the warning message is.
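
For reference, the rate above works out to about 120 warnings per terabyte (590 / 4.9). A minimal helper for repeating that arithmetic on later runs might look like this; the function name is made up for illustration.

```c
/* Warnings per terabyte of traffic; returns -1.0 for a non-positive volume. */
static double
warnings_per_tb(unsigned long warnings, double terabytes)
{
	if (terabytes <= 0.0)
		return (-1.0);
	return ((double)warnings / terabytes);
}
```

Calling warnings_per_tb(590, 4.9) gives roughly 120.4. This says nothing about whether any frames were lost; it only normalizes the warning count by traffic volume.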

commented

Could you show the output of sysctl dev.dpaa2_ni.0 (for dpni0) for each of the interfaces that reported the errors? I'm particularly interested in

dev.dpaa2_ni.0.stats.in_discarded_frames: 18
dev.dpaa2_ni.0.stats.in_nobuf_discards: 0

After a 1hr iperf run that logged around 70 'failed to pull frames' messages, none of the interfaces had discards:

dev.dpaa2_ni.1.stats.in_all_frames: 75934381
dev.dpaa2_ni.1.stats.in_all_bytes: 5011725630
dev.dpaa2_ni.1.stats.in_multi_frames: 0
dev.dpaa2_ni.1.stats.eg_all_frames: 157009142
dev.dpaa2_ni.1.stats.eg_all_bytes: 237634857392
dev.dpaa2_ni.1.stats.eg_multi_frames: 0
dev.dpaa2_ni.1.stats.in_filtered_frames: 0
dev.dpaa2_ni.1.stats.in_discarded_frames: 0
dev.dpaa2_ni.1.stats.in_nobuf_discards: 0
dev.dpaa2_ni.1.stats.tx_sg_frames: 157009292
dev.dpaa2_ni.1.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.1.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.1.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_single_buf_frames: 75934511
dev.dpaa2_ni.1.stats.rx_anomaly_frames: 0
dev.dpaa2_ni.1.channels.7.tx_dropped: 0
dev.dpaa2_ni.1.channels.7.tx_frames: 0
dev.dpaa2_ni.1.channels.6.tx_dropped: 0
dev.dpaa2_ni.1.channels.6.tx_frames: 0
dev.dpaa2_ni.1.channels.5.tx_dropped: 0
dev.dpaa2_ni.1.channels.5.tx_frames: 0
dev.dpaa2_ni.1.channels.4.tx_dropped: 0
dev.dpaa2_ni.1.channels.4.tx_frames: 0
dev.dpaa2_ni.1.channels.3.tx_dropped: 0
dev.dpaa2_ni.1.channels.3.tx_frames: 0
dev.dpaa2_ni.1.channels.2.tx_dropped: 0
dev.dpaa2_ni.1.channels.2.tx_frames: 0
dev.dpaa2_ni.1.channels.1.tx_dropped: 0
dev.dpaa2_ni.1.channels.1.tx_frames: 0
dev.dpaa2_ni.1.channels.0.tx_dropped: 0
dev.dpaa2_ni.1.channels.0.tx_frames: 157009437
dev.dpaa2_ni.1.%parent: dpaa2_rc0
dev.dpaa2_ni.1.%pnpinfo:
dev.dpaa2_ni.1.%location:
dev.dpaa2_ni.1.%driver: dpaa2_ni
dev.dpaa2_ni.1.%desc: DPAA2 Network Interface
dev.dpaa2_ni.2.stats.in_all_frames: 165393160
dev.dpaa2_ni.2.stats.in_all_bytes: 250312613260
dev.dpaa2_ni.2.stats.in_multi_frames: 0
dev.dpaa2_ni.2.stats.eg_all_frames: 48486702
dev.dpaa2_ni.2.stats.eg_all_bytes: 3200223070
dev.dpaa2_ni.2.stats.eg_multi_frames: 0
dev.dpaa2_ni.2.stats.in_filtered_frames: 0
dev.dpaa2_ni.2.stats.in_discarded_frames: 0
dev.dpaa2_ni.2.stats.in_nobuf_discards: 0
dev.dpaa2_ni.2.stats.tx_sg_frames: 48486702
dev.dpaa2_ni.2.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.2.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.2.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.2.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.2.stats.rx_single_buf_frames: 165392672
dev.dpaa2_ni.2.stats.rx_anomaly_frames: 0
dev.dpaa2_ni.2.channels.7.tx_dropped: 0
dev.dpaa2_ni.2.channels.7.tx_frames: 0
dev.dpaa2_ni.2.channels.6.tx_dropped: 0
dev.dpaa2_ni.2.channels.6.tx_frames: 0
dev.dpaa2_ni.2.channels.5.tx_dropped: 0
dev.dpaa2_ni.2.channels.5.tx_frames: 0
dev.dpaa2_ni.2.channels.4.tx_dropped: 0
dev.dpaa2_ni.2.channels.4.tx_frames: 0
dev.dpaa2_ni.2.channels.3.tx_dropped: 0
dev.dpaa2_ni.2.channels.3.tx_frames: 0
dev.dpaa2_ni.2.channels.2.tx_dropped: 0
dev.dpaa2_ni.2.channels.2.tx_frames: 0
dev.dpaa2_ni.2.channels.1.tx_dropped: 0
dev.dpaa2_ni.2.channels.1.tx_frames: 0
dev.dpaa2_ni.2.channels.0.tx_dropped: 0
dev.dpaa2_ni.2.channels.0.tx_frames: 48486702

(This is with the buffer commits reverted: 19d8245, 846462f, 48d302a)
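
Checking the discard counters by eye gets tedious with this much output. As a rough sketch, saved sysctl output could be scanned line by line with a small parser like the one below. The function name is hypothetical, and it assumes the "name: value" layout shown above.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "name: value" line of sysctl output. Returns 0 and stores the
 * counter in *out if the name ends with `suffix`; returns -1 otherwise.
 */
static int
sysctl_line_value(const char *line, const char *suffix,
    unsigned long long *out)
{
	const char *sep = strstr(line, ": ");
	size_t name_len, suffix_len = strlen(suffix);
	unsigned long long val;
	char *end;

	if (sep == NULL)
		return (-1);	/* e.g. "dev.dpaa2_ni.1.%pnpinfo:" */
	name_len = (size_t)(sep - line);
	if (name_len < suffix_len ||
	    strncmp(sep - suffix_len, suffix, suffix_len) != 0)
		return (-1);
	val = strtoull(sep + 2, &end, 10);
	if (end == sep + 2)
		return (-1);	/* no digits after the separator */
	*out = val;
	return (0);
}
```

Feeding it each line with the suffix ".in_discarded_frames" or ".in_nobuf_discards" and flagging any non-zero value would reproduce the manual check done here.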

commented

This is good news. I'll try to prepare some debug code to check whether those frames were processed at all, rather than dropped silently after dpaa2_swp_pull() returned an error.

commented

@mcbridematt Could you test with 1a7aba9?

@dsalychev I now see a few 'timeout to consume frames' errors as well. Is that expected?

dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=16, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=23, error=16
dpaa2_ni0: dpaa2_ni_poll_task: timeout to consume frames: chan_id=23
dpaa2_ni1: dpaa2_ni_poll_task: failed to pull frames: chan_id=4, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=16, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=23, error=16
dpaa2_ni1: dpaa2_ni_poll_task: timeout to consume frames: chan_id=24
dpaa2_ni0: dpaa2_ni_poll_task: timeout to consume frames: chan_id=23
dpaa2_ni1: dpaa2_ni_poll_task: failed to pull frames: chan_id=4, error=16
dpaa2_ni0: dpaa2_ni_poll_task: failed to pull frames: chan_id=16, error=16

sysctls:

dev.dpaa2_ni.0.stats.in_all_frames: 33739237
dev.dpaa2_ni.0.stats.in_all_bytes: 2227163082
dev.dpaa2_ni.0.stats.in_multi_frames: 0
dev.dpaa2_ni.0.stats.eg_all_frames: 76976026
dev.dpaa2_ni.0.stats.eg_all_bytes: 116515198666
dev.dpaa2_ni.0.stats.eg_multi_frames: 0
dev.dpaa2_ni.0.stats.in_filtered_frames: 0
dev.dpaa2_ni.0.stats.in_discarded_frames: 0
dev.dpaa2_ni.0.stats.in_nobuf_discards: 0
dev.dpaa2_ni.0.stats.buf_free: 0
dev.dpaa2_ni.0.stats.buf_num: 2800
dev.dpaa2_ni.0.stats.tx_sg_frames: 76976026
dev.dpaa2_ni.0.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.0.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.0.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.0.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.0.stats.rx_single_buf_frames: 33739234
dev.dpaa2_ni.0.stats.rx_anomaly_frames: 0
...
dev.dpaa2_ni.0.channels.0.tx_frames: 76976026
dev.dpaa2_ni.1.stats.in_all_frames: 32743170
dev.dpaa2_ni.1.stats.in_all_bytes: 2161390320
dev.dpaa2_ni.1.stats.in_multi_frames: 0
dev.dpaa2_ni.1.stats.eg_all_frames: 75728550
dev.dpaa2_ni.1.stats.eg_all_bytes: 114619322702
dev.dpaa2_ni.1.stats.eg_multi_frames: 0
dev.dpaa2_ni.1.stats.in_filtered_frames: 0
dev.dpaa2_ni.1.stats.in_discarded_frames: 0
dev.dpaa2_ni.1.stats.in_nobuf_discards: 0
dev.dpaa2_ni.1.stats.buf_free: 0
dev.dpaa2_ni.1.stats.buf_num: 2800
dev.dpaa2_ni.1.stats.tx_sg_frames: 75728550
dev.dpaa2_ni.1.stats.tx_single_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.1.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.1.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.1.stats.rx_single_buf_frames: 32743170
dev.dpaa2_ni.1.stats.rx_anomaly_frames: 0
...
dev.dpaa2_ni.1.channels.0.tx_frames: 75728550

commented

@mcbridematt
I have an experimental branch: https://github.com/mcusim/freebsd-src/tree/ten64

Could you run a stress test on it? I've been fighting another panic (Undefined instruction: ..., panic: Unknown kernel exception 0 esr_el1 2000000), and my Ten64 survived last night under a stress test. I wonder whether it also helps with the frame consumption issues.

Not seen on commit a85d6c9