imq / linuximq

Pseudo-driver for the intermediate queue device.

Home Page: https://imq.github.io/


Kernel 4.4.X crash on IMQ

litinoveweedle opened this issue

Hello,

we have used IMQ for a long time, since kernel 2.6.x, to shape half-duplex radio links between our Linux routers. We do this with the iptables mangle table, hooking both the PREROUTING input and the POSTROUTING output of the same physical Ethernet device into one IMQ device, and then attaching a complex HTB qdisc to it. This works fine on 2.6 and 3.4 kernels; unfortunately, we get a crash on every single 4.4 version we have tried (4.4.26, 4.4.32, 4.4.38). Under real traffic the crash is almost immediate (please see the traces for 4.4.32 and 4.4.38 below).

Was there some change in IMQ that prevents this usage, or is it an IMQ/kernel bug? Thank you for any help.

frog# [  232.395366] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
[  232.398366] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[  232.398445] Modules linked in: xt_recent xt_nat xt_set ip_set_bitmap_ipmac ip_set_hash_ip ip_set nfnetlink xt_IMQ cls_u32 sch_sfq sch_htb imq xt_ACCOUNT(O) ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_hashlimit xt_multiport xt_addrtype xt_conntrack xt_mark iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_raw ip6table_filter ip6_tables iptable_filter ip_tables x_tables ipv6 acpi_cpufreq processor i2c_piix4 uas sp5100_tco k10temp r8169 mii i2c_dev i2c_core nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_conntrack_broadcast nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack ixgbe(O) hwmon dca vxlan udp_tunnel ip6_udp_tunnel e1000e(O) ptp pps_core e1000
[  232.398467] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 4.4.32-x86_64 #1
[  232.398469] Hardware name: PC Engines APU/APU, BIOS 4.0 09/08/2014
[  232.398474] task: ffff88007b0ce180 ti: ffff88007b15c000 task.ti: ffff88007b15c000
[  232.398488] RIP: 0010:[<ffffffff8108891e>] [<ffffffff8108891e>] queued_spin_lock_slowpath+0x13e/0x150
[  232.398491] RSP: 0018:ffff88007dd038b8  EFLAGS: 00000202
[  232.398494] RAX: 0000000000000101 RBX: ffff880079693e00 RCX: 0000000000000001
[  232.398496] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff880079693e80
[  232.398498] RBP: 0000000000000001 R08: 0000000000000101 R09: 00000000000000af
[  232.398501] R10: 0000000000000010 R11: 0000000000000004 R12: 0000000000000002
[  232.398503] R13: ffff88006d0b6400 R14: 0000000000000000 R15: ffff880079264000
[  232.398507] FS:  00007fd6660106a0(0000) GS:ffff88007dd00000(0000) knlGS:0000000000000000
[  232.398510] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  232.398512] CR2: 00007f441d884a8f CR3: 000000007806f000 CR4: 00000000000006e0
[  232.398514] Stack:
[  232.398520]  ffffffffa02e7451 ffff88006d03b900 0000000100000001 ffff88007b0c6000
[  232.398525]  01ffc900023ed010 0000000000000004 60a7eb034486e16d ffff88006d0b6400
[  232.398530]  0000000000000000 ffff88006d0b6400 ffff880079264000 ffff88006d03b900
[  232.398531] Call Trace:
[  232.398546]  <IRQ>
[  232.398546]  [<ffffffffa02e7451>] ? __imq_nf_queue+0x231/0x6f0 [imq]
[  232.398554]  [<ffffffffa02e7b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  232.398563]  [<ffffffff814991c0>] ? ip_fragment.constprop.5+0x80/0x80
[  232.398568]  [<ffffffff8148e020>] ? nf_queue+0x130/0x190
[  232.398574]  [<ffffffff8148d066>] ? nf_hook_slow+0x56/0xd0
[  232.398579]  [<ffffffff81499d74>] ? ip_output+0xd4/0xf0
[  232.398584]  [<ffffffff814991c0>] ? ip_fragment.constprop.5+0x80/0x80
[  232.398589]  [<ffffffff81495dfa>] ? ip_forward+0x39a/0x490
[  232.398594]  [<ffffffff814959f0>] ? ip_frag_mem+0x40/0x40
[  232.398599]  [<ffffffff8148e157>] ? nf_reinject+0xd7/0x160
[  232.398604]  [<ffffffffa02e71a9>] ? imq_dev_xmit+0x89/0xc0 [imq]
[  232.398611]  [<ffffffff8145aa77>] ? dev_hard_start_xmit+0xa7/0x3e0
[  232.398617]  [<ffffffffa02e7408>] ? __imq_nf_queue+0x1e8/0x6f0 [imq]
[  232.398624]  [<ffffffffa02e7b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  232.398630]  [<ffffffff81493d80>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  232.398634]  [<ffffffff8148e020>] ? nf_queue+0x130/0x190
[  232.398639]  [<ffffffff8148d066>] ? nf_hook_slow+0x56/0xd0
[  232.398643]  [<ffffffff814945d6>] ? ip_rcv+0x306/0x3a0
[  232.398649]  [<ffffffff81493d80>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  232.398654]  [<ffffffff814566e9>] ? __netif_receive_skb_core+0x6d9/0xa20
[  232.398659]  [<ffffffff814977b6>] ? ip_finish_output2+0x1c6/0x330
[  232.398666]  [<ffffffff814cd48a>] ? inet_gro_receive+0x1fa/0x240
[  232.398671]  [<ffffffff81458e3f>] ? netif_receive_skb_internal+0x2f/0xa0
[  232.398676]  [<ffffffff81459712>] ? napi_gro_receive+0xb2/0x100
[  232.398685]  [<ffffffffa018424d>] ? rtl8169_poll+0x2cd/0x600 [r8169]
[  232.398691]  [<ffffffff8145a784>] ? net_rx_action+0x204/0x340
[  232.398697]  [<ffffffff81054f4a>] ? __do_softirq+0xfa/0x290
[  232.398702]  [<ffffffff81055215>] ? irq_exit+0x95/0xa0
[  232.398707]  [<ffffffff8150e0bf>] ? do_IRQ+0x4f/0xd0
[  232.398713]  [<ffffffff8150c63f>] ? common_interrupt+0x7f/0x7f
[  232.398720]  <EOI>
[  232.398720]  [<ffffffff81087281>] ? cpu_startup_entry+0x111/0x330
[  232.398726]  [<ffffffff81036017>] ? start_secondary+0x137/0x160
[  232.398779] Code: 80 c6 07 01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 5f ff ff ff 83 fa 01 75 04 eb 11 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 f3 c3 90 0f 1f 44
[  232.820313] Modules linked in: xt_recent xt_nat xt_set ip_set_bitmap_ipmac ip_set_hash_ip ip_set nfnetlink xt_IMQ cls_u32 sch_sfq sch_htb imq xt_ACCOUNT(O) ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_hashlimit xt_multiport xt_addrtype xt_conntrack xt_mark iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_raw ip6table_filter ip6_tables iptable_filter ip_tables x_tables ipv6 acpi_cpufreq processor i2c_piix4 uas sp5100_tco k10temp r8169 mii i2c_dev i2c_core nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_conntrack_broadcast nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack ixgbe(O) hwmon dca vxlan udp_tunnel ip6_udp_tunnel e1000e(O) ptp pps_core e1000
[  232.901484] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O L 4.4.32-x86_64 #1
[  232.909228] Hardware name: PC Engines APU/APU, BIOS 4.0 09/08/2014
[  232.915410] task: ffffffff8180f4c0 ti: ffffffff81800000 task.ti: ffffffff81800000
[  232.922897] RIP: 0010:[<ffffffff81088920>] [<ffffffff81088920>] queued_spin_lock_slowpath+0x140/0x150
[  232.932232] RSP: 0018:ffff88007dc038a0  EFLAGS: 00000202
[  232.937545] RAX: 0000000000000101 RBX: ffff880079692c00 RCX: 0000000000000001
[  232.944677] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff880079692c80
[  232.951811] RBP: 0000000000000000 R08: 0000000000000101 R09: 00000000000000af
[  232.958945] R10: 0000000000000010 R11: 0000000000000004 R12: 0000000000000002
[  232.966078] R13: ffff88007837f900 R14: 0000000000000000 R15: ffff880079022000
[  232.973213] FS:  00007f349ab196a0(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
[  232.981308] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  232.987053] CR2: 0000000000685318 CR3: 000000007807a000 CR4: 00000000000006f0
[  232.994186] Stack:
[  232.996206]  ffffffffa02e7451 ffff880078052100 0000000100000001 ffff88007b0c4000
[  233.003703]  01ffc900023ed010 0000000000000004 792c55f3324573ba ffff88007837f900
[  233.011200]  0000000000000003 ffff88007837f900 ffff880079022000 ffff880078052100
[  233.018696] Call Trace:
[  233.021150]  <IRQ>
[  233.023093]  [<ffffffffa02e7451>] ? __imq_nf_queue+0x231/0x6f0 [imq]
[  233.029649]  [<ffffffffa02e7b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  233.035841]  [<ffffffff814991c0>] ? ip_fragment.constprop.5+0x80/0x80
[  233.042284]  [<ffffffff8148e020>] ? nf_queue+0x130/0x190
[  233.047605]  [<ffffffff8148d066>] ? nf_hook_slow+0x56/0xd0
[  233.053092]  [<ffffffff81499d74>] ? ip_output+0xd4/0xf0
[  233.058328]  [<ffffffff814991c0>] ? ip_fragment.constprop.5+0x80/0x80
[  233.064775]  [<ffffffff81495dfa>] ? ip_forward+0x39a/0x490
[  233.070263]  [<ffffffff814959f0>] ? ip_frag_mem+0x40/0x40
[  233.075671]  [<ffffffff8148e157>] ? nf_reinject+0xd7/0x160
[  233.081168]  [<ffffffffa02e71a9>] ? imq_dev_xmit+0x89/0xc0 [imq]
[  233.087184]  [<ffffffff8145aa77>] ? dev_hard_start_xmit+0xa7/0x3e0
[  233.093372]  [<ffffffffa02e7408>] ? __imq_nf_queue+0x1e8/0x6f0 [imq]
[  233.099732]  [<ffffffffa02e7b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  233.105920]  [<ffffffff81493d80>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  233.112542]  [<ffffffff8148e020>] ? nf_queue+0x130/0x190
[  233.117863]  [<ffffffff8148d066>] ? nf_hook_slow+0x56/0xd0
[  233.123358]  [<ffffffff814945d6>] ? ip_rcv+0x306/0x3a0
[  233.128508]  [<ffffffff81493d80>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  233.135129]  [<ffffffff814566e9>] ? __netif_receive_skb_core+0x6d9/0xa20
[  233.141837]  [<ffffffff814566e9>] ? __netif_receive_skb_core+0x6d9/0xa20
[  233.148543]  [<ffffffff810a4011>] ? ktime_get+0x31/0xa0
[  233.153772]  [<ffffffff81458e3f>] ? netif_receive_skb_internal+0x2f/0xa0
[  233.160480]  [<ffffffff81459015>] ? napi_gro_flush+0x55/0x70
[  233.166141]  [<ffffffff81459091>] ? napi_complete_done+0x61/0x90
[  233.172167]  [<ffffffffa0184017>] ? rtl8169_poll+0x97/0x600 [r8169]
[  233.178441]  [<ffffffff8145a784>] ? net_rx_action+0x204/0x340
[  233.184197]  [<ffffffff81054f4a>] ? __do_softirq+0xfa/0x290
[  233.189776]  [<ffffffff81055215>] ? irq_exit+0x95/0xa0
[  233.194925]  [<ffffffff8150e0bf>] ? do_IRQ+0x4f/0xd0
[  233.199905]  [<ffffffff8150c63f>] ? common_interrupt+0x7f/0x7f
[  233.205737]  <EOI>
[  233.207677]  [<ffffffff81087275>] ? cpu_startup_entry+0x105/0x330
[  233.213976]  [<ffffffff8108724e>] ? cpu_startup_entry+0xde/0x330
[  233.219992]  [<ffffffff818c4ed9>] ? start_kernel+0x406/0x426
[  233.225656] Code: 07 01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 5f ff ff ff 83 fa 01 75 04 eb 11 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 c3 f3 c3 90 0f 1f 44 00 00 
[  400.123003]  01ffc90002db5010 0000000000000004 dabab543e5e0a561 ffff88002fcf8d00
[  400.123003]  0000000000000002 ffff88002fcf8d00 ffff88003d3a5800 ffff88003ab97e00
[  400.123003] Call Trace:
[  400.123003]  <IRQ>
[  400.123003]  [<ffffffffa02c3451>] ? __imq_nf_queue+0x231/0x6f0 [imq]
[  400.123003]  [<ffffffffa02c3b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  400.123003]  [<ffffffff814995a0>] ? ip_fragment.constprop.5+0x80/0x80
[  400.123003]  [<ffffffff8148e330>] ? nf_queue+0x130/0x190
[  400.123003]  [<ffffffff8148d396>] ? nf_hook_slow+0x56/0xd0
[  400.123003]  [<ffffffff8149a164>] ? ip_output+0xd4/0xf0
[  400.123003]  [<ffffffff814995a0>] ? ip_fragment.constprop.5+0x80/0x80
[  400.123003]  [<ffffffff814961da>] ? ip_forward+0x39a/0x490
[  400.123003]  [<ffffffff81495dd0>] ? ip_frag_mem+0x40/0x40
[  400.123003]  [<ffffffff8148e467>] ? nf_reinject+0xd7/0x160
[  400.123003]  [<ffffffffa02c31a9>] ? imq_dev_xmit+0x89/0xc0 [imq]
[  400.123003]  [<ffffffff8145ad17>] ? dev_hard_start_xmit+0xa7/0x3e0
[  400.123003]  [<ffffffffa02c3408>] ? __imq_nf_queue+0x1e8/0x6f0 [imq]
[  400.123003]  [<ffffffffa02c3b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  400.123003]  [<ffffffff81494160>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  400.123003]  [<ffffffff8148e330>] ? nf_queue+0x130/0x190
[  400.123003]  [<ffffffff8148d396>] ? nf_hook_slow+0x56/0xd0
[  400.123003]  [<ffffffff814949b6>] ? ip_rcv+0x306/0x3a0
[  400.123003]  [<ffffffff81494160>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  400.123003]  [<ffffffff81456989>] ? __netif_receive_skb_core+0x6d9/0xa20
[  400.123003]  [<ffffffff814590df>] ? netif_receive_skb_internal+0x2f/0xa0
[  400.123003]  [<ffffffff814592b5>] ? napi_gro_flush+0x55/0x70
[  400.123003]  [<ffffffff81459331>] ? napi_complete_done+0x61/0x90
[  400.123003]  [<ffffffffa003724f>] ? e1000e_write_itr+0x154f/0x1740 [e1000e]
[  400.123003]  [<ffffffff8145aa24>] ? net_rx_action+0x204/0x340
[  400.123003]  [<ffffffff81054f3a>] ? __do_softirq+0xfa/0x290
[  400.123003]  [<ffffffff81055205>] ? irq_exit+0x95/0xa0
[  400.123003]  [<ffffffff8150e53f>] ? do_IRQ+0x4f/0xd0
[  400.123003]  [<ffffffff8150cabf>] ? common_interrupt+0x7f/0x7f
[  400.123003]  <EOI>
[  400.123003]  [<ffffffff81087271>] ? cpu_startup_entry+0x111/0x330
[  400.123003]  [<ffffffff81036007>] ? start_secondary+0x137/0x160
[  400.123003] Code: c1 74 80 c6 07 01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 5f ff ff ff 83 fa 01 75 04 eb 11 <f3> 90 8b 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 f3 c3 90 66
[  415.085003] INFO: rcu_sched detected stalls on CPUs/tasks:
[  415.085003]  0-...: (12 GPs behind) idle=a1f/1/0 softirq=92466/92466 fqs=0
[  415.085003]  (detected by 1, t=240007 jiffies, g=33893, c=33892, q=3)
[  415.085003] Task dump for CPU 0:
[  415.085003] swapper/0       R  running task        0     0 0 0x00200008
[  415.085003]  0000000000200000 0100000000000000 ffffffff8182d1d8 ffff88003fa0c8e0
[  415.085003]  0100000000000000 ffffffffffffff1e ffffffff8108726d 0000000000000010
[  415.085003]  0000000000000202 ffffffff81803f68 0000000000000018 ffffffff8108723e
[  415.085003] Call Trace:
[  415.085003]  [<ffffffff8108726d>] ? cpu_startup_entry+0x10d/0x330
[  415.085003]  [<ffffffff8108723e>] ? cpu_startup_entry+0xde/0x330
[  415.085003]  [<ffffffff818c4ed9>] ? start_kernel+0x406/0x426
[  415.085003] rcu_sched kthread starved for 240007 jiffies! g33893 c33892 f0x2 s3 ->state=0x0
[  440.123003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[  440.123003] Modules linked in: xt_recent xt_nat xt_set ip_set_bitmap_ipmac ip_set_hash_ip ip_set nfnetlink xt_IMQ cls_u32 sch_sfq sch_htb imq xt_ACCOUNT(O) ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_hashlimit xt_multiport xt_addrtype xt_conntrack xt_mark iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_raw ip6table_filter ip6_tables iptable_filter ip_tables x_tables 8021q mrp garp stp llc ipv6 parport_pc parport k8temp powernow_k8 processor i2c_nforce2 i2c_dev i2c_core nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_conntrack_broadcast nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack ixgbe(O) hwmon dca vxlan udp_tunnel ip6_udp_tunnel e1000e(O) ptp pps_core e1000
[  440.123003] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O L  4.4.38-x86_64 #2
[  440.123003] Hardware name: System manufacturer System Product Name/M2N-XE, BIOS 0107    12/10/2007
[  440.123003] task: ffff88003e0f6180 ti: ffff88003e174000 task.ti: ffff88003e174000
[  440.123003] RIP: 0010:[<ffffffff8108890c>] [<ffffffff8108890c>] queued_spin_lock_slowpath+0x13c/0x150
[  440.123003] RSP: 0018:ffff88003fa838c8  EFLAGS: 00000202
[  440.123003] RAX: 0000000000000101 RBX: ffff88003c8c3600 RCX: 0000000000000001
[  440.123003] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff88003c8c3680
[  440.123003] RBP: 0000000000000001 R08: 0000000000000101 R09: 0000000000000000
[  440.123003] R10: 0000000000000010 R11: 0000000000000004 R12: 0000000000000002
[  440.123003] R13: ffff88002fcf8d00 R14: 0000000000000000 R15: ffff88003d3a5800
[  440.123003] FS:  00007f748d8a66a0(0000) GS:ffff88003fa80000(0000) knlGS:0000000000000000
[  440.123003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  440.123003] CR2: 0000000000615510 CR3: 000000003c434000 CR4: 00000000000006e0
[  440.123003] Stack:
[  440.123003]  ffffffffa02c3451 ffff88003ab97e00 0000000100000001 ffff88003d568000
[  440.123003]  01ffc90002db5010 0000000000000004 dabab543e5e0a561 ffff88002fcf8d00
[  440.123003]  0000000000000002 ffff88002fcf8d00 ffff88003d3a5800 ffff88003ab97e00
[  440.123003] Call Trace:
[  440.123003]  <IRQ>
[  440.123003]  [<ffffffffa02c3451>] ? __imq_nf_queue+0x231/0x6f0 [imq]
[  440.123003]  [<ffffffffa02c3b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  440.123003]  [<ffffffff814995a0>] ? ip_fragment.constprop.5+0x80/0x80
[  440.123003]  [<ffffffff8148e330>] ? nf_queue+0x130/0x190
[  440.123003]  [<ffffffff8148d396>] ? nf_hook_slow+0x56/0xd0
[  440.123003]  [<ffffffff8149a164>] ? ip_output+0xd4/0xf0
[  440.123003]  [<ffffffff814995a0>] ? ip_fragment.constprop.5+0x80/0x80
[  440.123003]  [<ffffffff814961da>] ? ip_forward+0x39a/0x490
[  440.123003]  [<ffffffff81495dd0>] ? ip_frag_mem+0x40/0x40
[  440.123003]  [<[  468.123003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[  468.123003] Modules linked in: xt_recent xt_nat xt_set ip_set_bitmap_ipmac ip_set_hash_ip ip_set nfnetlink xt_IMQ cls_u32 sch_sfq sch_htb imq xt_ACCOUNT(O) ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_hashlimit xt_multiport xt_addrtype xt_conntrack xt_mark iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_raw ip6table_filter ip6_tables iptable_filter ip_tables x_tables 8021q mrp garp stp llc ipv6 parport_pc parport k8temp powernow_k8 processor i2c_nforce2 i2c_dev i2c_core nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_conntrack_broadcast nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack ixgbe(O) hwmon dca vxlan udp_tunnel ip6_udp_tunnel e1000e(O) ptp pps_core e1000
[  468.123003] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O L  4.4.38-x86_64 #2
[  468.123003] Hardware name: System manufacturer System Product Name/M2N-XE, BIOS 0107    12/10/2007
[  468.123003] task: ffff88003e0f6180 ti: ffff88003e174000 task.ti: ffff88003e174000
[  468.123003] RIP: 0010:[<ffffffff81088910>] [<ffffffff81088910>] queued_spin_lock_slowpath+0x140/0x150
[  468.123003] RSP: 0018:ffff88003fa838c8  EFLAGS: 00000202
[  468.123003] RAX: 0000000000000101 RBX: ffff88003c8c3600 RCX: 0000000000000001
[  468.123003] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff88003c8c3680
[  468.123003] RBP: 0000000000000001 R08: 0000000000000101 R09: 0000000000000000
[  468.123003] R10: 0000000000000010 R11: 0000000000000004 R12: 0000000000000002
[  468.123003] R13: ffff88002fcf8d00 R14: 0000000000000000 R15: ffff88003d3a5800
[  468.123003] FS:  00007f748d8a66a0(0000) GS:ffff88003fa80000(0000) knlGS:0000000000000000
[  468.123003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  468.123003] CR2: 0000000000615510 CR3: 000000003c434000 CR4: 00000000000006e0
[  468.123003] Stack:
[  468.123003]  ffffffffa02c3451 ffff88003ab97e00 0000000100000001 ffff88003d568000
[  468.123003]  01ffc90002db5010 0000000000000004 dabab543e5e0a561 ffff88002fcf8d00
[  468.123003]  0000000000000002 ffff88002fcf8d00 ffff88003d3a5800 ffff88003ab97e00
[  468.123003] Call Trace:
[  468.123003]  <IRQ>
[  468.123003]  [<ffffffffa02c3451>] ? __imq_nf_queue+0x231/0x6f0 [imq]
[  468.123003]  [<ffffffffa02c3b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  468.123003]  [<ffffffff814995a0>] ? ip_fragment.constprop.5+0x80/0x80
[  468.123003]  [<ffffffff8148e330>] ? nf_queue+0x130/0x190
[  468.123003]  [<ffffffff8148d396>] ? nf_hook_slow+0x56/0xd0
[  468.123003]  [<ffffffff8149a164>] ? ip_output+0xd4/0xf0
[  468.123003]  [<ffffffff814995a0>] ? ip_fragment.constprop.5+0x80/0x80
[  468.123003]  [<ffffffff814961da>] ? ip_forward+0x39a/0x490
[  468.123003]  [<ffffffff81495dd0>] ? ip_frag_mem+0x40/0x40
[  468.123003]  [<ffffffff8148e467>] ? nf_reinject+0xd7/0x160
[  468.123003]  [<ffffffffa02c31a9>] ? imq_dev_xmit+0x89/0xc0 [imq]
[  468.123003]  [<ffffffff8145ad17>] ? dev_hard_start_xmit+0xa7/0x3e0
[  468.123003]  [<ffffffffa02c3408>] ? __imq_nf_queue+0x1e8/0x6f0 [imq]
[  468.123003]  [<ffffffffa02c3b42>] ? imq_nf_queue+0x232/0x376 [imq]
[  468.123003]  [<ffffffff81494160>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  468.123003]  [<ffffffff8148e330>] ? nf_queue+0x130/0x190
[  468.123003]  [<ffffffff8148d396>] ? nf_hook_slow+0x56/0xd0
[  468.123003]  [<ffffffff814949b6>] ? ip_rcv+0x306/0x3a0
[  468.123003]  [<ffffffff81494160>] ? ip_local_deliver_finish+0x1d0/0x1d0
[  468.123003]  [<ffffffff81456989>] ? __netif_receive_skb_core+0x6d9/0xa20
[  468.123003]  [<ffffffff814590df>] ? netif_receive_skb_internal+0x2f/0xa0
[  468.123003]  [<ffffffff814592b5>] ? napi_gro_flush+0x55/0x70
[  468.123003]  [<ffffffff81459331>] ? napi_complete_done+0x61/0x90
[  468.123003]  [<ffffffffa003724f>] ? e1000e_write_itr+0x154f/0x1740 [e1000e]
[  468.123003]  [<ffffffff8145aa24>] ? net_rx_action+0x204/0x340
[  468.123003]  [<ffffffff81054f3a>] ? __do_softirq+0xfa/0x290
[  468.123003]  [<ffffffff81055205>] ? irq_exit+0x95/0xa0
[  468.123003]  [<ffffffff8150e53f>] ? do_IRQ+0x4f/0xd0
[  468.123003]  [<ffffffff8150cabf>] ? common_interrupt+0x7f/0x7f
[  468.123003]  <EOI>
[  468.123003]  [<ffffffff81087271>] ? cpu_startup_entry+0x111/0x330
[  468.123003]  [<ffffffff81036007>] ? start_secondary+0x137/0x160
[  468.123003] Code: 07 01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 5f ff ff ff 83 fa 01 75 04 eb 11 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 c3 f3 c3 90 66 66 66 66 90 

One additional observation: the issue seems to be tied to CPU performance. It occurs with high probability on slower CPUs (almost certainly), while on more powerful CPUs it occurs only sporadically.

Hello, are there any updates on this bug? Can we help troubleshoot it somehow? It is a really serious problem for us, so we are ready to pay a bounty for solving it! BR L.

Can we help troubleshoot it somehow?

How can the community help if you don't post your environment, hardware, settings, kernel patches, and config? This is a non-standard problem; for now, only you are seeing it.

Have you tried the 4.9 kernel?

What is "hsfq"?

k0ste, thanks for the reply. I would say that the hardware and kernel versions are obvious from my original report and traces:

Hardware:
[ 232.398469] Hardware name: PC Engines APU/APU, BIOS 4.0 09/08/2014
https://pcengines.ch/apu.htm

Environment: What would you like to know exactly? IMQ is used as described to simulate a half-duplex link (IMQ is hooked on PREROUTING and POSTROUTING of the same interface) on the embedded APU NIC; the tc root qdisc uses HTB with ~4000 classes, classified by u32 hash tables based on src/dst IPs.
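
To illustrate, a minimal sketch of that u32 hashing (device, table id, and addresses here are placeholders, not our production values):

# hypothetical excerpt; production builds ~4000 classes this way
tc filter add dev imq0 parent 1: prio 1 handle 2: protocol ip u32 divisor 256
# the root table hashes on the last octet of the destination address (offset 16)
tc filter add dev imq0 parent 1: prio 1 protocol ip u32 ht 800:: match ip dst 10.0.0.0/24 hashkey mask 0x000000ff at 16 link 2:
# bucket entry: 10.0.0.10 (last octet 0x0a) lands in bucket 2:a:
tc filter add dev imq0 parent 1: prio 1 protocol ip u32 ht 2:a: match ip dst 10.0.0.10 flowid 1:10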

Kernels:
vanilla 4.4.26, 4.4.32, and 4.4.38 + IMQ patch, ESFQ patch, and Intel NIC drivers (e1000 project). We also tried all the SUSE distro kernel patches on 4.4.38 - with no difference.

Kernel Config for 4.4.26 attached.

config.txt

Intel NIC drivers (e1000 project)

On apu.htm, the apu1 & apu1d4 devices (very uncommon hardware) list Realtek NICs in their specs...

ESFQ patch

Why?

Our distribution is used on a variety of hardware, from high-end Intel servers used as core routers to the APU platform used as CE routers. The problem occurs (as in the original bug report) only on low-performance CPUs (such as the APU, but not limited to it).

Regarding the ESFQ scheduler, it is a big improvement over SFQ: you can selectively hash flows on SRC or DST IP and therefore be fairer to your customers.

EDIT: I just realized that tc flow could be used with standard SFQ to change the flow tuples: http://man7.org/linux/man-pages/man8/tc-flow.8.html

But in any case, we tested IMQ without ESFQ (using only SFQ for the HTB classes), with the same CPU soft-lockup results.

EDIT: I just realized that tc flow could be used with standard SFQ to change the flow tuples: http://man7.org/linux/man-pages/man8/tc-flow.8.html

Yes, you should not use esfq: it has been unmaintained for ~10 years, and anything it hashed on you can get from the flow classifier and use with sfq, drr, fq_codel, etc. (see man tc-sfq, tc-drr, etc.).
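
For example, a rough sketch of esfq's dst hashing done with the flow classifier (shown on a bare sfq for brevity; in an HTB tree you would attach the filter to each leaf sfq instead):

# hash sfq flows purely on destination IP, like esfq's dst mode
tc qdisc add dev imq0 root handle 1: sfq perturb 10
tc filter add dev imq0 parent 1: handle 1 protocol ip prio 1 flow hash keys dst divisor 1024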

But in any case, we tested IMQ without ESFQ (using only SFQ for the HTB classes), with the same CPU soft-lockup results.

Good. I have noticed that too: people have issues on old hardware. My advice? Try the latest kernels, or stay on the last version where it works well. I think something in the kernel changed; modern systems are very fast, and the kernel should not be the performance limiter.
And maybe you should try some of these recipes. Good luck.

Thanks for the reply. The APU platform is not old at all; it is actually (especially the APU2) a recent one, and really powerful for CE router usage! Regarding the mentioned IMQ patches, we tested them all, with no improvement.

When we use the exact same HTB qdisc structure attached to the physical interface egress, it works flawlessly. If the same HTB qdisc is attached to IMQ in that "half-duplex mode", it fails quickly. So it does not really seem to be related to the hardware or to generic kernel functions. Interestingly, it will not fail on all APU platform nodes, even with the exact same configuration. We therefore suspect that this bug is in IMQ and is traffic related.

My assumption is that some kind of deadlock occurs, probably on fragmented IP packets or related to SMP. IMQ is known for similar issues in the past (I found someone posting about the same CPU soft lockup with the same IMQ half-duplex setup as ours - in IMQ for 2.x kernels!).

So right now, we have only two options:

  1. Back IFB, so that it gains better NAT support, and get what its author states: an advantage over the current IMQ; cleaner, in particular in SMP; with a lot less code.

  2. Back IMQ, so that it gets refactored and rid of these bugs, without requiring huge changes to our network setup.

In any case, I am currently evaluating a completely new traffic-management stack with IFB / HFSC / AQM (fq_codel/cake) in our network, to test it as a replacement for IMQ / HTB / (E)SFQ. Changing QoS/shaping in an ISP-sized network is a painful process, but maybe we just needed an impulse like this to take a fresh approach to traffic management. :-)
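
For reference, the kind of structure we are evaluating looks roughly like this (a sketch only; the device and rates are illustrative, and cake is still an out-of-tree qdisc at this point):

# HFSC shaping with an fq_codel AQM leaf on the IFB device
tc qdisc add dev ifb0 root handle 1: hfsc default 10
tc class add dev ifb0 parent 1: classid 1:10 hfsc sc rate 32mbit ul rate 32mbit
tc qdisc add dev ifb0 parent 1:10 fq_codel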

Kind regards
L.

that this bug is in IMQ and is traffic related.

Traffic-related bugs are the hardest to reproduce.
Yes, maybe dtaht/sch_cake is better for your wireless clients.

So what do you mean by half-duplex IMQ? Maybe you could start by showing your scripts?

I am afraid everything relevant was already posted in the original bug report - same as for the hardware. I am not using a script, but rather dynamic generation of tc commands based on provided data files. But the example below should help you understand what I meant by half-duplex line shaping.

# send both directions of eth0 through the same IMQ device (half-duplex shaping)
iptables -t mangle -I PREROUTING -i eth0 -j IMQ --todev 0
iptables -t mangle -I POSTROUTING -o eth0 -j IMQ --todev 0

ip link set dev imq0 up
ip link set dev imq0 qlen 100
ip link set dev imq0 mtu 1500

# one HTB tree on the IMQ device caps the sum of both directions
tc qdisc add dev imq0 root handle 1: htb default 20
tc class add dev imq0 parent 1: classid 1:1 htb rate 32mbit ceil 32mbit burst 32k
# ... a lot of complicated hash filter and class setup here ...
# each class has an esfq or fifo qdisc attached

BTW: we successfully tested IFB on the same job using this hooking:

ip link set dev ifb0 up
ip link set dev ifb0 qlen 100
ip link set dev ifb0 mtu 1500

# redirect eth0 ingress to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0
# redirect eth0 egress to ifb0 as well
tc qdisc add dev eth0 root handle 1: htb
tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0

# same half-duplex HTB tree, now on the IFB device
tc qdisc add dev ifb0 root handle 1: htb default 20
tc class add dev ifb0 parent 1: classid 1:1 htb rate 32mbit ceil 32mbit burst 32k
# ... a lot of complicated hash filter and class setup here ...
# each class has an esfq or fifo qdisc attached

Is there any more information I can share with you? Any thoughts on what could cause the CPU soft lockup?

ip link set dev imq0 qlen 100
ip link set dev imq0 mtu 1500

I'm not sure about these. I never set them; this is a performance drop.

iptables -t mangle -I POSTROUTING -o eth0 -j IMQ --todev 0

Do you really need this? You can attach the qdisc to eth0 directly.

As far as I understand, the root HTB qdisc queue is used only for classification, and sfq or codel have much smaller queues in the leaf qdiscs anyway. 100 packets is more than enough on a half-duplex 32 Mbit link. Also, we have no jumbo frames and the max MTU is 1500 bytes, so I do not see any performance impact here - maybe you can explain.
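
For scale: 100 packets × 1500 bytes is 1.2 Mbit of buffering, which drains in about 37 ms at 32 Mbit/s - nowhere near a bottleneck.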

Regarding the hooks: in some of our cases we select only certain DST/SRC IPs to go to the IMQ, and it is much simpler to do that in both directions with iptables than to use a tc filter for egress and iptables for ingress (see the example below). But that could only be a performance hit, not the cause of a CPU lockup.
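
For example (the address range is illustrative):

# shape only the selected client range, in both directions of eth0
iptables -t mangle -I PREROUTING -i eth0 -s 192.0.2.0/24 -j IMQ --todev 0
iptables -t mangle -I POSTROUTING -o eth0 -d 192.0.2.0/24 -j IMQ --todev 0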

When running our routers under normal traffic, we cannot push the CPU above about 5% per core. The CPU soft lockups happen regardless of load, even in off-peak hours. The problem is definitely somewhere else.

With all due respect, a CPU soft lockup is a programming error, not a problem of usage or load. And as the recent patches you pointed to above show, there are/were such issues in the IMQ code.

P.S.: We also tried the numqueues and precise-stats IMQ module options, with no improvement in stability.

The best way for an ISP is a VLAN per client. Shape on the VLAN toward the clients instead of matching IP addresses, if you can do it at L2.
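
A sketch of what I mean (interface, VLAN id, and rate are just examples):

# one VLAN per client; shape the whole VLAN instead of matching IPs
ip link add link eth0 name eth0.100 type vlan id 100
ip link set dev eth0.100 up
tc qdisc add dev eth0.100 root handle 1: htb default 10
tc class add dev eth0.100 parent 1: classid 1:10 htb rate 32mbit ceil 32mbit
tc qdisc add dev eth0.100 parent 1:10 sfq perturb 10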

If you don't want to change anything, maybe look at your hardware? For the last 3 years I have used HP/DELL servers with BCM5720 adapters. And don't use Intel with e1000e - it is a piece of shit.
I have no routers below 4.8 for now; here is an example - it does not lock up:

[k0ste@R1 ~]$ uptime
 14:41:19 up 83 days, 22:22,  5 users,  load average: 0.18, 0.18, 0.34
[k0ste@R1 ~]$ uname -a
Linux R1 4.8.13-1-ARCH #1 SMP PREEMPT Sat Dec 10 14:56:05 +07 2016 x86_64 GNU/Linux
[k0ste@R1 ~]$ sudo iptables -nvL -t mangle | grep IMQ
 363M  224G IMQ        all  --  vlan666 *       0.0.0.0/0            0.0.0.0/0           IMQ: todev 0
8438M 3392G IMQ        all  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           IMQ: todev 1
 206M  170G IMQ        all  --  Tunnel1 *       0.0.0.0/0            0.0.0.0/0           IMQ: todev 2

This router runs with ipt_netflow and ipt_ndpi, with HTB/SFQ attached, of course.

Are your BIOS hardware options all in performance mode? Are ASPM/VT-x/VT-d/C-states disabled? Is the CPU governor in performance mode? Are the TX rings of the Ethernet interfaces adjusted? Are lro/tso/gro/txvlan/rxvlan/tx/rx/sg disabled? Are interrupt moderation, flow direction, and Receive-Side Scaling configured properly?
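
For example, checks like these (the interface name is an example; not every driver exposes every offload):

# disable offloads that interfere with shaping; driver dependent
ethtool -K eth0 lro off tso off gso off gro off sg off rxvlan off txvlan off
# enlarge the NIC rings (limits depend on the NIC)
ethtool -G eth0 rx 4096 tx 4096
# pin the CPU frequency governor to performance
cpupower frequency-set -g performance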

I have a report: 4.9 has run for 2 days, over 3 TB routed - it works fine.

Don't get me wrong, but I did not report poor IMQ performance. I reported a locking issue in the module - maybe in SMP, maybe in IP fragment handling... Although I appreciate your performance suggestions for our network, they are not related to the bug.

Actually, what you said amounts to: "Look, I have no problem, and if you do, it is because of your setup." :-) Well, our hardware does not cause the CPU to lock up for 30 seconds, and we proved that by using the same HTB structure with IFB instead of IMQ - without any problem in our production.

I am not really sure that continuing this discussion will get this bug solved.

BTW:

  • most of your performance settings are nice, but they are only needed for speeds above ~1 Gbit/s. We are talking here about a 32 Mbit/s line with current CPU load < 5% per core.
  • NIC offloads were patched around kernels 3.8-3.10 to be usable with HTB without further problems.
  • as our network has a complicated structure, we simply can't use central shaping. Instead, we shape on each link. This follows from the network structure and is imperative for us.
  • no, we are not going to shape a last-mile 32 Mbit/s link with an HP/DELL server :-)

I am not really sure that continuing this discussion will get this bug solved.

Me neither.

as our network has a complicated structure, we simply can't use central shaping.

As a final counsel, I can recommend ipt_ratelimit; I use it for small, need-to-deploy-a-shaper-fast routers.

iptables -I FORWARD -m ratelimit --ratelimit-set 32Mbit --ratelimit-mode dst -j DROP
iptables -I FORWARD -m ratelimit --ratelimit-set 32Mbit --ratelimit-mode src -j DROP
echo -e "+198.19.15.14,192.168.16.193 33554432" > /proc/net/ipt_ratelimit/32Mbit #your clients here
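
(Here 33554432 = 32 × 2^20 - presumably the 32 Mbit limit expressed in bits per second - applied to the listed client addresses.)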

Good luck.

Thank you for your help, but this was not a StackOverflow question about how to shape our network. This was originally a bug report for IMQ. Honestly, we expected a somewhat different approach - for example, that you would add some tracing to the code and we would test it in production to isolate the problem, etc. We never expected that the answer would be to get rid of IMQ.

We simply wanted to stay with IMQ, because any system change at the scale of our network is rather complex to deploy, and IMQ did the job in the past without a glitch. But since we need to change the shaping globally anyway, we have decided to give IFB a try.

So if you see no point in fixing the bug for other users, please feel free to close this issue as unresolved.

I recommended that you try 4.8+ - you ignored it. I tried to pull you out of your class configuration and network design - you posted "this is obviously not a performance issue". I went out of my way, but it is useless. From here on, you are on your own.