imq / linuximq

Pseudo-driver for the intermediate queue device.

Home Page: https://imq.github.io/

Possible memory leak imq patch for kernel 4.8.4

martinmakr opened this issue

Hello, I previously ran kernel 4.2.3 with only the imq patch for a year and it worked fine. Two days ago I installed kernel 4.8.4 patched with imq (the patch from closed issue #46) on the same Debian-based router. After one day the router suddenly rebooted (kernel 4.8.4-imq, 2 GB RAM). Traffic is about 150 Mbps. My graphs show memory use growing by about 100 MB/hour until all of it is consumed. After this first reboot I tried to diagnose what was happening. The router uses more and more memory, but no process accounts for it!

router6:~# ps aux | awk '{sum+=$6} END {print sum / 1024}'
85.8398
router6:~# free -m
             total       used       free     shared    buffers     cached
Mem:          1985       1939         46          0          0         41
-/+ buffers/cache:       1897         88
Swap:            0          0          0

I tried many things to analyse what is happening, but I cannot find what is using the memory. Restarting services makes no difference; memory keeps being exhausted. When I stop imq with "ifconfig imq0 down", the growth stops! Currently the router has 2 GB, processes use 85 MB and 46 MB is free, so the kernel is using about 1917 MB (2048 - 85 - 46).
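
The arithmetic above can be written as a small helper. This is only a sketch: the function name unaccounted_mb is made up for illustration, and it simply subtracts free memory and the summed process RSS from the total, which is a rough estimate because RSS counts shared pages more than once.

```shell
#!/bin/sh
# unaccounted_mb TOTAL_MB FREE_MB PROCESS_RSS_MB  (hypothetical helper)
# Prints the memory, in MB, that is neither free nor attributed to processes.
unaccounted_mb() {
    awk -v t="$1" -v f="$2" -v p="$3" 'BEGIN { printf "%d\n", t - f - p }'
}

# Figures from the report: 2048 MB nominal RAM, 46 MB free, about 85 MB RSS.
unaccounted_mb 2048 46 85   # prints 1917
```

With the 1985 MB total that free -m actually reports, the same formula gives 1854 MB, so the conclusion is the same either way: almost all of the memory is held outside user processes.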

For completeness: I run the same kernel on 3 other routers (with different hardware), and there is no such problem with a memory leak or memory use by imq. Used memory on the other routers with the same kernel (4.8.4-imq) is about 400 MB of 2 GB RAM. I know it sounds strange. If you want, I can run other tests or attach diagnostic output. I cannot experiment too much because the router is in a network with customers.

Hi, yes, the latest version of imq from #46 has a memory leak. I sent the kernel debug output in another mail and am waiting for information.
I have 3 machines with the same problem: when imq is up, memory runs out and the machine crashes and reboots.
m.

IPACCT ltd.

vel21ripn wrote in #46

Maybe the code "if (to_free) kfree_skb(to_free);" is needed after the label "out:" and before the return?

That code is missing from the 4.8 patch. I have been using this backported part in my 4.4 kernel with that line for some weeks or months now, with no memory leaks or crashes.

Hi stasn77,
please write where you added this line:
if (to_free) kfree_skb(to_free);

In drivers/net/imq.c, in the __imq_nf_queue function, after the label:
out:
	if (unlikely(to_free))
		kfree_skb_list(to_free);
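
Before rebuilding, it can help to confirm that a patched source tree really contains this call. The sketch below assumes a helper named has_to_free_fix (invented here) and only checks for the literal text; it does not verify that the call sits after the out: label.

```shell
#!/bin/sh
# has_to_free_fix FILE  (hypothetical helper)
# Succeeds if FILE contains the literal call kfree_skb_list(to_free).
has_to_free_fix() {
    grep -qF 'kfree_skb_list(to_free)' "$1"
}

# Demonstration on a throwaway file that mimics the patched code.
sample=$(mktemp)
printf 'out:\n\tif (unlikely(to_free))\n\t\tkfree_skb_list(to_free);\n' > "$sample"
if has_to_free_fix "$sample"; then
    echo "fix present"
else
    echo "fix missing"
fi
rm -f "$sample"
```

In a real kernel tree the call would be has_to_free_fix drivers/net/imq.c, run from the top of the source directory after applying the imq patch.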

OK, I will try it and report the status after testing.

m.

IPacct ltd. Micron

There is one other problem with the latest patch for the 4.8 kernel: the machine runs under very high load. After the update the machine works fine, but 3-4 hours later the load starts to climb and goes very high. Maybe there is a lock somewhere in the source and that is the problem.

m.

IPacct ltd. Micron

perf top -Un ?

This is from perf:

  37.84%   15149  [kernel]  [k] acpi_processor_ffh_cstate_enter
  34.69%   10541  [kernel]  [k] rht_deferred_worker
   9.47%    2909  [kernel]  [k] queued_spin_lock_slowpath
   1.47%     542  [kernel]  [k] e1000_irq_enable
   1.19%     436  [kernel]  [k] e1000_intr_msi
   0.80%     247  [kernel]  [k] nf_nat_bysource_hash
   0.58%     179  [kernel]  [k] nf_nat_cleanup_conntrack
   0.49%     176  [kernel]  [k] fib_table_lookup
   0.44%     158  [kernel]  [k] ipt_do_table
   0.33%     107  [kernel]  [k] __local_bh_enable_ip
   0.24%      81  [kernel]  [k] iadb_ia
   0.23%      78  [kernel]  [k] _raw_spin_lock
   0.22%      73  [kernel]  [k] hfsc_enqueue

IPacct ltd. Micron

The patch for 4.8 has too many bugs. When I go back to kernel 4.7.x with the old patch for kernel 4.7, it is OK.

m.

IPacct ltd. Micron

I think the reason is 2 commits from the upstream kernel:

torvalds/linux@7c96643
torvalds/linux@870190a

Try reverting them temporarily and retest.

Maybe Feng and Konstantin need to recheck the code for kernel 4.8.

IPACCT ltd.

Your (and my) troubles with high CPU load are not IMQ related.
The 4.8 kernel + IMQ without those two commits (plus some extra code from net-next in my case) works just fine.

OK, I tried, but it does not work.
The machine works for 1 hour, then access and ping stop; after waiting 2-3 minutes the machine is back. There is no reboot and no error; ping and access just stop, and after that all is fine.
I tried the latest kernel 4.8.8 with the last fix; now I do not have the memory leak, but it stops working :)
This machine runs imq + eoip + l2tp + dhcp + hfsc + sfq.
When I go back to the 4.7 kernel, it works fine.

IPacct ltd. Micron

Did you try reverting the two commits?

4.8.8 (with some patches) + imq + ndpi + accel (ipoe-dhcp, ipoe-up, pppoe, l2tp) + hfsc + prio + fq_codel

Yes, I reverted these two commits, but the problem is the same.

IPacct ltd. Micron

Today I tried compiling new kernels. However, 4.8.8 and 4.4.32 failed to compile at "CC net/core/dev.o": net/core/dev.c:3050:1: error: redefinition of '__kcrctab_validate_xmit_skb_list'
So I am preparing 4.8.7 with the patch linux-4.8-imq.diff and a second patch (patch2) that adds the lines "if (unlikely(to_free)) kfree_skb_list(to_free);". When I have results on the memory leak, I will write. It needs to be installed in the network; on the bench I cannot reproduce the leak.

How can I prepare a kernel with the changes (torvalds/linux@7c96643 and torvalds/linux@870190a) reverted against, for example, the 4.8.7 kernel? Do I just take the diff of these changes and apply it with patch to the original kernel source?

Yes, you need to remove this hunk from the imq patch (the part that touches net/core/dev.c):

-@@ -3036,6 +3046,8 @@ struct sk_buff *validate_xmit_skb_list(s
- 	return head;
- }
- 
-+EXPORT_SYMBOL(validate_xmit_skb_list);
-+
- static void qdisc_pkt_len_init(struct sk_buff *skb)
- {
- 	const struct skb_shared_info *shinfo = skb_shinfo(skb);

This EXPORT has already been added to the kernel source, so the patch does not need to add it.
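
A quick way to spot this kind of duplicate before a long build is to count how often the export appears in the patched file. The helper name count_exports is invented for this sketch, and the demonstration runs on a throwaway file rather than a real kernel tree.

```shell
#!/bin/sh
# count_exports FILE SYMBOL  (hypothetical helper)
# Prints the number of lines in FILE containing EXPORT_SYMBOL(SYMBOL);
count_exports() {
    grep -cF "EXPORT_SYMBOL($2);" "$1"
}

# Demonstration: a file that exports the same symbol twice, as happens when
# the imq patch adds an export the kernel source already has.
demo=$(mktemp)
printf 'EXPORT_SYMBOL(validate_xmit_skb_list);\n\nEXPORT_SYMBOL(validate_xmit_skb_list);\n' > "$demo"
count_exports "$demo" validate_xmit_skb_list   # prints 2
rm -f "$demo"
```

In a patched tree the call would be count_exports net/core/dev.c validate_xmit_skb_list; a result greater than 1 matches the "redefinition of '__kcrctab_validate_xmit_skb_list'" error reported above.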

IPacct ltd. Micron

To compile 4.8.8 or 4.4.32 you need to delete the following line from net/core/dev.c:
+EXPORT_SYMBOL(validate_xmit_skb_list);

To prepare the kernel:

  1. Download the patches by adding a .diff extension to the commit links on GitHub:
    https://github.com/torvalds/linux/commit/7c9664351980aaa6a4b8837a314360b3a4ad382a.diff
    https://github.com/torvalds/linux/commit/870190a9ec9075205c0fa795a09fa931694a3ff1.diff
  2. Apply each one with patch -p1 -R < patch_name.diff
  3. Recompile the kernel
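
The steps above can be sketched as a tiny script that only prints the revert commands instead of running them. The function name revert_commands and the local file names are assumptions for illustration. The commands are emitted newest commit first, following the advice given later in this thread to revert in reverse order.

```shell
#!/bin/sh
# revert_commands PATCH...  (hypothetical helper)
# Takes patch files in the order the commits were applied upstream and prints
# one "patch -p1 -R" command per file, last patch first.
revert_commands() {
    rev=""
    for p in "$@"; do
        rev="patch -p1 -R < $p
$rev"
    done
    printf '%s' "$rev"
}

# The file names below are placeholders for the two downloaded .diff files.
revert_commands 7c96643.diff 870190a.diff
```

The printed commands are meant to be run from the top of the kernel source tree. With GNU patch, adding --dry-run to each command first shows whether any hunk would fail without modifying the tree.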

With the first patch I get too many failed hunks:

linux-4.8.8$ patch -p1 -R < ../package/kernel26/kernel-4.8.8-patch1.patch
patching file include/net/netfilter/nf_conntrack.h
Hunk #1 FAILED at 117.
1 out of 1 hunk FAILED -- saving rejects to file include/net/netfilter/nf_conntrack.h.rej
patching file include/net/netfilter/nf_conntrack_extend.h
patching file include/net/netfilter/nf_nat.h
Hunk #1 succeeded at 30 (offset 1 line).
patching file net/netfilter/nf_conntrack_extend.c
patching file net/netfilter/nf_nat_core.c
Hunk #1 FAILED at 198.
Hunk #2 FAILED at 433.
Hunk #3 succeeded at 557 (offset 17 lines).
Hunk #4 FAILED at 553.
Hunk #5 FAILED at 684.
Hunk #6 succeeded at 712 (offset 14 lines).
4 out of 6 hunks FAILED -- saving rejects to file net/netfilter/nf_nat_core.c.rej

With the second:

linux-4.8.8$ patch -p1 -R < ../package/kernel26/kernel-4.8.8-patch2.patch
patching file include/net/netfilter/nf_conntrack.h
patching file include/net/netfilter/nf_nat.h
patching file net/netfilter/nf_nat_core.c
Hunk #6 succeeded at 427 (offset 1 line).
Hunk #7 succeeded at 553 (offset 1 line).
Hunk #8 succeeded at 688 (offset 1 line).
Hunk #9 succeeded at 828 (offset 2 lines).
Hunk #10 succeeded at 861 (offset 2 lines).
Hunk #11 succeeded at 879 (offset 2 lines).

m.

IPacct ltd. Micron

Apply with -R in reverse order: first patch2, then patch1.

Yes, that works. OK, I will try it and report the status.

IPacct ltd. Micron

No, the problem is still there. After reverting with these patches the machine works, but after about 1 hour it stops responding. After I log in on the monitor and keyboard and bring imq down, the machine works fine without any error.

The problem is too big; the full imq code needs to be rechecked, and maybe many of the structs need fixing.

m.

IPacct ltd. Micron

I went another way. On router6, where I originally detected the memory leak with 4.8.4 and the imq patch (4.8), I have now installed 4.8.7 with the imq patch (4.8) and patch2 (added the lines "if (unlikely(to_free)) kfree_skb_list(to_free);"). The router has been running for 45 minutes and memory usage is steady, with no loss of 100 MB/hour as with the older kernel. Machine load is normal. I have never had a problem with load and imq.

I have several other routers (router3, router10) running the leaking 4.8.4-imq kernel, and there I can see memory use growing, but more slowly (about 50 MB/day). I will test 4.8.7 for a few days. After that I will try 4.8.8 with the changes reverted as stasn77 recommends. Or I can go back to kernel 4.7.10, because my original motivation was to have a kernel without the Dirty COW bug.
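
Growth figures like 100 MB/hour or 50 MB/day can be derived from two memory samples. The sketch below is one possible way to do it; the helper names used_mb and growth_mb_per_hour are invented here, and "used" is taken as MemTotal minus MemFree, Buffers and Cached, which corresponds to the "-/+ buffers/cache" line of free -m.

```shell
#!/bin/sh
# used_mb [FILE]  (hypothetical helper)
# Used memory in MB, excluding buffers and page cache, read from a
# /proc/meminfo-style file (defaults to /proc/meminfo).
used_mb() {
    awk '/^MemTotal:/ { t = $2 }
         /^MemFree:/  { f = $2 }
         /^Buffers:/  { b = $2 }
         /^Cached:/   { c = $2 }
         END { printf "%d\n", (t - f - b - c) / 1024 }' "${1:-/proc/meminfo}"
}

# growth_mb_per_hour FIRST_MB SECOND_MB SECONDS_BETWEEN  (hypothetical helper)
growth_mb_per_hour() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (b - a) * 3600 / s }'
}

# Worked example with made-up samples taken 30 minutes apart.
growth_mb_per_hour 1000 1050 1800   # prints 100.0
```

On a live router the two samples could be taken with a=$(used_mb); sleep 600; b=$(used_mb) and then passed to growth_mb_per_hour "$a" "$b" 600. A rate that stays near zero under normal traffic would match the steady memory usage described for 4.8.7.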

Hi Martin,
the problem is that if you revert the changes as stasn77 recommends, you get back to the other big problem: the machine stops responding and you need to bring the imq interface down to get back online.
With the 4.7 patch imq works fine, but first, 4.7 is EOL, and second, it has a bug in the 10G driver which is fixed in 4.8.x.

m.

IPacct ltd. Micron

Hi micron,
for better diagnostics, did you try not sending local traffic, ssh or other specific traffic to imq (using iptables rules to skip -j IMQ --todev X)? And about the bug in 4.7: is it in a generic driver for all network cards, or for a specific one? I am preparing a router with a 10G card, so I could avoid some mistakes. Version 4.8.7 with the imq patch for 4.8 and the added lines for freeing memory is still working well for me after 4 hours, with no problem. So if it stays stable, I have no strong reason to try 4.8.8 with the reverted patches and test the issues you are facing. Of course, for "development and progress for imq" I could run some tests on the same router with 4.8.8 and the changes reverted as stasn77 recommends.

Hi Martin,

the setup here is too big. I skip ssh traffic; only internet traffic and IPTV go through imq.
The machine runs kernel 4.8.8 + hfsc + sfq with the e1000e driver and a dual 1G card.
Kernels 4.8.3, 4, 5, 6, 7 and 8 all have the problem. The first problem, the memory leak, may be fixed by the lines from stasn77. But the other problem, the crash (when I revert the patches as stasn77 suggested, the machine stops working and imq must be brought down to get it back online), is a problem in the IMQ code. I tried to fix it, but there are too many changes in kernel 4.8, and maybe Feng or Konstantin needs to check the code.

IPacct ltd. Micron

Try the patch for 4.8.8 from #49. It includes this fix; in my test it fixes the memory leak.

# ETH0 download: shaped on imq0
IMQ0="imq0"
IMQ0_RATE="95Mbit" # or 950Mbit
# ETH0 upload: shaped directly on eth0
ETH0="eth0"
ETH0_RATE="95Mbit" # or 950Mbit

# Send everything arriving on eth0 through IMQ device 0
iptables -t mangle -A PREROUTING -i "$ETH0" -j IMQ --todev 0

# Remove existing root qdiscs (the error is harmless if none is set)
tc qdisc del dev "$IMQ0" root 2>/dev/null
tc qdisc del dev "$ETH0" root 2>/dev/null

# HTB tree on imq0 with an SFQ leaf
tc qdisc add dev "$IMQ0" root handle 1:0 htb r2q 10 default 11
tc class add dev "$IMQ0" parent 1:0 classid 1:1 htb rate 1Gbit burst 15k mtu 16000
tc class add dev "$IMQ0" parent 1:1 classid 1:11 htb rate "$IMQ0_RATE" burst 15k prio 1 mtu 16000
tc qdisc add dev "$IMQ0" parent 1:11 handle 11 sfq perturb 10

# Same tree on eth0
tc qdisc add dev "$ETH0" root handle 1:0 htb r2q 10 default 11
tc class add dev "$ETH0" parent 1:0 classid 1:1 htb rate 1Gbit burst 15k mtu 1500
tc class add dev "$ETH0" parent 1:1 classid 1:11 htb rate "$ETH0_RATE" burst 15k prio 1 mtu 1500
tc qdisc add dev "$ETH0" parent 1:11 handle 11 sfq perturb 10
ip link set "$IMQ0" up

Start iperf3 -s and dstat -nm.

Thank you k0ste. Excellent work! I compiled 4.8.8-imq from #49 and it is now running on one router (router10). I have one piece of diagnostic output from another router, with the "old" 4.8.4 from #46 and the memory leak problem. It has a problem on reboot, because the reboot never happens! The console keeps repeating the line below; maybe this is useful. To reboot, it is better to wait for the kernel panic and reboot, or to use sysrq. The output, repeated continuously, is:
unregister_netdevice: waiting for eth0.204 to become free. Usage count = 948.

And an update on 4.8.7: with the patch from #46 plus the lines from stasn77, it is working correctly on router6. No huge memory leak, no other problem. After 2 days with an average of 150 Mbps of traffic, memory used is 114 MB and processes use 82 MB.