lwfinger / rtl8192ee

Alternate (vendor) driver for RTL8192EE

Memory leak bringing system to its knees after a few days of uptime

yavincl opened this issue · comments

Remember the swiotlb buffer issue, #3? Well, the iommu workaround keeps the system running and transmitting data wirelessly, but only for as long as it has enough memory.

However, the system must be rebooted often or it will suffer from performance loss due to the driver hogging memory. This has been especially disruptive on my home server. Is there any way this could be solved?

I have no new information. I retested to look for standard memory leaks - there are none.

Something is happening here, and it looks like I can't really call it a memory leak.

  • Using this driver without intel iommu causes the swiotlb buffer to balloon up over time and never free anything.
  • Using this driver with intel iommu causes overall memory consumption to balloon up over time.
  • It seems to be triggered by data traffic: the problem is not the driver sitting idle, but the driver doing work.

If you leave it running for long enough with people using the network for media, the system runs out of memory (with intel iommu), or the swiotlb buffer gets full and the driver stops working (without intel iommu).

Unloading the driver does nothing.
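
For anyone trying to reproduce this, here is a minimal sketch of how the growth could be watched over time (root is needed for debugfs). The swiotlb debugfs entry is an assumption: it only exists on newer kernels with CONFIG_DEBUG_FS enabled and debugfs mounted, so drop that line if it is not present.

# sample overall memory and, if available, swiotlb bounce-buffer usage once an hour
while sleep 3600; do
    date
    free -m
    cat /sys/kernel/debug/swiotlb/io_tlb_used 2>/dev/null   # skip silently if the entry is absent
done >> swiotlb-watch.log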

Unrelated question, but is using intel_iommu=on iommu=force faster than swiotlb (the default)?
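
Whichever turns out faster, here is a rough sketch of how such parameters get applied for testing; a GRUB-based setup is assumed here, and other boot loaders or distributions differ:

# append the parameters to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, for example:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=force"
# then regenerate the boot configuration (update-grub on Debian/Ubuntu) and reboot:
sudo grub-mkconfig -o /boot/grub/grub.cfg
# verify after the reboot that the parameters were picked up:
cat /proc/cmdline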

Please reopen this issue.

Is it possible to have the driver clean up everything it left in memory on unload, rather than dynamically? I assume that would be an easier fix.

Note: if you would like me to capture some sort of mega driver log, with the driver running in some kind of debug mode during normal daily activities, that would be doable.
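
As a hedged aside, no particular debug parameter names are assumed here; modinfo will show whatever knobs this out-of-tree build actually exposes:

# list the module parameters (including any debug-related ones) provided by this driver build
modinfo rtl8192ee | grep -i parm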

You can reopen the issue, but that won't help me figure out a cause. As far as I can tell, it does not happen here. On unload, the driver does clear up everything it has allocated. It leaks nothing that my diagnostics can see.

Have you generated a kernel with kmemleak enabled? Perhaps some driver not used with my motherboard is leaking there.

Mmm, oooh, what a neat tool for the occasion. Kmemleak could give us clues, for sure.
I will be rebuilding with the required options for kmemleak and reading up on the matter.

Anything else, or anything complementary, I should be aware of?

The only gotcha is to have a large enough memory pool for kmemleak to track the allocations made before it is fully up and running. My configuration has
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000

Check dmesg for kmemleak messages early in the boot log. If it is OK, you will see something like:

Loading compiled-in X.509 certificates
zswap: loaded using pool lzo/zbud
kmemleak: Kernel memory leak detector initialized (mem pool available: 15526)
kmemleak: Automatic memory scanning thread started
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0

If the pool is too small, you will see a failure there. Once that happens, kmemleak is turned off.

Although running kmemleak has some effect on performance, I always use it for the kernels that I build, at least on my laptop.
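
For the runtime side, kmemleak is driven through debugfs; these are the commands described in the kernel's kmemleak documentation, and they assume debugfs is mounted at /sys/kernel/debug (root required):

# trigger an immediate scan instead of waiting for the periodic one
echo scan > /sys/kernel/debug/kmemleak
# read the currently suspected leaks; each entry comes with a backtrace
cat /sys/kernel/debug/kmemleak
# clear the current list so only objects still unreferenced at the next scan are reported
echo clear > /sys/kernel/debug/kmemleak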

Currently rebuilding things. I decided to add CONFIG_PAGE_OWNER into the mix, as it sounded like it could be useful...? If it isn't, no problem, as I will return to a mostly debug-less kernel once my survey is done. For the pool size, hopefully 22000 is enough.
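
For what it's worth, here is a sketch of how page_owner is typically exercised, based on the kernel's page_owner documentation; the helper tool's location varies between kernel versions:

# page owner tracking must also be enabled on the kernel command line with: page_owner=on
# after boot, dump the raw allocation records (the file can be very large)
cat /sys/kernel/debug/page_owner > page_owner_full.txt
# the kernel tree ships a sorter that groups identical allocation stacks
cd tools/vm && make page_owner_sort          # tools/mm in newer trees
./page_owner_sort page_owner_full.txt sorted_page_owner.txt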

It would make sense that you as a driver maintainer would have some debugging options on all the time. Or maybe kmemleak is more useful than I realize?

Oh, gee. I'm sorry for putting this work on you. This seems to not have anything to do with wireless drivers... But at least I know about kmemleak now.

# for each "unreferenced object" report, keep its neighbouring context lines (drop the headers and the "--" separators)
cat /sys/kernel/debug/kmemleak | grep -i unreferenced -C 1 | grep -v unreferenced | grep -v "-" > kmemleak-blame.txt

It's a few minutes after bootup and the generated file is already a mass of 550 (and growing!) lines like:

comm "uksmd", pid 222, jiffies 4294940109 (age 1294.810s)
[<0000000063240273>] ret_from_fork+0x1f/0x30

This seems to put uksm (https://github.com/dolohow/uksm) at fault. It's a patch I use on all my kernels to help with memory usage... was it the villain all along?
htop says the kernel thread uksmd is using 0.0 memory, as it is a kernel thread... I wish it would display something, though.

OK. As you do not use modules, we know which executable to look in for this entry. I hope you enabled debugging.

From the kernel source directory, run the following:
gdb vmlinux
At the gdb prompt, enter
l *ret_from_fork+0x1f
That first letter is an el (a lowercase L). The resulting output will show you the source line that allocated the memory.
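
If an interactive session is inconvenient, the same lookup can be done in one shot with gdb's batch mode:

# resolve the address non-interactively
gdb -batch -ex 'list *ret_from_fork+0x1f' vmlinux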

Uh. It appears that I have not. Can I just rebuild the kernel with the debug symbols in it and point gdb at that? (Without having to reboot the machine? It's currently busy.)

It should be alright...

Yes, that should work. I applied the patch to my kernel and will be testing it.
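
A minimal sketch of that rebuild, assuming the same source tree and .config as the running kernel; the addresses will only line up if nothing besides the debug info changes, and the exact option names vary between kernel versions:

# enable debug info and rebuild only vmlinux; no reboot is needed just to read symbols
scripts/config --enable DEBUG_INFO
make -j"$(nproc)" vmlinux
gdb ./vmlinux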

0xffffffff81a001bf is at arch/x86/entry/entry_64.S:357.
352		/*
353		 * A kernel thread is allowed to return here after successfully
354		 * calling do_execve().  Exit to userspace to complete the execve()
355		 * syscall.
356		 */
357		movq	$0, RAX(%rsp)
358		jmp	2b
359	SYM_CODE_END(ret_from_fork)
360	
361	/*

I don't know what it means, so here's a sample from kmemleak's output:

  comm "uksmd", pid 222, jiffies 4294940109 (age 2904.450s)
  hex dump (first 32 bytes):
    68 2d 79 ba a9 95 ff ff 80 78 71 c8 b2 fa ff ff  h-y......xq.....
    00 20 b5 c4 be 55 00 00 00 00 00 00 00 00 00 00  . ...U..........
  backtrace:
    [<000000003c047889>] scan_vma_one_page+0x1bc9/0x2500
    [<00000000032b1c5b>] uksm_do_scan+0x39d/0x2380
    [<00000000f898133e>] uksm_scan_thread+0x12f/0x170
    [<00000000d383ddff>] kthread+0x10b/0x130
    [<0000000063240273>] ret_from_fork+0x1f/0x30

On my system, each one looks like:

unreferenced object 0xffff8881ab584640 (size 80):
comm "uksmd", pid 105, jiffies 4296306016 (age 668.524s)
hex dump (first 32 bytes):
c0 a0 35 d3 82 88 ff ff 80 b8 77 0a 00 ea ff ff ..5.......w.....
00 d0 3b 31 9c 55 00 00 00 00 00 00 00 00 00 00 ..;1.U..........
backtrace:
[<00000000df775e69>] get_next_rmap_item+0x129b/0x1600
[<00000000c3ada4e5>] scan_vma_one_page+0x40/0xf0
[<000000009a1b1eb3>] uksm_do_scan+0x164/0x2040
[<000000009ad9d2b0>] uksm_scan_thread+0x18e/0x1d0
[<00000000b85a4745>] kthread+0x11c/0x160
[<0000000075d8016e>] ret_from_fork+0x35/0x40

Resolving the backtrace, I get:

(gdb) l *get_next_rmap_item+0x129b
0xffffffff8126918b is in get_next_rmap_item (./include/linux/slab.h:659).
654 /*
655 * Shortcuts
656 */
657 static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
658 {
659 return kmem_cache_alloc(k, flags | __GFP_ZERO);
660 }
661
662 /**
663 * kzalloc - allocate memory. The memory is set to zero.
(gdb) l *scan_vma_one_page+0x40
0xffffffff8126bb80 is in scan_vma_one_page (mm/uksm.c:3413).
3408
3409 mm = vma->vm_mm;
3410 BUG_ON(!mm);
3411 BUG_ON(!slot);
3412
3413 rmap_item = get_next_rmap_item(slot, &hash);
3414 if (!rmap_item)
3415 goto out1;
3416
3417 if (PageKsm(rmap_item->page) && in_stable_tree(rmap_item))
(gdb) l *uksm_scan_thread+0x18e
0xffffffff8126ddfe is in uksm_scan_thread (mm/uksm.c:4685).
4680 set_user_nice(current, 5);
4681
4682 while (!kthread_should_stop()) {
4683 mutex_lock(&uksm_thread_mutex);
4684 if (ksmd_should_run())
4685 uksm_do_scan();
4686 mutex_unlock(&uksm_thread_mutex);
4687
4688 try_to_freeze();

It is late here, thus I will need to look at this tomorrow. There is always a possibility of kmemleak showing false positives; however, the free command is showing an increase in the total memory in use. At 12:43 AM, I am seeing 2459544 bytes in use. I will check that in the morning.

Those were KB, of course.

In an 8-hour period overnight, the memory usage went up by 355 MB! During that time, the laptop was idling. The daily backup had finished before I recorded the memory usage, and it was only doing routine operations. I suspect that most of the events recorded by kmemleak are false positives; however, it appears that some are real leaks. Now comes the hard part of following the lifetime of those allocations from kmem_cache_alloc().

I recompiled the kernel without UKSM, and now my kmemleak output is pretty much empty except for one entry from the nvidia graphics driver. I'm fine with that, but I will leave kmemleak running to see if it catches anything after some hours. Should we pass this on to @dolohow?

Edit: kmemleak hasn't grown yet.

Can you run without UKSM for long enough to discover any changes with the swiotlb?

The flow is rather tortuous in the section involved, and I'm just getting a first look at where it might leak. It is possible that they are all false positives, and that the increased memory is only there to contain the kmemleak buffer. The test without UKSM should tell us. I think it is a bit early to pass it on to @dolohow. We will have improvements for him even if it is only to tell kmemleak that the allocation is not a leak.

Yep. I will run my system with the following configuration and report back later:

  • kmemleak on
  • intel iommu off (swiotlb will be fully active)
  • no uksm

Everything else is as normal; the difference is that now I will not be using the Intel IOMMU.
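
A small sketch of how that setup could be sanity-checked after rebooting (paths assumed, debugfs mounted):

# quick checks that the test configuration is what I think it is
cat /proc/cmdline                                     # intel_iommu should no longer appear
test -e /sys/kernel/debug/kmemleak && echo "kmemleak is active"
pgrep uksmd >/dev/null || echo "uksmd is not running (UKSM removed)"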

Oh well. I used intel_iommu=off and iommu=off, and here is my syslog:

May 21 13:27:10 gallium kernel: swiotlb_tbl_map_single: 5 callbacks suppressed
May 21 13:27:10 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:10 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:13 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:14 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:14 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:14 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:14 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:14 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:14 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)
May 21 13:27:14 gallium kernel: rtl8192ee 0000:04:00.0: swiotlb buffer is full (sz: 4000 bytes), total 0 (slots), used 0 (slots)

Am I supposed to use iommu=soft instead? It looks like swiotlb didn't work here; maybe because I'm not supposed to pass =off to these.
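
A hedged reading of the relevant options, worth double-checking against Documentation/admin-guide/kernel-parameters.txt for the kernel in use:

# iommu=off    disables every IOMMU path, including swiotlb, which would explain "total 0 (slots)"
# iommu=soft   forces software bounce buffering through swiotlb
# swiotlb=N    sizes the bounce pool in 2 KB slabs (e.g. 131072 for a 256 MB pool)
# a test boot would therefore use something like: intel_iommu=off iommu=soft swiotlb=131072
# confirm what the running kernel was actually booted with:
cat /proc/cmdline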

Never mind. I'm not going to bother with the software method, swiotlb: if things go off the rails and the wireless stops working, I'd have to manually put the system back on track the way it used to be. Plus, it would be yet another time-consuming task for me to play around with and read up on several different options and kernel features.

Since kmemleak was pretty inconclusive without UKSM, I'm just going to play the waiting game and let time do the work for me.
By that I mean I'll leave the server running its usual course without UKSM and see if memory usage once again balloons without any clear cause.

In case it doesn't, I'll eventually notify dolohow of the findings and look into debugging it once again, as that would possibly be evidence of UKSM being leaky.

I have greatly appreciated your support along the way, @lwfinger. It has cleared up many questions and taught me new things, and I am grateful for that. But it seems I will not have any extra material for this issue for some time.

By the way, happy birthday. I hope people get better at recognizing your work, effort, and labor in the Linux community overall... it should have happened years ago, to be honest.

What you are seeing here and going through is just the result of an entitled society; it manifests even here in the open source community, where in theory it shouldn't.