[nodriver] Memory leak using with Chrome, Edge & Brave
Abdelrahman-Hekal opened this issue · comments
Good day @ultrafunkamsterdam
Thanks for the amazing nodriver project. I'm experiencing a continuous memory leak in a larger project that uses nodriver. I reproduced it with the sample code below: as long as the code is running, it leaks roughly 2 MB per web page visited, which adds up to around 10 GB when left running for 8 hours.
The code was run on Windows 11 and Linux; the same behaviour is observed on both operating systems.
import nodriver as uc
import asyncio

async def main():
    url = "https://www.carrefouregypt.com/mafegy/en/c/NFEGY4000000?currentPage="
    browser = await uc.start(browser_args=["--incognito"])
    for i in range(1, 100):
        page, content = None, None
        page = await asyncio.wait_for(browser.get(url + str(i)), timeout=30)
        content = await asyncio.wait_for(page.get_content(), timeout=30)

uc.loop().run_until_complete(main())
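To quantify growth like this without attaching a full profiler, one can snapshot the Python heap around each iteration. The sketch below is a hypothetical helper built on the standard library's tracemalloc; `measure_growth` and `leaky` are illustrative names, and `leaky` only simulates retention (it is not nodriver code) — in practice you would pass in one scrape iteration and check whether the growth levels off.

```python
import tracemalloc

def measure_growth(fn, iterations=5):
    """Run fn() repeatedly and return the net Python-heap growth in bytes."""
    tracemalloc.start()
    fn()  # warm-up call, so one-time allocations (caches, imports) are excluded
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        fn()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

# Deliberately leaky stand-in: retains ~1 MB per call in a module-level list,
# mimicking the per-page growth described in the report.
leaked = []
def leaky():
    leaked.append(bytearray(1_000_000))

growth = measure_growth(leaky)
print(growth)  # roughly 5_000_000 bytes for 5 retained iterations
```

If the measured growth keeps climbing linearly with iterations (as it does here), references are being retained somewhere rather than freed.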
Thanks @ilike2burnthing for referring to the uc issue, but unfortunately the unclosed Chrome processes in undetected-chromedriver are a different problem from this nodriver issue. The issue I'm facing is a memory leak within the Python process itself, as shown in the snapshot below.
Update: I have tried three different browsers (Chrome, Brave, and Edge), and the memory leak is still observed with all of them.
There are a few other open issues pertaining to leaks, though again they seem to be for UC:
As a test, try reverting to Chrome v119 and v115. If you need clean installers (check the signatures, they're legitimate offline Chrome installers) - https://filecr.com/windows/google-chrome/?id=847089664000
You'll need to uninstall Chrome first, otherwise it won't install the older version if a newer version is present, and then reinstall normally once done with testing, as it won't auto-update.
I encountered the same issue, and switching to an older version of the browser still didn't resolve it. Are there any good solutions?
Is there a solution to this problem?
I noticed the same problem. If I iterate over an array of links, get_content() increases the memory usage of the Python process on each call (a memory leak).
@life-live no solution has been found so far. We'd highly appreciate it if you could give it some time when possible, @ultrafunkamsterdam.
I think the problem is in all functions which return elements.
To be exact, the leak happens at the moment the script fetches the current document to parse the found nodes:
doc: cdp.dom.Node = await self.send(cdp.dom.get_document(-1, True))
As far as I understand, the garbage collector can't free this variable for some reason, and every time the function is called it stores another copy of the document in memory, probably because of a closure where a child of some object points back to its parent.
UPDATE. FIX FOUND:
I have investigated the code and finally found the issue!
The problem was almost exactly what I thought. I hope my approach doesn't break anything, but I don't think it should.
nodriver/core/connection.py:558 should be followed by self.connection.mapper.pop(message["id"]).
Without this line, every transaction holding the answer to a command (including "get_document") stays in memory even after it is no longer needed. This is because the transaction id is still referenced in the mapper, which keeps the transaction data from being freed. The developer just forgot or missed this. I hope I helped somebody. @ultrafunkamsterdam please fix.
use this file as temporary fix:
https://gist.github.com/zxsleebu/3602dff4be43c73e32e7d2884b7a1ac3#file-uc_fix-py-L99
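For readers who don't want to pull the gist, here is a minimal sketch of the pattern being described. The class and attribute names below are illustrative (this is not nodriver's actual connection code): a connection keeps a dict from CDP message id to the pending transaction, and must pop the entry once the response has been handled, otherwise every response body (such as a full cdp.dom.get_document tree) stays reachable for the lifetime of the connection.

```python
class Connection:
    """Toy model of a CDP-style connection that tracks in-flight commands."""

    def __init__(self):
        self.mapper = {}   # message id -> transaction awaiting its response
        self._next_id = 0

    def send(self, payload):
        # Register the outgoing command so the response can be matched by id.
        self._next_id += 1
        self.mapper[self._next_id] = {"payload": payload}
        return self._next_id

    def on_message(self, message):
        tx = self.mapper[message["id"]]
        tx["response"] = message
        # The reported fix: drop the finished transaction from the mapper so
        # its response payload becomes garbage-collectable. Without this pop,
        # the dict grows by one entry per command, forever.
        self.mapper.pop(message["id"])
        return tx

conn = Connection()
msg_id = conn.send({"method": "DOM.getDocument"})
conn.on_message({"id": msg_id, "result": "..."})
print(len(conn.mapper))  # 0: no finished transactions are retained
```

This matches the symptom in the thread: the leak scales with the number of commands sent (one get_content() call per page), not with time.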
This fix really works. Great job @zxsleebu!
Works great, hoping this gets merged pretty soon!