ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)

Home Page: https://github.com/UltrafunkAmsterdam/undetected-chromedriver


[nodriver] Memory leak when used with Chrome, Edge & Brave

Abdelrahman-Hekal opened this issue

Good day @ultrafunkamsterdam
Thanks for the amazing nodriver project. I'm experiencing a continuous memory leak in a larger project that uses nodriver. I reproduced it with the sample code below: as long as the code is running, memory grows by roughly 2 MB per web page visited, which adds up to around 10 GB leaked when the program is left running for 8 hours.

The code was run on Windows 11 and Linux, and the same behaviour is observed on both operating systems.

import nodriver as uc
import asyncio

async def main():

    url = "https://www.carrefouregypt.com/mafegy/en/c/NFEGY4000000?currentPage="
    browser = await uc.start(browser_args=["--incognito"])
    for i in range(1, 100):
        # Drop references from the previous iteration so they can be garbage collected.
        page, content = None, None
        page = await asyncio.wait_for(browser.get(url + str(i)), timeout=30)
        content = await asyncio.wait_for(page.get_content(), timeout=30)

uc.loop().run_until_complete(main())
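For reference, one way to put a number on the growth is to sample the Python process's resident set size inside the same loop. A minimal sketch, assuming psutil is installed (psutil and the measurement lines are an addition for illustration, not part of the original report):

import asyncio
import os

import nodriver as uc
import psutil

async def main():
    url = "https://www.carrefouregypt.com/mafegy/en/c/NFEGY4000000?currentPage="
    browser = await uc.start(browser_args=["--incognito"])
    proc = psutil.Process(os.getpid())
    baseline = proc.memory_info().rss
    for i in range(1, 100):
        page = await asyncio.wait_for(browser.get(url + str(i)), timeout=30)
        content = await asyncio.wait_for(page.get_content(), timeout=30)
        # Print how much the resident set size has grown since the first request;
        # a steady climb of a few MB per page matches the behaviour reported here.
        grown_mb = (proc.memory_info().rss - baseline) / (1024 * 1024)
        print(f"page {i}: +{grown_mb:.1f} MiB since start")

uc.loop().run_until_complete(main())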

Thanks @ilike2burnthing for referring to the uc issue, but unfortunately the unclosed Chrome processes of undetected-chromedriver are a different problem from this issue with the nodriver package. The issue I face is a memory leak within the Python process itself, as shown in the snapshot below.

(screenshot: memory usage of the Python process)

Update: I have tried three different browsers (Chrome, Brave and Edge) and the memory leak is still observed with all of them.

There are a few other open issues pertaining to leaks, though again they seem to be for UC:

As a test, try reverting to Chrome v119 and v115. If you need clean installers (check the signatures, they're legitimate offline Chrome installers) - https://filecr.com/windows/google-chrome/?id=847089664000

You'll need to uninstall Chrome first, since the older version won't install while a newer one is present. Once you're done testing, reinstall normally, because the downgraded copy won't auto-update.

I encountered the same issue, and switching to an older version of the browser still didn't resolve it. Are there any good solutions?

Is there a solution to this problem?

I noticed the same problem. If I iterate over an array of links, each call to get_content() increases the memory usage of the Python process (memory leak).

Is there a solution to this problem?

(screenshot attached)

@life-live no solution has been found so far. We'd highly appreciate it if you could give this some time when possible, @ultrafunkamsterdam.

I think the problem is in all functions which return elements.
To be exact, the memory leak happens at the moment the script fetches the current document to parse the found nodes:
doc: cdp.dom.Node = await self.send(cdp.dom.get_document(-1, True))
As far as I understand, the garbage collector can't free this variable for some reason, and every time the function gets called it keeps more and more copies of the document in memory, probably because of a reference cycle where a child of some object points back to its parent.
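To make the suspected retention pattern concrete, here is a simplified, self-contained illustration (not nodriver's actual code; the class and attribute names only mirror the ones discussed in this thread). Each command's response is stored under its message id and never removed, so large responses such as DOM snapshots stay reachable for the lifetime of the connection:

# Toy model of the suspected leak: completed transactions are never removed from the mapper.
class Connection:
    def __init__(self):
        self.mapper = {}          # message id -> transaction (command + response)
        self._next_id = 0

    def send(self, command):
        self._next_id += 1
        self.mapper[self._next_id] = {"command": command}
        return self._next_id

    def on_response(self, message):
        tx = self.mapper[message["id"]]
        tx["response"] = message["result"]   # response stays referenced by self.mapper
        return tx["response"]

conn = Connection()
for i in range(1000):
    msg_id = conn.send({"method": "DOM.getDocument"})
    conn.on_response({"id": msg_id, "result": "x" * 100_000})   # stand-in for a big DOM snapshot
print(len(conn.mapper))   # 1000 -- every response is still held in memory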

UPDATE. FIX FOUND:
I have investigated the code and finally found the issue!
The problem was almost exactly what I thought. I hope my approach doesn't break anything, but I don't think it should.
nodriver/core/connection.py:558: this line should be followed by self.connection.mapper.pop(message["id"]).
Without that line, every transaction holding the answer to a command (including "get_document") stays in memory even after it is no longer needed, because the transaction is still referenced by its id in the mapper, which keeps its data from being released. The developer just forgot or missed this. I hope I helped somebody. @ultrafunkamsterdam please fix.

Use this file as a temporary fix:
https://gist.github.com/zxsleebu/3602dff4be43c73e32e7d2884b7a1ac3#file-uc_fix-py-L99
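For readers who only want the shape of the change rather than the whole gist, the fix described above boils down to removing the completed entry from the mapper once its response has been delivered. Applied to the toy model shown earlier in this thread (again an illustration, not the actual nodriver code), the only difference is the pop call:

# Same toy model, but with the one-line fix applied in on_response.
class Connection:
    def __init__(self):
        self.mapper = {}
        self._next_id = 0

    def send(self, command):
        self._next_id += 1
        self.mapper[self._next_id] = {"command": command}
        return self._next_id

    def on_response(self, message):
        tx = self.mapper[message["id"]]
        tx["response"] = message["result"]
        self.mapper.pop(message["id"])   # the fix: release the completed transaction
        return tx["response"]

conn = Connection()
for i in range(1000):
    msg_id = conn.send({"method": "DOM.getDocument"})
    conn.on_response({"id": msg_id, "result": "x" * 100_000})
print(len(conn.mapper))   # 0 -- completed responses no longer accumulate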

This fix really works. Great job @zxsleebu!

Works great, hoping this gets merged pretty soon!