seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.

Home Page: https://seleniumbase.io

CDP events partially resolved in Undetected mode

bjornkarlsson opened this issue

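The task is to retrieve the 'Network.responseReceived' event for the URL being requested. This does not work in UC Mode when the page is opened with uc_open_with_reconnect(): the event for the page URL is missing from the performance logs.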

Example code highlighting the issue.

import json
import time
from seleniumbase import Driver


def _responses(messages):
    responses = {}
    for m in messages:
        message = m['message']['message']
        if message['method'] == 'Network.responseReceived':
            params = message['params']
            responses[params['response']['url']] = params
    return responses


def main():
    url = 'https://www.metalreviews.com/reviews/album/10355'

    with Driver(uc=True,
                log_cdp=True,
                ) as driver:
        driver.open(url)

        # Obtain the performance logs the same way as with a standard Selenium driver
        logs = [dict(m, message=json.loads(m['message'])) for m in driver.get_log('performance')]
        responses = _responses(logs)

        print(responses[url])  # OK

    with Driver(uc=True,
                log_cdp=True,
                ) as driver:
        driver.uc_open_with_reconnect(url)

        logs = [dict(m, message=json.loads(m['message'])) for m in driver.get_log('performance')]
        assert logs  # These are either empty or contain only a limited set of messages compared to the driver.open() case above
        responses = _responses(logs)

        try:
            print(responses[url])  # Key Error
        except KeyError:
            pass

    # Same problem using a cdp_listener
    logs = []

    def add_log(m):
        m = dict(m, message=json.loads(m['message']))
        logs.append(m)

    with Driver(uc=True,
                log_cdp=True,
                ) as driver:
        driver.add_cdp_listener("Network.responseReceived", add_log)
        driver.uc_open_with_reconnect(url)
        time.sleep(2)
        responses = _responses(logs)

        try:
            print(responses[url])  # Key Error
        except KeyError:
            pass


if __name__ == '__main__':
    main()

Using undetected-chromedriver directly, I have the exact same issue:

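    import json
    import time

    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    # Enable Chrome performance logging so CDP events can be read with get_log()
    options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})

    driver = uc.Chrome(executable_path='/opt/homebrew/bin/chromedriver',
                       options=options)

    url = 'https://www.metalreviews.com/reviews/album/10355'
    with driver:
        driver.get(url)
        time.sleep(2)
        # Each entry's 'message' is a JSON string; parse it as in the first example
        logs = [dict(m, message=json.loads(m['message'])) for m in driver.get_log('performance')]

        responses = _responses(logs)  # _responses() as defined in the first example

        print(responses[url])  # OK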

It's unclear to me whether SeleniumBase is a fork/continuation of undetected-chromedriver, which has been discontinued. If this is a real issue (and I'm not using the API wrong), could it be fixed in this codebase, or will this feature remain broken?

I also tried activating the uc_cdp flag, but it yields the same result.
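A minimal sketch of a listener-based collector, assuming UC Mode CDP events are enabled (e.g. Driver(uc=True, uc_cdp_events=True)) and that the callback passed to driver.add_cdp_listener() may receive either an already-parsed {"method": ..., "params": ...} dict or a raw performance-log entry whose "message" field is a JSON string. Which shape actually arrives is an assumption here, so the hypothetical helper normalize_event() accepts both. If the first shape is what arrives, the add_log() listener in the example above would fail at json.loads(m['message']).

```python
import json
from typing import Any, Callable, Dict, Tuple


def normalize_event(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a {"method": ..., "params": ...} dict from either payload shape.

    Assumption: the callback receives either a parsed CDP message or a raw
    performance-log entry of the form {"message": "<json string>", ...}.
    """
    raw = payload.get("message")
    if isinstance(raw, str):
        # Raw log entry: the JSON string wraps the CDP message under "message".
        return json.loads(raw)["message"]
    # Already a parsed CDP message.
    return payload


def make_response_collector() -> Tuple[Callable[[Dict[str, Any]], None],
                                       Dict[str, Dict[str, Any]]]:
    """Create a listener callback plus the dict it fills (url -> params)."""
    responses: Dict[str, Dict[str, Any]] = {}

    def on_response(payload: Dict[str, Any]) -> None:
        event = normalize_event(payload)
        # Keep only responseReceived events, keyed by response URL.
        if event.get("method") == "Network.responseReceived":
            params = event["params"]
            responses[params["response"]["url"]] = params

    return on_response, responses
```

It would be wired up as on_response, responses = make_response_collector() followed by driver.add_cdp_listener("Network.responseReceived", on_response), then a short wait for events to arrive. Events fired during the disconnect/reconnect window may still be missing either way.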

Thanks for your support!

Looks like a duplicate of #2162 (see the linked comment there).

You will need to call driver.refresh() to get those logs, because otherwise some of them are lost during UC Mode's disconnect/reconnect process, while the driver is disconnected from the browser.
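A minimal sketch of the refresh workaround, assuming a SeleniumBase UC Mode driver created with log_cdp=True (so performance logging is on). The helper names parse_performance_log() and open_and_collect() are hypothetical; the parsing mirrors the _responses() function from the original report.

```python
import json
from typing import Any, Dict, List


def parse_performance_log(entries: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each response URL to its Network.responseReceived params."""
    responses: Dict[str, Dict[str, Any]] = {}
    for entry in entries:
        # Each entry's "message" is a JSON string wrapping the CDP message.
        message = json.loads(entry["message"])["message"]
        if message.get("method") == "Network.responseReceived":
            params = message["params"]
            responses[params["response"]["url"]] = params
    return responses


def open_and_collect(driver: Any, url: str) -> Dict[str, Dict[str, Any]]:
    """Open url in UC Mode, then reload so events are logged while connected.

    Assumes a SeleniumBase UC Mode driver created with log_cdp=True.
    """
    driver.uc_open_with_reconnect(url)
    # Events fired while the driver was disconnected are lost; a refresh
    # re-triggers the page load while the driver is attached to the browser.
    driver.refresh()
    return parse_performance_log(driver.get_log("performance"))
```

Usage, assuming SeleniumBase is installed: with Driver(uc=True, log_cdp=True) as driver: print(open_and_collect(driver, url).get(url)). Using .get() avoids the KeyError from the report if the event is still missing.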

Tested, and it seems fine. Would the refresh hit the browser cache with the default options, or are there any options to enforce that?

That is mainly to halve the number of requests made to a rate-limited site within a given time span.

Refreshing the page will keep the options that were already set when you launched the web browser, plus any new ones that were added or changed via driver.execute_cdp_cmd(), such as for changing the GeoLocation. There's a good example of that GeoLocation changing here: SeleniumBase/examples/test_geolocation.py. I would experiment to learn more. Be sure to try out the various examples in the SeleniumBase/examples folder.
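A minimal sketch of changing a setting through driver.execute_cdp_cmd() and then refreshing, assuming a Chromium-based driver that exposes Selenium's execute_cdp_cmd(). Emulation.setGeolocationOverride is the Chrome DevTools Protocol command for overriding geolocation; the helper name set_geolocation_then_refresh() is hypothetical, and the coordinates are left to the caller.

```python
from typing import Any


def set_geolocation_then_refresh(driver: Any, latitude: float,
                                 longitude: float, accuracy: float = 100) -> None:
    """Override the browser's geolocation via CDP, then reload the page."""
    driver.execute_cdp_cmd(
        "Emulation.setGeolocationOverride",
        {"latitude": latitude, "longitude": longitude, "accuracy": accuracy},
    )
    # Per the reply above, settings applied via execute_cdp_cmd() stay in
    # effect after a refresh, alongside the options set at launch.
    driver.refresh()
```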