VinciGit00 / Scrapegraph-ai

Python scraper based on AI

Home Page: https://scrapegraphai.com

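A closer look at Playwright's slow_mo option: fixing pages that are closed before JavaScript callback content has finished loading under the asynchronous loading of ascrape_playwright.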

wangdongpeng1 opened this issue

Why use it?

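The slow_mo parameter is useful for more than the debugging tools. More importantly, when a page returns its content through JavaScript calls, the asynchronous loading in ascrape_playwright can close the page before all of its URL requests have finished loading, so the scraped content is incomplete. In that case slow_mo acts as a loading delay.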

(Note: the conclusion above applies mainly to the ChromiumLoader component of the scrapegraphai framework.)

Screenshot of the relevant code:
[image]

Playwright documentation link: https://playwright.bootcss.com/docs/debug

Original text: You can also use the slowMo option to slow down execution and follow along while debugging.

Explanation:
slow_mo : Union[float, None]
    Slows down Playwright operations by the specified amount of milliseconds. Useful so that you can see what is going on.
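To make the parameter concrete, here is a minimal sketch of passing slow_mo straight to Playwright's launch call, outside of scrapegraphai. The helper names build_launch_kwargs and fetch_html are hypothetical and only for illustration. This is not how ChromiumLoader is implemented internally, just a way to see where the option ends up.

```python
def build_launch_kwargs(slow_mo_ms: float, headless: bool = True) -> dict:
    """Build keyword arguments for Playwright's browser_type.launch().

    Hypothetical helper: slow_mo is given in milliseconds.
    """
    if slow_mo_ms < 0:
        raise ValueError("slow_mo must be non-negative")
    return {"headless": headless, "slow_mo": slow_mo_ms}


async def fetch_html(url: str, slow_mo_ms: float = 1000) -> str:
    """Open url in Chromium with slow_mo applied and return the page HTML."""
    # Imported lazily so build_launch_kwargs works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # slow_mo delays each Playwright operation by the given number of ms.
        browser = await p.chromium.launch(**build_launch_kwargs(slow_mo_ms))
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.content()
        finally:
            await browser.close()
```

With Playwright and its Chromium build installed, this could be run as asyncio.run(fetch_html("https://example.com")). Because slow_mo delays every operation, a large value makes the whole fetch noticeably slower.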

Usage guide

Add a configuration entry to the loader_kwargs parameter, for example: "slow_mo": 10000 (the value is in milliseconds)

The code is as follows:

graph_config = {
    "llm": {
        "api_key": "<Your API KEY>",
        "model": "oneapi/qwen-turbo",
        "base_url": "http://127.0.0.1:13000/v1",  # Set the OneAPI URL
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://127.0.0.1:11434",  # Set the Ollama URL
    },
    "loader_kwargs": {
        "slow_mo": 10000  # Delay each Playwright operation by 10000 ms
    }
}
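As a rough sketch of how such a config might be used, the snippet below adds slow_mo to an existing config and hands it to a scraping graph. The helpers with_slow_mo and run_scraper are hypothetical names introduced here, and the choice of SmartScraperGraph is an assumption, since the issue does not say which graph was used.

```python
import copy


def with_slow_mo(config: dict, slow_mo_ms: int) -> dict:
    """Return a copy of config whose loader_kwargs carries slow_mo (in ms)."""
    merged = copy.deepcopy(config)  # leave the caller's config untouched
    merged.setdefault("loader_kwargs", {})["slow_mo"] = slow_mo_ms
    return merged


def run_scraper(prompt: str, url: str, config: dict):
    """Run a scrapegraphai graph with the given config (assumed graph type)."""
    # Imported lazily so with_slow_mo works without scrapegraphai installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=url, config=config)
    return graph.run()
```

For example, run_scraper("List the article titles", "https://example.com", with_slow_mo(graph_config, 10000)) would run the graph with the delay applied. The prompt and URL here are placeholders.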

Thanks, you can already specify it inside loader_kwargs as you have shown