VinciGit00 / Scrapegraph-ai

Python scraper based on AI

Home Page: https://scrapegraphai.com

Gemini API calls do not go through the proxy, resulting in a 60s timeout

wrench1997 opened this issue

I can't find any option in the LangChain framework or the `langchain_google_genai` module for setting or modifying the user agent in requests made through the Google SDK.
Setting the proxy environment variables as shown below forces the proxy on Linux, but not on Windows:

```python
import os

# Route HTTP(S) traffic through the proxy via environment variables
os.environ["http_proxy"] = "http://192.168.166.8:7890"
os.environ["https_proxy"] = "http://192.168.166.8:7890"
```
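A minimal sketch of the same environment-variable workaround, scoped to a block of code and restored afterwards. The helper name `proxy_env` is hypothetical (not part of ScrapeGraphAI or LangChain). Setting both lowercase and uppercase names, plus `grpc_proxy`, is an assumption meant to cover HTTP clients and gRPC-based transports that may look up different spellings. Whether the Gemini client honours any of these depends on its transport, and this sketch does not explain the Windows behaviour.

```python
import os
from contextlib import contextmanager
from typing import Iterator

# Hypothetical helper: not part of ScrapeGraphAI, LangChain or the Google SDK.
# Both spellings are set because Linux environment variables are case-sensitive
# and different HTTP stacks read different names. grpc_proxy is included on the
# assumption that the client may use a gRPC transport.
_PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY", "grpc_proxy")


@contextmanager
def proxy_env(proxy_url: str) -> Iterator[None]:
    """Temporarily point the proxy environment variables at proxy_url."""
    saved = {name: os.environ.get(name) for name in _PROXY_VARS}
    try:
        for name in _PROXY_VARS:
            os.environ[name] = proxy_url
        yield
    finally:
        # Restore the previous environment, removing variables we introduced.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
```

To use it, create and run the graph inside `with proxy_env("http://192.168.166.8:7890"):` so that any client created there sees the proxy settings.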

Try a configuration like this:

```python
"""
Basic example of scraping pipeline using SmartScraper
"""

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
    "llm": {
        "api_key": "key",
        "model": "gpt-3.5-turbo",
    },
    "loader_kwargs": {
        "proxy": {
            "server": "http://65.87.29.253:3120",
            "username": "marcodemo",
            "password": "stat",
        },
    },
    "verbose": True,
    "headless": False,
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```
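In this configuration the proxy sits under `loader_kwargs`. As far as we can tell, that is forwarded to the page loader (the browser that fetches `source`), so it may not affect the LLM API calls themselves. The sketch below builds a dictionary with the same shape as `graph_config` above, adding the proxy block only when a server is given. `build_graph_config` is a hypothetical helper, not a ScrapeGraphAI API.

```python
from typing import Any, Dict, Optional


def build_graph_config(
    api_key: str,
    model: str,
    proxy_server: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    headless: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper returning a dict shaped like graph_config above."""
    config: Dict[str, Any] = {
        "llm": {"api_key": api_key, "model": model},
        "verbose": True,
        "headless": headless,
    }
    if proxy_server:
        proxy: Dict[str, str] = {"server": proxy_server}
        # Credentials are optional; include them only when both are provided.
        if username and password:
            proxy["username"] = username
            proxy["password"] = password
        # Assumed to reach the page loader only, not the LLM client.
        config["loader_kwargs"] = {"proxy": proxy}
    return config
```

For example, `build_graph_config("key", "gpt-3.5-turbo", "http://65.87.29.253:3120", "marcodemo", "stat", headless=False)` reproduces the dictionary used in the example above.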