VinciGit00 / Scrapegraph-ai

Python scraper based on AI

Home Page: https://scrapegraphai.com

Can ScrapeGraph-AI be used in a Kaggle Notebook?

Kingki19 opened this issue

When I try to use scrapegraphai in a Kaggle Notebook, I get an error.

Assume I have installed scrapegraphai in the notebook with !pip install scrapegraphai and have a valid API key. I am using Gemini in this case. This is my code:

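import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()
# Assumption: the key is stored in an environment variable named GEMINI_API_KEY
gemini_key = os.getenv("GEMINI_API_KEY")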

prompt ="""
    list me all presidents in Indonesia, their presidency timespan, and their lifespan.
"""
url = "https://en.wikipedia.org/wiki/List_of_presidents_of_Indonesia"

graph_config = {
    "llm": {
        "api_key": gemini_key,
        "model": "gemini-pro",
    },
}
smart_scraper_graph = SmartScraperGraph(
   prompt=prompt,
   # also accepts a string with the already downloaded HTML code
   source=url,
   config=graph_config
)

result = smart_scraper_graph.run()
print(result)

The error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[17], line 15
      6 url = "https://en.wikipedia.org/wiki/List_of_presidents_of_Indonesia"
      8 graph_config = {
      9     "llm": {
     10         "api_key": gemini_key,
     11         "model": "gemini-pro",
     12     },
     13 }
---> 15 smart_scraper_graph = SmartScraperGraph(
     16    prompt=prompt,
     17    # also accepts a string with the already downloaded HTML code
     18    source=url,
     19    config=graph_config
     20 )
     22 result = smart_scraper_graph.run()
     23 print(result)

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/graphs/smart_scraper_graph.py:47, in SmartScraperGraph.__init__(self, prompt, source, config)
     46 def __init__(self, prompt: str, source: str, config: dict):
---> 47     super().__init__(prompt, config, source)
     49     self.input_key = "url" if source.startswith("http") else "local_dir"

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/graphs/abstract_graph.py:48, in AbstractGraph.__init__(self, prompt, config, source)
     46 self.source = source
     47 self.config = config
---> 48 self.llm_model = self._create_llm(config["llm"], chat=True)
     49 self.embedder_model = self._create_default_embedder(llm_config=config["llm"]
     50                                                     ) if "embeddings" not in config else self._create_embedder(
     51     config["embeddings"])
     53 # Create the graph

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/graphs/abstract_graph.py:152, in AbstractGraph._create_llm(self, llm_config, chat)
    150     except KeyError as exc:
    151         raise KeyError("Model not supported") from exc
--> 152     return Gemini(llm_params)
    153 elif llm_params["model"].startswith("claude"):
    154     try:

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/models/gemini.py:20, in Gemini.__init__(self, llm_config)
     17 def __init__(self, llm_config: dict):
     18     # replace "api_key" to "google_api_key"
     19     llm_config["google_api_key"] = llm_config.pop("api_key", None)
---> 20     super().__init__(**llm_config)

File /opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py:339, in BaseModel.__init__(__pydantic_self__, **data)
    333 """
    334 Create a new model by parsing and validating input data from keyword arguments.
    335 
    336 Raises ValidationError if the input data cannot be parsed to form a valid model.
    337 """
    338 # Uses something other than `self` the first arg to allow "self" as a settable attribute
--> 339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
    341     raise validation_error

File /opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py:1102, in validate_model(model, input_data, cls)
   1100     continue
   1101 try:
-> 1102     values = validator(cls_, values)
   1103 except (ValueError, TypeError, AssertionError) as exc:
   1104     errors.append(ErrorWrapper(exc, loc=ROOT_KEY))

File /opt/conda/lib/python3.10/site-packages/langchain_google_genai/chat_models.py:602, in ChatGoogleGenerativeAI.validate_environment(cls, values)
    599     if isinstance(google_api_key, SecretStr):
    600         google_api_key = google_api_key.get_secret_value()
--> 602     genai.configure(
    603         api_key=google_api_key,
    604         transport=values.get("transport"),
    605         client_options=values.get("client_options"),
    606         default_metadata=default_metadata,
    607     )
    608 if (
    609     values.get("temperature") is not None
    610     and not 0 <= values["temperature"] <= 1
    611 ):
    612     raise ValueError("temperature must be in the range [0.0, 1.0]")

File ~/.local/lib/python3.10/site-packages/sitecustomize.py:96, in post_import_logic.<locals>.new_configure(*args, **kwargs)
     94 else:
     95     default_metadata = []
---> 96 default_metadata.append(("x-kaggle-proxy-data", os.environ['KAGGLE_DATA_PROXY_TOKEN']))
     97 user_secrets_token = os.environ['KAGGLE_USER_SECRETS_TOKEN']
     98 default_metadata.append(('x-kaggle-authorization', f'Bearer {user_secrets_token}'))

AttributeError: 'tuple' object has no attribute 'append'

Summary:
The error is raised by this line:

default_metadata.append(("x-kaggle-proxy-data", os.environ['KAGGLE_DATA_PROXY_TOKEN']))

The message 'tuple' object has no attribute 'append' means that .append() was called on a tuple, which is immutable and has no such method. According to the traceback, this line is in Kaggle's sitecustomize.py (the new_configure wrapper around genai.configure), not in scrapegraphai. To fix it, default_metadata has to be converted to a mutable object such as a list before items are added to it.
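The failure and one possible fix can be reproduced without Kaggle or Gemini. The sketch below is only an illustration: add_kaggle_headers is a hypothetical stand-in for the wrapper seen in the traceback, and the header values are placeholders, not real tokens.

```python
def add_kaggle_headers(default_metadata=None):
    """Hypothetical stand-in for the wrapper shown in the traceback."""
    # Copy into a list so that tuples, lists, and None are all handled.
    metadata = list(default_metadata) if default_metadata is not None else []
    metadata.append(("x-kaggle-proxy-data", "<token>"))  # placeholder value
    return metadata


# Reproduce the reported failure: tuples have no append method.
try:
    ().append(("x-kaggle-proxy-data", "<token>"))
except AttributeError as exc:
    error_message = str(exc)

# The same call succeeds once the tuple is copied into a list.
fixed = add_kaggle_headers((("x-goog-api-client", "demo"),))
print(error_message)
print(fixed)
```

Copying with list(...) leaves the caller's tuple untouched, so the same function works whether the caller passes a tuple, a list, or nothing.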

I think this is why no one has created a Kaggle notebook about Scrapegraph-AI; they probably ran into the same problem.

Can you set the temperature to 0?
values.get("temperature") is not None
and not 0 <= values["temperature"] <= 1

Where do I put that parameter in my code?

EDIT:

I set the temperature to 0 like this:

graph_config = {
    "llm": {
        "api_key": gemini_key,
        "model": "gemini-pro",
        "temperature": 0,
    },
}

but I still get the same error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 16
      6 url = "https://en.wikipedia.org/wiki/List_of_presidents_of_Indonesia"
      8 graph_config = {
      9     "llm": {
     10         "api_key": gemini_key,
   (...)
     13     },
     14 }
---> 16 smart_scraper_graph = SmartScraperGraph(
     17    prompt=prompt,
     18    # also accepts a string with the already downloaded HTML code
     19    source=url,
     20    config=graph_config
     21 )
     23 result = smart_scraper_graph.run()
     24 print(result)

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/graphs/smart_scraper_graph.py:47, in SmartScraperGraph.__init__(self, prompt, source, config)
     46 def __init__(self, prompt: str, source: str, config: dict):
---> 47     super().__init__(prompt, config, source)
     49     self.input_key = "url" if source.startswith("http") else "local_dir"

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/graphs/abstract_graph.py:48, in AbstractGraph.__init__(self, prompt, config, source)
     46 self.source = source
     47 self.config = config
---> 48 self.llm_model = self._create_llm(config["llm"], chat=True)
     49 self.embedder_model = self._create_default_embedder(llm_config=config["llm"]
     50                                                     ) if "embeddings" not in config else self._create_embedder(
     51     config["embeddings"])
     53 # Create the graph

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/graphs/abstract_graph.py:152, in AbstractGraph._create_llm(self, llm_config, chat)
    150     except KeyError as exc:
    151         raise KeyError("Model not supported") from exc
--> 152     return Gemini(llm_params)
    153 elif llm_params["model"].startswith("claude"):
    154     try:

File /opt/conda/lib/python3.10/site-packages/scrapegraphai/models/gemini.py:20, in Gemini.__init__(self, llm_config)
     17 def __init__(self, llm_config: dict):
     18     # replace "api_key" to "google_api_key"
     19     llm_config["google_api_key"] = llm_config.pop("api_key", None)
---> 20     super().__init__(**llm_config)

File /opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py:339, in BaseModel.__init__(__pydantic_self__, **data)
    333 """
    334 Create a new model by parsing and validating input data from keyword arguments.
    335 
    336 Raises ValidationError if the input data cannot be parsed to form a valid model.
    337 """
    338 # Uses something other than `self` the first arg to allow "self" as a settable attribute
--> 339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
    341     raise validation_error

File /opt/conda/lib/python3.10/site-packages/pydantic/v1/main.py:1102, in validate_model(model, input_data, cls)
   1100     continue
   1101 try:
-> 1102     values = validator(cls_, values)
   1103 except (ValueError, TypeError, AssertionError) as exc:
   1104     errors.append(ErrorWrapper(exc, loc=ROOT_KEY))

File /opt/conda/lib/python3.10/site-packages/langchain_google_genai/chat_models.py:602, in ChatGoogleGenerativeAI.validate_environment(cls, values)
    599     if isinstance(google_api_key, SecretStr):
    600         google_api_key = google_api_key.get_secret_value()
--> 602     genai.configure(
    603         api_key=google_api_key,
    604         transport=values.get("transport"),
    605         client_options=values.get("client_options"),
    606         default_metadata=default_metadata,
    607     )
    608 if (
    609     values.get("temperature") is not None
    610     and not 0 <= values["temperature"] <= 1
    611 ):
    612     raise ValueError("temperature must be in the range [0.0, 1.0]")

File ~/.local/lib/python3.10/site-packages/sitecustomize.py:96, in post_import_logic.<locals>.new_configure(*args, **kwargs)
     94 else:
     95     default_metadata = []
---> 96 default_metadata.append(("x-kaggle-proxy-data", os.environ['KAGGLE_DATA_PROXY_TOKEN']))
     97 user_secrets_token = os.environ['KAGGLE_USER_SECRETS_TOKEN']
     98 default_metadata.append(('x-kaggle-authorization', f'Bearer {user_secrets_token}'))

AttributeError: 'tuple' object has no attribute 'append'

@Kingki19, are you able to use Gemini with LangChain alone in a Kaggle notebook?

From the error log, this seems unrelated to ScrapeGraph.
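One way to check this is to build the LangChain Gemini client directly, without scrapegraphai. The traceback shows the error being raised while ChatGoogleGenerativeAI is constructed, so construction alone should be enough to tell. This is a minimal sketch: check_gemini_with_langchain is a hypothetical helper name, and "placeholder-key" must be replaced with a real key.

```python
def check_gemini_with_langchain(api_key: str, model: str = "gemini-pro"):
    """Return (ok, detail) after trying to build the LangChain Gemini client."""
    try:
        from langchain_google_genai import ChatGoogleGenerativeAI
    except ImportError as exc:
        return False, f"langchain_google_genai is not installed: {exc}"
    try:
        # Construction runs the validation step shown in the traceback.
        ChatGoogleGenerativeAI(model=model, google_api_key=api_key)
    except Exception as exc:  # broad on purpose: we only want to report the failure
        return False, f"{type(exc).__name__}: {exc}"
    return True, "client constructed without error"


ok, detail = check_gemini_with_langchain("placeholder-key")
print(ok, detail)
```

If this reports the same AttributeError in a Kaggle notebook, the problem lies between langchain_google_genai and the Kaggle environment rather than in scrapegraphai.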

I am sorry, but I have never used LangChain. However, in other people's notebooks that I have looked at, LangChain runs without problems.