ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

Home Page:https://scrapegraphai.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

'SmartScraperGraph' object has no attribute 'model_token'

Naman-Bhrgv opened this issue · comments

Describe the bug
Hi,
I am trying to scrape webpage using SmartScraperGraph, but am constantly getting the following error-

'SmartScraperGraph' object has no attribute 'model_token'

To Reproduce

This is the code-

repo_id = "meta-llama/Llama-2-7b-hf"

llm_model_instance = HuggingFaceEndpoint(
    repo_id=repo_id,  max_length=128, temperature=0.5, huggingfacehub_api_token='MY_API_TOKEN')


embedder_model_instance = HuggingFaceInferenceAPIEmbeddings(
    api_key='MY_API_TOKEN', model_name="sentence-transformers/all-MiniLM-l6-v2"
)

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {"model_instance": llm_model_instance},
    "embeddings": {"model_instance": embedder_model_instance}
}


smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time, event_end_date, event_end_time, location, event_mode, event_category, third_party_redirect, no_of_days, time_in_hours, hosted_or_attending, refreshments_type,  registration_available, registration_link",
    # also accepts a string with the already downloaded HTML code
    source="https://www.hmhco.com/event",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

Hi, please make a trial with repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

Thanks it worked for me!

I am getting this error also. This is my code:

import os
import json
from typing import List
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

# Define the output schema for the graph
class FAQLink(BaseModel):
    text: str = Field(description="The text of the link")
    url: str = Field(description="The URL of the link")

class FAQCategory(BaseModel):
    header: str = Field(description="The header of the FAQ category")
    links: List[FAQLink] = Field(description="The list of links in this category")

class FAQStructure(BaseModel):
    categories: List[FAQCategory] = Field(description="The list of FAQ categories")

# Initialize the model instances
llm_model_instance = AzureChatOpenAI(
    openai_api_key = os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment = os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"],
)

embedder_model_instance = AzureOpenAIEmbeddings(
    openai_api_key = os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment = os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
)

graph_config = {
    "llm": {"model_instance": llm_model_instance},
    "embeddings": {"model_instance": embedder_model_instance}
}

# Create the SmartScraperGraph instance and run it
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract all FAQ categories, their headers, and the links (text and URL) within each category from the CIMB bank FAQ page",
    source="https://www.cimb.com.my/en/personal/help-support/faq.html",
    schema=FAQStructure,
    config=graph_config
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))