VinciGit00 / Scrapegraph-ai

Python scraper based on AI

Home Page: https://scrapegraphai.com

[feature request] Add support for schema definition with tools/function calling

nobilelucifero opened this issue

Is your feature request related to a problem? Please describe.
I was playing with Scrapegraph, and I wanted to define my structured output with tools (function calling), both via the SmartScraperGraph() call and via the graph_config configuration object.

Describe the solution you'd like
Is there a way to do something like this currently?

chat_response = chat_completion_request(
    messages, tools=tools, tool_choice=...
)
...
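For illustration, a hypothetical sketch of what this could look like at the Scrapegraph level (the schema argument and its handling are assumptions, not an existing API):

from scrapegraphai.graphs import SmartScraperGraph

# Hypothetical: hand the graph the desired structure and let it translate
# that into an OpenAI tools/function-calling request under the hood.
smart_scraper_graph = SmartScraperGraph(
    prompt="What's the tone of voice of this text?",
    source="https://example.com/article",  # placeholder source
    config={
        "llm": {"api_key": "YOUR_OPENAI_KEY", "model": "gpt-3.5-turbo"},
    },
    schema={  # assumed parameter
        "audience": "string",
        "purpose": "string",
    },
)
result = smart_scraper_graph.run()  # would return a dict matching the schema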

Describe alternatives you've considered
Besides re-prompting the initial output, I've tried:

graph_config = {
    "llm": {
        "api_key": OPENAI_API_KEY,
        "model": "gpt-3.5-turbo",
        "temperature": 0,
        "model_kwargs": {
            "tools": tools,
            "tool_choice": {"type": "function", "function": {"name": "get_tov"}}
        }
    },
}

and then

...
result = smart_scraper_graph.run()
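As far as I understand, Scrapegraph builds a LangChain ChatOpenAI model from the "llm" block, and ChatOpenAI forwards model_kwargs into the OpenAI request payload. Assuming that pass-through, the config above should be roughly equivalent to this sketch:

from langchain_openai import ChatOpenAI

# A sketch of the model the config above should produce, assuming
# model_kwargs is forwarded to the OpenAI payload unchanged.
llm = ChatOpenAI(
    api_key=OPENAI_API_KEY,
    model="gpt-3.5-turbo",
    temperature=0,
    model_kwargs={
        "tools": tools,
        "tool_choice": {"type": "function", "function": {"name": "get_tov"}},
    },
)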

Running it raises the following error:

JSONDecodeError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/langchain_core/output_parsers/json.py in parse_result(self, result, partial)
     65             try:
---> 66                 return parse_json_markdown(text)
     67             except JSONDecodeError as e:

15 frames

/usr/local/lib/python3.10/dist-packages/langchain_core/utils/json.py in parse_json_markdown(json_string, parser)
    146             json_str = match.group(2)
--> 147     return _parse_json(json_str, parser=parser)
    148 

/usr/local/lib/python3.10/dist-packages/langchain_core/utils/json.py in _parse_json(json_str, parser)
    159     # Parse the JSON string into a Python dictionary
--> 160     return parser(json_str)
    161 

/usr/local/lib/python3.10/dist-packages/langchain_core/utils/json.py in parse_partial_json(s, strict)
    119     # for the original string.
--> 120     return json.loads(s, strict=strict)
    121 

/usr/lib/python3.10/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    358         kw['parse_constant'] = parse_constant
--> 359     return cls(**kw).decode(s)

/usr/lib/python3.10/json/decoder.py in decode(self, s, _w)
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()

/usr/lib/python3.10/json/decoder.py in raw_decode(self, s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

OutputParserException                     Traceback (most recent call last)
<ipython-input-22-2af99396d10b> in <cell line: 1>()
----> 1 result = smart_scraper_graph.run()

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/smart_scraper_graph.py in run(self)
    107 
    108         inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 109         self.final_state, self.execution_info = self.graph.execute(inputs)
    110 
    111         return self.final_state.get("answer", "No answer found.")

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/base_graph.py in execute(self, initial_state)
    104 
    105             with get_openai_callback() as cb:
--> 106                 result = current_node.execute(state)
    107                 node_exec_time = time.time() - curr_time
    108                 total_exec_time += node_exec_time

/usr/local/lib/python3.10/dist-packages/scrapegraphai/nodes/generate_answer_node.py in execute(self, state)
    144             # Chain
    145             single_chain = list(chains_dict.values())[0]
--> 146             answer = single_chain.invoke({"question": user_prompt})
    147 
    148         # Update the state with the generated answer

/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/base.py in invoke(self, input, config)
   2497         try:
   2498             for i, step in enumerate(self.steps):
-> 2499                 input = step.invoke(
   2500                     input,
   2501                     # mark each step as a child run

/usr/local/lib/python3.10/dist-packages/langchain_core/output_parsers/base.py in invoke(self, input, config)
    167     ) -> T:
    168         if isinstance(input, BaseMessage):
--> 169             return self._call_with_config(
    170                 lambda inner_input: self.parse_result(
    171                     [ChatGeneration(message=inner_input)]

/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/base.py in _call_with_config(self, func, input, config, run_type, **kwargs)
   1624             output = cast(
   1625                 Output,
-> 1626                 context.run(
   1627                     call_func_with_variable_args,  # type: ignore[arg-type]
   1628                     func,  # type: ignore[arg-type]

/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/config.py in call_func_with_variable_args(func, input, config, run_manager, **kwargs)
    345     if run_manager is not None and accepts_run_manager(func):
    346         kwargs["run_manager"] = run_manager
--> 347     return func(input, **kwargs)  # type: ignore[call-arg]
    348 
    349 

/usr/local/lib/python3.10/dist-packages/langchain_core/output_parsers/base.py in <lambda>(inner_input)
    168         if isinstance(input, BaseMessage):
    169             return self._call_with_config(
--> 170                 lambda inner_input: self.parse_result(
    171                     [ChatGeneration(message=inner_input)]
    172                 ),

/usr/local/lib/python3.10/dist-packages/langchain_core/output_parsers/json.py in parse_result(self, result, partial)
     67             except JSONDecodeError as e:
     68                 msg = f"Invalid json output: {text}"
---> 69                 raise OutputParserException(msg, llm_output=text) from e
     70 
     71     def parse(self, text: str) -> Any:

OutputParserException: Invalid json output:
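The empty "Invalid json output:" is the telling part: when tool_choice forces a function call, the model returns its arguments in message.tool_calls and leaves message.content empty, so LangChain's JSON output parser ends up parsing an empty string. A minimal reproduction of just that parsing step:

import json

# An empty content string fails exactly like the traceback above:
# "Expecting value: line 1 column 1 (char 0)".
json.loads("")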

Additional context
This guide explains it better than I could:
https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models

@nobilelucifero could you share the full relevant code excerpt you tried running, and how it should look to achieve tools customizability?

Will think about how we might improve that area.
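One possible direction (a sketch of LangChain's existing API, not a description of how Scrapegraph currently works): LangChain chat models can already produce structured output via function calling, which a graph could use internally:

from langchain_openai import ChatOpenAI
from langchain_core.pydantic_v1 import BaseModel, Field

class ToneOfVoice(BaseModel):
    """Tone of voice of a content piece."""
    audience: str = Field(description="Primary audience, in 2-3 words")
    purpose: str = Field(description="Main purpose or goal of the text")

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# with_structured_output uses tools/function calling under the hood and
# returns a ToneOfVoice instance instead of raw text, so no fragile JSON
# parsing of message content is involved.
structured_llm = llm.with_structured_output(ToneOfVoice)
tov = structured_llm.invoke("What's the tone of voice of this text? ...")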

Hi, we are defining something similar, stay tuned!

Amazing! This is the code (more or less, since I've been playing with it) without using Scrapegraph:

# %%capture
# !pip install openai

import json

from openai import OpenAI

llm_client = OpenAI(
    api_key="YOUR_OPENAI_KEY"
)

"""Set up tools"""

def get_tov(audience, purpose):
    # Package the extracted tone-of-voice fields as a JSON string
    result = {
        "audience": audience,
        "purpose": purpose
    }

    return json.dumps(result)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_tov",
            "description": "Get the Tone of Voice of a content source piece written by the author themselves.",
            "parameters": {
                "type": "object",
                "properties": {
                    "audience": {
                        "type": "string",
                        "description": "Defines who are the primary audiences or target demographics of the input. Each item will be a 2-3 word description.",
                    },
                    "purpose": {
                        "type": "string",
                        "description": "What's the main purpose or goal of the text?",
                    },
                },
                "required": ["audience", "purpose"],
            },
        }
    }
]

tool_choice = {"type": "function", "function": {"name": "get_tov"}}

def converse(messages):
    response = llm_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        tools=tools,
        tool_choice="auto"  # let the model decide; use tool_choice to force get_tov
        # tool_choice=tool_choice
    )

    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls

    if tool_calls:
        messages.append(response_message)

        available_functions = {
            "get_tov": get_tov
        }

        for tool_call in tool_calls:
            print(f"Function: {tool_call.function.name}")
            print(f"Params: {tool_call.function.arguments}")

            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                audience=function_args.get("audience"),
                purpose=function_args.get("purpose"),
            )
            print(f"Tool response: {function_response}")

            messages.append({
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response
            })

        # Ask the model again once every tool response has been appended
        second_response = llm_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )

        return second_response.choices[0].message.content

    # No tool call was made; return the plain completion
    return response_message.content

input = """
A poor Woodman was cutting down a tree near the edge of a deep pool in the forest. It was late in the day and the Woodman was tired. He had been working since sunrise and his strokes were not so sure as they had been early that morning. Thus it happened that the axe slipped and flew out of his hands into the pool.

The Woodman was in despair. The axe was all he possessed with which to make a living, and he had not money enough to buy a new one. As he stood wringing his hands and weeping, the god Mercury suddenly appeared and asked what the trouble was. The Woodman told what had happened, and straightway the kind Mercury dived into the pool. When he came up again he held a wonderful golden axe.

"Is this your axe?" Mercury asked the Woodman.

"No," answered the honest Woodman, "that is not my axe."

Mercury laid the golden axe on the bank and sprang back into the pool. This time he brought up an axe of silver, but the Woodman declared again that his axe was just an ordinary one with a wooden handle.

Mercury dived down for the third time, and when he came up again he had the very axe that had been lost.

The poor Woodman was very glad that his axe had been found and could not thank the kind god enough. Mercury was greatly pleased with the Woodman's honesty.

"I admire your honesty," he said, "and as a reward you may have all three axes, the gold and the silver as well as your own."

The happy Woodman returned to his home with his treasures, and soon the story of his good fortune was known to everybody in the village. Now there were several Woodmen in the village who believed that they could easily win the same good fortune. They hurried out into the woods, one here, one there, and hiding their axes in the bushes, pretended they had lost them. Then they wept and wailed and called on Mercury to help them.

And indeed, Mercury did appear, first to this one, then to that. To each one he showed an axe of gold, and each one eagerly claimed it to be the one he had lost. But Mercury did not give them the golden axe. Oh no! Instead he gave them each a hard whack over the head with it and sent them home. And when they returned next day to look for their own axes, they were nowhere to be found.

Honesty is the best policy.
"""

result = converse(messages=[
    {
        "role": "system",
        "content": "You are a savvy copywriter for SEO, Social Media, and Blogs."
    }, {
        "role": "user",
        "content": "What's the tone of voice of this text?"
    }, {
        "role": "user",
        "content": input_text,
    },
])

print(result)
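A follow-up note: when the tool call is forced via tool_choice, the structured output can be read straight from the first response's tool_calls, with no second round-trip (a sketch reusing the names defined above):

response = llm_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What's the tone of voice of this text?"},
        {"role": "user", "content": input_text},
    ],
    tools=tools,
    tool_choice=tool_choice,  # force the get_tov call
)

# The forced call guarantees tool_calls is present; its arguments are the
# structured output we were after.
structured = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(structured)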