VinciGit00 / Scrapegraph-ai

Python scraper based on AI

Home Page: https://scrapegraphai.com

Add Pydantic Schema Validation

PeriniM opened this issue

Discussed in #328

Originally posted by pdavis68 June 3, 2024
I've been trying to set the schema and I've tried a number of variations, but I can't seem to hit upon the magic combination to get it to work.

For example, with this code:

from scrapegraphai.graphs import OmniScraperGraph

# Define the configuration for the graph
graph_config = {
    "llm": {
        "api_key": "<key>",
        "model": "gpt-4-turbo",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434/",  # set ollama URL arbitrarily
    },
    "max_results": 5,
    "verbose": True
}

schema = """
    { 
    "top_stories": [
            { 
                "title": "...", 
                "url": "...", 
            }, 
            { 
                "title": "...", 
                "url": "...", 
            } 
        ] 
    } 
"""

omni_scraper_graph = OmniScraperGraph(
    prompt="List the 10 most important things in the news right now. Give the headline and the url.",
    source="http://news.google.com",
    config=graph_config,
    schema=schema
)

result = omni_scraper_graph.run()
print(result)

But what I get is this:

{'news': [
    {'headline': 'Mexico election 2024: Live updates, results and latest news', 'url': 'http://news.google.com/articles/CBMiN2h0dHBzOi8vYXBuZXdzLmNvbS9saXZlL21leGljby1lbGVjdGlvbi1yZXN1bHRzLXVwZGF0ZXPSAQA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Mexico election live updates: Voters set to elect first female president', 'url': 'http://news.google.com/articles/CBMiX2h0dHBzOi8vd3d3LmNubi5jb20vYW1lcmljYXMvbGl2ZS1uZXdzL21leGljby1wcmVzaWRlbnRpYWwtZWxlY3Rpb24tcmVzdWx0cy0wNi0wMi0yNC9pbmRleC5odG1s0gFjaHR0cHM6Ly9hbXAuY25uLmNvbS9jbm4vYW1lcmljYXMvbGl2ZS1uZXdzL21leGljby1wcmVzaWRlbnRpYWwtZWxlY3Rpb24tcmVzdWx0cy0wNi0wMi0yNC9pbmRleC5odG1s?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Mexico election 2024 results live: Will Sheinbaum win the presidency?', 'url': 'http://news.google.com/articles/CBMidGh0dHBzOi8vd3d3LmFsamF6ZWVyYS5jb20vbmV3cy9saXZlYmxvZy8yMDI0LzYvMy9tZXhpY28tZWxlY3Rpb24tMjAyNC1yZXN1bHRzLWxpdmUtd2lsbC1zaGVpbmJhdW0td2luLXRoZS1wcmVzaWRlbmN50gF4aHR0cHM6Ly93d3cuYWxqYXplZXJhLmNvbS9hbXAvbmV3cy9saXZlYmxvZy8yMDI0LzYvMy9tZXhpY28tZWxlY3Rpb24tMjAyNC1yZXN1bHRzLWxpdmUtd2lsbC1zaGVpbmJhdW0td2luLXRoZS1wcmVzaWRlbmN5?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "Claudia Sheinbaum Will Be Mexico's Next President. But Which Version of Her Will Govern?", 'url': 'http://news.google.com/articles/CBMiaWh0dHBzOi8vd3d3LnBvbGl0aWNvLmNvbS9uZXdzL21hZ2F6aW5lLzIwMjQvMDYvMDEvY2xhdWRpYS1zaGVpbmJhdW0tbWV4aWNvLXByZXNpZGVudGlhbC1lbGVjdGlvbi0wMDE2MTA4MdIBAA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Majority of independents say Trump received ‘fair trial’: Poll', 'url': 'http://news.google.com/articles/CBMicWh0dHBzOi8vdGhlaGlsbC5jb20vcmVndWxhdGlvbi9jb3VydC1iYXR0bGVzLzQ2OTkxNjgtbWFqb3JpdHktb2YtaW5kZXBlbmRlbnRzLXNheS10cnVtcC1yZWNlaXZlZC1mYWlyLXRyaWFsLXBvbGwv0gF1aHR0cHM6Ly90aGVoaWxsLmNvbS9yZWd1bGF0aW9uL2NvdXJ0LWJhdHRsZXMvNDY5OTE2OC1tYWpvcml0eS1vZi1pbmRlcGVuZGVudHMtc2F5LXRydW1wLXJlY2VpdmVkLWZhaXItdHJpYWwtcG9sbC9hbXAv?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "Looming over Trump's conviction: reversal by the '13th juror'", 'url': 'http://news.google.com/articles/CBMiT2h0dHBzOi8vd3d3LnBvbGl0aWNvLmNvbS9uZXdzLzIwMjQvMDYvMDIvdHJ1bXAtY29udmljdGlvbi1hcHBlYWwtanVyb3ItMDAxNjExMTDSAQA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Half say Trump verdict correct, should end campaign: Poll', 'url': 'http://news.google.com/articles/CBMiZmh0dHBzOi8vdGhlaGlsbC5jb20vaG9tZW5ld3MvY2FtcGFpZ24vNDY5OTMyOS1oYWxmLXNheS10cnVtcC12ZXJkaWN0LWNvcnJlY3Qtc2hvdWxkLWVuZC1jYW1wYWlnbi1wb2xsL9IBamh0dHBzOi8vdGhlaGlsbC5jb20vaG9tZW5ld3MvY2FtcGFpZ24vNDY5OTMyOS1oYWxmLXNheS10cnVtcC12ZXJkaWN0LWNvcnJlY3Qtc2hvdWxkLWVuZC1jYW1wYWlnbi1wb2xsL2FtcC8?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Opinion | Holy Cow, 34 for 45!', 'url': 'http://news.google.com/articles/CBMiQmh0dHBzOi8vd3d3Lm55dGltZXMuY29tLzIwMjQvMDYvMDEvb3Bpbmlvbi9kb25hbGQtdHJ1bXAtdHJpYWwuaHRtbNIBAA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "US has 'every expectation' Israel will expect ceasefire proposal", 'url': 'http://news.google.com/articles/CBMiLmh0dHBzOi8vd3d3LmJiYy5jb20vbmV3cy9hcnRpY2xlcy9jZDExbDU5MHFxd2_SATJodHRwczovL3d3dy5iYmMuY29tL25ld3MvYXJ0aWNsZXMvY2QxMWw1OTBxcXdvLmFtcA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "As Hunter Biden goes to trial on gun charges, here's how his attorneys plan to defend a 'simple case'", 'url': 'http://news.google.com/articles/CBMiYWh0dHBzOi8vYWJjbmV3cy5nby5jb20vVVMvaHVudGVyLWJpZGVuLXRyaWFsLWd1bi1jaGFyZ2VzLWF0dG9ybmV5cy1wbGFuLWRlZmVuZC9zdG9yeT9pZD0xMTA3MjM0NjbSAQA?hl=en-US&gl=US&ceid=US%3Aen'}
]}

Very cool tool, BTW. Really enjoying exploring it.
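
For illustration, here is a minimal sketch of the kind of check the issue title asks for. It uses Pydantic on its own and does not call any ScrapeGraphAI API. The names `TopStory`, `TopStories`, and `conforms` are hypothetical, and the sample dictionaries are shortened stand-ins for the requested and returned shapes, not real output. It only shows that a result keyed by `news`/`headline` would fail validation against a model built from the requested `top_stories`/`title`/`url` layout.

```python
from typing import List

from pydantic import BaseModel, ValidationError


# Hypothetical models mirroring the layout requested in the schema string.
class TopStory(BaseModel):
    title: str
    url: str


class TopStories(BaseModel):
    top_stories: List[TopStory]


def conforms(result: dict) -> bool:
    """Return True if `result` validates against TopStories, else False."""
    try:
        TopStories(**result)
        return True
    except ValidationError:
        return False


# Shortened stand-in for the shape the schema asks for.
expected = {"top_stories": [{"title": "Example headline", "url": "http://example.com/a"}]}

# Shortened stand-in for the shape that came back (`news` / `headline` keys).
actual = {"news": [{"headline": "Example headline", "url": "http://example.com/a"}]}

print(conforms(expected))
print(conforms(actual))
```

A check like this would flag the mismatch right after `run()` returns, instead of leaving it to be spotted by eye in the printed dictionary.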