Add Pydantic Schema Validation
PeriniM opened this issue
Discussed in #328
Originally posted by pdavis68 June 3, 2024
I've been trying to set the schema and I've tried a number of variations, but I can't seem to hit upon the magic combination to get it to work.
For example, with this code:
from scrapegraphai.graphs import OmniScraperGraph

# Define the configuration for the graph
graph_config = {
    "llm": {
        "api_key": "<key>",
        "model": "gpt-4-turbo",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434/",  # set ollama URL arbitrarily
    },
    "max_results": 5,
    "verbose": True
}

schema = """
{
    "top_stories": [
        {
            "title": "...",
            "url": "...",
        },
        {
            "title": "...",
            "url": "..."
        }
    ]
}
"""

omni_scraper_graph = OmniScraperGraph(
    prompt="List the 10 most important things in the news right now. Give the headline and the url.",
    source="http://news.google.com",
    config=graph_config,
    schema=schema
)

result = omni_scraper_graph.run()
print(result)
But what I get is this:
{'news': [
    {'headline': 'Mexico election 2024: Live updates, results and latest news', 'url': 'http://news.google.com/articles/CBMiN2h0dHBzOi8vYXBuZXdzLmNvbS9saXZlL21leGljby1lbGVjdGlvbi1yZXN1bHRzLXVwZGF0ZXPSAQA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Mexico election live updates: Voters set to elect first female president', 'url': 'http://news.google.com/articles/CBMiX2h0dHBzOi8vd3d3LmNubi5jb20vYW1lcmljYXMvbGl2ZS1uZXdzL21leGljby1wcmVzaWRlbnRpYWwtZWxlY3Rpb24tcmVzdWx0cy0wNi0wMi0yNC9pbmRleC5odG1s0gFjaHR0cHM6Ly9hbXAuY25uLmNvbS9jbm4vYW1lcmljYXMvbGl2ZS1uZXdzL21leGljby1wcmVzaWRlbnRpYWwtZWxlY3Rpb24tcmVzdWx0cy0wNi0wMi0yNC9pbmRleC5odG1s?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Mexico election 2024 results live: Will Sheinbaum win the presidency?', 'url': 'http://news.google.com/articles/CBMidGh0dHBzOi8vd3d3LmFsamF6ZWVyYS5jb20vbmV3cy9saXZlYmxvZy8yMDI0LzYvMy9tZXhpY28tZWxlY3Rpb24tMjAyNC1yZXN1bHRzLWxpdmUtd2lsbC1zaGVpbmJhdW0td2luLXRoZS1wcmVzaWRlbmN50gF4aHR0cHM6Ly93d3cuYWxqYXplZXJhLmNvbS9hbXAvbmV3cy9saXZlYmxvZy8yMDI0LzYvMy9tZXhpY28tZWxlY3Rpb24tMjAyNC1yZXN1bHRzLWxpdmUtd2lsbC1zaGVpbmJhdW0td2luLXRoZS1wcmVzaWRlbmN5?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "Claudia Sheinbaum Will Be Mexico's Next President. But Which Version of Her Will Govern?", 'url': 'http://news.google.com/articles/CBMiaWh0dHBzOi8vd3d3LnBvbGl0aWNvLmNvbS9uZXdzL21hZ2F6aW5lLzIwMjQvMDYvMDEvY2xhdWRpYS1zaGVpbmJhdW0tbWV4aWNvLXByZXNpZGVudGlhbC1lbGVjdGlvbi0wMDE2MTA4MdIBAA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Majority of independents say Trump received ‘fair trial’: Poll', 'url': 'http://news.google.com/articles/CBMicWh0dHBzOi8vdGhlaGlsbC5jb20vcmVndWxhdGlvbi9jb3VydC1iYXR0bGVzLzQ2OTkxNjgtbWFqb3JpdHktb2YtaW5kZXBlbmRlbnRzLXNheS10cnVtcC1yZWNlaXZlZC1mYWlyLXRyaWFsLXBvbGwv0gF1aHR0cHM6Ly90aGVoaWxsLmNvbS9yZWd1bGF0aW9uL2NvdXJ0LWJhdHRsZXMvNDY5OTE2OC1tYWpvcml0eS1vZi1pbmRlcGVuZGVudHMtc2F5LXRydW1wLXJlY2VpdmVkLWZhaXItdHJpYWwtcG9sbC9hbXAv?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "Looming over Trump's conviction: reversal by the '13th juror'", 'url': 'http://news.google.com/articles/CBMiT2h0dHBzOi8vd3d3LnBvbGl0aWNvLmNvbS9uZXdzLzIwMjQvMDYvMDIvdHJ1bXAtY29udmljdGlvbi1hcHBlYWwtanVyb3ItMDAxNjExMTDSAQA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Half say Trump verdict correct, should end campaign: Poll', 'url': 'http://news.google.com/articles/CBMiZmh0dHBzOi8vdGhlaGlsbC5jb20vaG9tZW5ld3MvY2FtcGFpZ24vNDY5OTMyOS1oYWxmLXNheS10cnVtcC12ZXJkaWN0LWNvcnJlY3Qtc2hvdWxkLWVuZC1jYW1wYWlnbi1wb2xsL9IBamh0dHBzOi8vdGhlaGlsbC5jb20vaG9tZW5ld3MvY2FtcGFpZ24vNDY5OTMyOS1oYWxmLXNheS10cnVtcC12ZXJkaWN0LWNvcnJlY3Qtc2hvdWxkLWVuZC1jYW1wYWlnbi1wb2xsL2FtcC8?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': 'Opinion | Holy Cow, 34 for 45!', 'url': 'http://news.google.com/articles/CBMiQmh0dHBzOi8vd3d3Lm55dGltZXMuY29tLzIwMjQvMDYvMDEvb3Bpbmlvbi9kb25hbGQtdHJ1bXAtdHJpYWwuaHRtbNIBAA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "US has 'every expectation' Israel will expect ceasefire proposal", 'url': 'http://news.google.com/articles/CBMiLmh0dHBzOi8vd3d3LmJiYy5jb20vbmV3cy9hcnRpY2xlcy9jZDExbDU5MHFxd2_SATJodHRwczovL3d3dy5iYmMuY29tL25ld3MvYXJ0aWNsZXMvY2QxMWw1OTBxcXdvLmFtcA?hl=en-US&gl=US&ceid=US%3Aen'},
    {'headline': "As Hunter Biden goes to trial on gun charges, here's how his attorneys plan to defend a 'simple case'", 'url': 'http://news.google.com/articles/CBMiYWh0dHBzOi8vYWJjbmV3cy5nby5jb20vVVMvaHVudGVyLWJpZGVuLXRyaWFsLWd1bi1jaGFyZ2VzLWF0dG9ybmV5cy1wbGFuLWRlZmVuZC9zdG9yeT9pZD0xMTA3MjM0NjbSAQA?hl=en-US&gl=US&ceid=US%3Aen'}
]}
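To make the mismatch concrete: the schema asks for a top-level `top_stories` list whose items have `title` and `url`, while the output uses `news` and `headline`. Here is a small standard-library sketch that lists the differing keys. The helper `key_paths` and the trimmed-down `expected` / `actual` dicts are my own illustration, not part of ScrapeGraphAI.

```python
def key_paths(value, prefix=""):
    """Collect every key path in nested dicts/lists; '[]' marks list items."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            paths |= key_paths(item, prefix + "[]")
    return paths


# Shape requested by the schema (one item is enough to describe the shape).
expected = {"top_stories": [{"title": "...", "url": "..."}]}
# Shape actually returned (values shortened for illustration).
actual = {"news": [{"headline": "...", "url": "..."}]}

missing = sorted(key_paths(expected) - key_paths(actual))
unexpected = sorted(key_paths(actual) - key_paths(expected))

print("missing:", missing)
print("unexpected:", unexpected)
```

Every path from the schema shows up as missing, so the schema does not seem to be influencing the output at all.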
Very cool tool, BTW. Really enjoying exploring it.
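For reference, this is roughly what I would expect Pydantic-based validation of that schema to look like. It is only a sketch of the idea in the issue title: the model names `Story` and `TopStories` and the helper `matches_schema` are hypothetical, and I am not assuming that `OmniScraperGraph` currently accepts a Pydantic model for `schema`. The code only uses Pydantic itself.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Story(BaseModel):
    # One news item in the shape requested by the schema.
    title: str
    url: str


class TopStories(BaseModel):
    # Top-level container matching the "top_stories" key.
    top_stories: List[Story]


def matches_schema(result: dict) -> bool:
    """Return True if the result dict validates against TopStories."""
    try:
        TopStories(**result)
        return True
    except ValidationError:
        return False


# Illustrative data only: the shape I asked for vs. the shape I got back.
wanted = {"top_stories": [{"title": "Example headline", "url": "http://example.com/a"}]}
got = {"news": [{"headline": "Example headline", "url": "http://example.com/a"}]}

print(matches_schema(wanted))
print(matches_schema(got))
```

With models like these, a result keyed by `news` / `headline` would be rejected because the required `top_stories` field is absent, instead of being passed through silently.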