VinciGit00 / Scrapegraph-ai

Python scraper based on AI

Home Page: https://scrapegraphai.com

A version of SmartScraperGraph that scans the whole website

amittos opened this issue

Is your feature request related to a problem? Please describe.

This issue is not related to a problem, but I believe that SmartScraperGraph has not reached its full potential. It is often the case that the information the user is looking for is on a page other than the input page. For example, let's assume that I want to extract the contact email of a website. Often, this can be found in the footer, but sometimes it can only be found on the "Contact Us" page. If the latter is the case, then SmartScraperGraph fails to retrieve the requested information.

Describe the solution you'd like

SmartScraperGraph should have a parameter that allows it to scan all of a website's subpages. There should also be a limit on the number of subpages visited, as a safety measure for websites that have hundreds or thousands of subpages.
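To make the request concrete, here is a minimal sketch of the behaviour I have in mind: a breadth-first walk over same-host links that stops once a page limit is reached. This is not ScrapeGraphAI code. The names `crawl_site`, `LinkCollector`, `fetch`, and `max_pages` are hypothetical and exist only for illustration, and `fetch` is assumed to be any caller-supplied function that returns a page's HTML (or `None` on failure).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch, max_pages=10):
    """Breadth-first crawl of same-host pages, stopping after max_pages.

    fetch is a hypothetical caller-supplied function: URL -> HTML or None.
    Returns a dict {url: html} in the order the pages were visited.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be retrieved
        pages[url] = html

        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            # Stay on the same host and never queue a page twice.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

    return pages
```

Each page collected this way could then be handed to the scraper with the same prompt, so that a contact email found only on the "Contact Us" page is still picked up. Breadth-first order is a deliberate choice in this sketch: pages linked directly from the input page are visited before the limit runs out.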

Describe alternatives you've considered

N/A

Additional context

N/A

Hey @amittos, working on this in #260