jojorb / google-knowledge-scraper

Scraper for business leads from Google knowledge panel in Python3

Google Knowledge Panel Scraper

Retrieve available business leads from Google knowledge panel in Python3

gkps is heavily inspired by knowledge-panel-scraper, a CLI scraper for Google's Knowledge Panels.

Highlights

  • scrape with fewer false negatives
  • segment results
  • fancy prompt

Install

Use git to clone the repository, then install required libraries with the package manager pip.

requirements.txt generated by pipreqs

git clone https://github.com/RobyRemzy/google-knowledge-scraper.git
cd google-knowledge-scraper
pip install -r requirements.txt

Usage

python gkps.py inputfile.csv

inputfile.csv should be a plain text CSV file with each row containing data to generate a search query for a specific business. For example:

"Bobcat of Monroe,Monroe,NC",1711 MORGAN MILL ROAD,MONROE,NC,28110,(704) 289-2200
"Kelly's Garage,Perry,NY",2868 STATE ROUTE 246,PERRY,NY,14530,(585) 237-2504
"Hoxie Implement Co,Hoxie,KS",933 OAK AVENUE,HOXIE,KS,67740-0587,(785) 675-3201
"Duhon Machinery,St. Rose,LA",10460 WEST AIRLINE HIGHWAY,ST. ROSE,LA,70087,(504) 466-5495
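As a rough illustration of how such rows can become search queries, here is a minimal sketch using only the standard library. How gkps.py itself builds its queries is not described here, so the function name `build_queries` and the "join every cell with spaces" rule are assumptions made for this example.

```python
import csv
import io


def build_queries(csv_text):
    """Return one search-query string per non-empty CSV row.

    Hypothetical helper: joining all cells with spaces is an assumption,
    not necessarily what gkps.py does.
    """
    queries = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Drop empty cells and surrounding whitespace, then join with spaces.
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:
            queries.append(" ".join(cells))
    return queries


sample = '"Kelly\'s Garage,Perry,NY",2868 STATE ROUTE 246,PERRY,NY,14530,(585) 237-2504\n'
print(build_queries(sample))
```

Note that the quoted first field keeps its inner commas, which is why the business name, city, and state stay together as one cell.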

The script tries to fetch data from the Google knowledge panel. If the first attempt fails, it retries once, since a second attempt can succeed. If the second attempt also fails, it moves on to the next row.

  • Green => data has been saved
  • Cyan => data has been re-fetched
  • Red => data has been re-fetched, but not successfully
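The retry-once behaviour and the three colour states can be sketched as follows. This is not the code from gkps.py: `fetch_with_retry` and `make_flaky_fetch` are hypothetical names, and the fetcher is a stand-in that performs no network request.

```python
def fetch_with_retry(query, fetch):
    """Call fetch(query) at most twice and return (data, status).

    status mirrors the prompt colours: "green" for a first-try success,
    "cyan" for a successful re-fetch, "red" when both attempts fail.
    """
    data = fetch(query)
    if data:
        return data, "green"
    data = fetch(query)  # second and last attempt
    if data:
        return data, "cyan"
    return None, "red"


def make_flaky_fetch(failures):
    """Hypothetical stand-in fetcher: fails `failures` times, then succeeds."""
    calls = {"n": 0}

    def fetch(query):
        calls["n"] += 1
        return None if calls["n"] <= failures else {"query": query}

    return fetch


for failures in (0, 1, 2):
    _, status = fetch_with_retry("Kelly's Garage Perry NY", make_flaky_fetch(failures))
    print(failures, status)
```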

When it finishes, it prompts you to tweak the failed queries by hand in your default editor.

If gkps.py finishes with successful responses, the files are copied into a timestamped folder:

  • results.csv contains all existing results
  • results_true.csv contains only successful responses
  • results_false.csv contains only failed responses
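A minimal sketch of this three-file split plus the timestamped copy is shown below. The row layout (a dict with `query` and `found` keys), the folder name format, and the function name `save_results` are all assumptions for illustration; the actual columns and folder naming used by gkps.py are not specified here.

```python
import csv
import tempfile
import time
from pathlib import Path


def save_results(rows, root):
    """Write the three result files under root and into a timestamped folder.

    Assumed row shape: {"query": str, "found": bool}.
    """
    root = Path(root)
    groups = {
        "results.csv": rows,
        "results_true.csv": [r for r in rows if r["found"]],
        "results_false.csv": [r for r in rows if not r["found"]],
    }
    # Folder name format is an assumption, e.g. 20240131-154500.
    stamped = root / time.strftime("%Y%m%d-%H%M%S")
    stamped.mkdir(parents=True, exist_ok=True)
    for name, group in groups.items():
        # The root copy is overwritten on every run; the stamped copy is kept.
        for target in (root / name, stamped / name):
            with open(target, "w", newline="", encoding="utf-8") as handle:
                csv.writer(handle).writerows([r["query"], r["found"]] for r in group)
    return stamped


demo_rows = [
    {"query": "Bobcat of Monroe Monroe NC", "found": True},
    {"query": "Kelly's Garage Perry NY", "found": False},
]
with tempfile.TemporaryDirectory() as tmp:
    folder = save_results(demo_rows, tmp)
    print(sorted(p.name for p in folder.iterdir()))
```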

The files generated by the last command are also kept in the root directory and will be overwritten on the next run.

After some tweaks (or none), you can rerun the script on the failed queries with this command, and repeat until no more usable data can be retrieved.

python gkps.py results_false.csv

Contributing

Pull requests are welcome. Let's do this in Rust lang?

About

License: MIT License


Languages

Language: Python 100.0%