WindBlaze1 / Google-Scraper

It scrapes Google for multiple search queries using user-provided proxies and stores the results in the user's Google Spreadsheet.

Google-Scraper

This project scrapes Google for a search query and stores the results in a Google Spreadsheet.

It has an elegant, clean UI built in Bootstrap Studio, and the backend is written in Django.
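The README does not include the scraper code itself, so here is a minimal, hypothetical sketch of the approach it describes: building a Google search URL for each query and fetching it through a user-provided proxy. The function names and the use of `requests` are assumptions, not taken from the repo.

```python
import urllib.parse


def build_search_url(query: str, page: int = 0) -> str:
    """Return a Google search URL for one query and result page (10 results per page)."""
    params = {"q": query, "start": page * 10}
    return "https://www.google.com/search?" + urllib.parse.urlencode(params)


def fetch_page(url: str, proxy: str) -> str:
    """Fetch a search page through a user-provided proxy.

    `proxy` is e.g. "http://user:pass@1.2.3.4:8080". Requires the
    third-party `requests` package (assumed to be in requirements.txt).
    """
    import requests  # imported lazily so the URL helper works without it

    proxies = {"http": proxy, "https": proxy}
    resp = requests.get(
        url,
        proxies=proxies,
        timeout=10,
        headers={"User-Agent": "Mozilla/5.0"},
    )
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    print(build_search_url("django tutorial"))
    # → https://www.google.com/search?q=django+tutorial&start=0
```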

Some screenshots of the app in action:

image

image

Results of Scraping:

image

Logs:

image

How to Run:

First, extract the ZIP file into a folder; let's call it main.

You will need a Google service account JSON credentials file for authorization in pygsheets:

  1. Head to the Google Developers Console and create a new project (or select an existing one).

image

  2. You will be redirected to the Project Dashboard. There, click “Enable APIs and Services” and search for “Sheets API”.

image

  3. On the API screen, click “ENABLE” to enable this API.

image

  4. Similarly, enable the “Drive API”. The Drive API is required for listing spreadsheets, deleting them, etc.
  5. Go to the Sheets API screen, open the “Credentials” tab, and choose “Create Credentials > Service Account Key”.
  6. Choose ‘App Engine default’ as the service account and JSON as the key type, then click Create:

image

  7. You will now be prompted to download a .json file. This file contains the private key needed for account authorization. Name the file 'service_account_sheets.json'.

image

This is what the file may look like:

{
  "type": "service_account",
  "project_id": "p....sdf",
  "private_key_id": "48.....",
  "private_key": "-----BEGIN PRIVATE KEY-----\nNrDyLw … jINQh/9\n-----END PRIVATE KEY-----\n",
  "client_email": "p.....@appspot.gserviceaccount.com",
  "client_id": "10.....454"
}

  8. After getting the .json file, paste it into the main folder.
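Once 'service_account_sheets.json' is in the main folder, pygsheets can consume it roughly as sketched below. The spreadsheet title, result fields, and helper names are illustrative assumptions, not taken from the repo.

```python
def results_to_rows(results):
    """Convert scraped results (dicts with 'title' and 'link') into
    spreadsheet rows, prefixed with a header row."""
    rows = [["Title", "Link"]]
    rows += [[r["title"], r["link"]] for r in results]
    return rows


def save_results(results, spreadsheet_title="Google-Scraper Results"):
    """Write scraped results to a new Google Spreadsheet."""
    import pygsheets  # third-party; installed via requirements.txt

    # Authorize with the service account credentials file from the steps above.
    gc = pygsheets.authorize(service_file="service_account_sheets.json")
    sh = gc.create(spreadsheet_title)  # creating/listing sheets needs the Drive API
    wks = sh.sheet1
    wks.update_values("A1", results_to_rows(results))  # writing needs the Sheets API
```

Note that a spreadsheet created by the service account lives in that account's Drive; to see it in your own Drive you would share it with your personal Google account (e.g. via `sh.share(...)`).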

Now, to install all the dependencies:

  1. Open the main folder in the terminal.
  2. Run this command:

pip install -r requirements.txt

  3. After all dependencies are installed successfully, start the server:

python manage.py runserver

  4. A local URL will appear in the terminal; open it in a browser.


Languages

JavaScript 47.5% · CSS 40.9% · Python 6.6% · HTML 5.0%