Monitors Google Knowledge Graphs (KGs) for image changes. The script takes a list of brand searches, queries Google for each one, checks the results page for a Knowledge Graph and records its image. It then compares each image against the previous day's result; results that have changed are flagged with a `1` in the `data` tab, and those records are placed in a sheet called `image_change_tracking`.
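The day-over-day comparison can be sketched with pandas roughly as follows. This is a minimal illustration, not the script's actual code: the function name `flag_image_changes` is hypothetical, it assumes one row per brand per run, and it assumes the column names match the `data` tab headers.

```python
import pandas as pd


def flag_image_changes(previous: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `current` with change_detected = 1 where the KG image URL changed."""
    # Look up the image URL recorded for each brand on the previous run.
    # Assumes `previous` holds at most one row per brand.
    old_urls = previous.set_index("Business Name")["kg_image_url"]

    flagged = current.copy()
    flagged["old_kg_image_url"] = flagged["Business Name"].map(old_urls)

    # Assumption: rows with no previous image URL (a new brand, or no KG image
    # recorded on the previous run) are left unflagged.
    has_previous = flagged["old_kg_image_url"].notna()
    changed = has_previous & (flagged["kg_image_url"] != flagged["old_kg_image_url"])
    flagged["change_detected"] = changed.astype(int)
    return flagged
```

Rows where `change_detected` is 1 are the ones that would go to `image_change_tracking`, with `kg_image_url` as the new URL and `old_kg_image_url` as the old one.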
The script is intended to be run every day. You can run it locally by hand (Ewww), set up a batch file (Windows) plus a scheduled task to run it automagically, or add proxies to the Selenium call in the `get_serp(url)` function and throw it on an EC2 instance with a cron job.
Create your virtual environment using the `environment.yml` file in the repo. The script uses the Google Sheets API via the `gspread` library, Selenium + BeautifulSoup to get the Google SERP, and `pandas` to handle and compare the data. After your environment is set up, you'll need to get service account credentials through the Google Developer Console and save them as `client_secret.json` in the script's directory. You'll also need `chromedriver.exe` in the script's path, which you can download from the official ChromeDriver site.
Once you've set up the script to work, create a new gsheet with the following tabs:
- `data` (stores all historical data)
  - 1 row + 5 columns, with these headers: `Business Name`, `google_query`, `kg_image_url`, `timestamp`, `change_detected`
- `image_change_tracking` (stores only records where images changed)
  - 1 row + 5 columns, with these headers: `Business Name`, `date_discovered`, `google_query`, `new_kg_image_url`, `old_kg_image_url`
- `brands_to_query` (the brands to query)
  - 1 column with this header: `Business Name`
Then update the `gsheet_workbook_name` variable to your sheet's name and share the sheet with your service account, giving it edit privileges (its address will be something like: the-name-you-gave-it@gsheets-205000.iam.gserviceaccount.com).
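A typo in a tab name or header is an easy way to break the setup, so a small sanity check can help. The sketch below is not part of the script; `EXPECTED_HEADERS` and `find_header_problems` are hypothetical names, and the function only inspects a plain dict mapping each tab name to its header row.

```python
# Expected tab names and header rows, as listed in this README.
EXPECTED_HEADERS = {
    "data": [
        "Business Name", "google_query", "kg_image_url",
        "timestamp", "change_detected",
    ],
    "image_change_tracking": [
        "Business Name", "date_discovered", "google_query",
        "new_kg_image_url", "old_kg_image_url",
    ],
    "brands_to_query": ["Business Name"],
}


def find_header_problems(actual_headers):
    """Compare {tab name: header row} against the expected layout.

    Returns a list of human-readable problems; an empty list means the
    workbook layout matches.
    """
    problems = []
    for tab, expected in EXPECTED_HEADERS.items():
        if tab not in actual_headers:
            problems.append(f"missing tab: {tab}")
        elif list(actual_headers[tab]) != expected:
            problems.append(f"unexpected headers in {tab}: {list(actual_headers[tab])}")
    return problems
```

With gspread, each header row could be read with `worksheet.row_values(1)` and collected into the dict before calling `find_header_problems`.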
This thing was originally written 2 years ago. It was an adventure figuring out what everything did a couple of years removed (thank god for comments), and there's some cringeworthy code in here; I'll improve it over time (like removing terrible iterators, e.g. `range(len(df))`). If you have problems, just hit me up on Twitter and/or fork it.
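For anyone forking it, here is a rough idea of what replacing a `range(len(df))` loop could look like. The DataFrame below is made-up sample data, not taken from the script, and `itertuples` is just one possible replacement.

```python
import pandas as pd

# Hypothetical sample data using two of the `data` tab's column names.
df = pd.DataFrame({
    "google_query": ["acme", "globex"],
    "kg_image_url": ["a.png", "g.png"],
})

# Index-based loop of the kind mentioned above.
pairs_old = []
for i in range(len(df)):
    pairs_old.append((df["google_query"][i], df["kg_image_url"][i]))

# Same result, iterating over the rows directly.
pairs_new = [
    (row.google_query, row.kg_image_url)
    for row in df.itertuples(index=False)
]
```

Note that `itertuples` only exposes columns as attributes when their names are valid Python identifiers, so a column like `Business Name` would need to be renamed or accessed by position.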