built & tested using Python 3.9.7
- framework, i.e. programs & workflows to collect data from Google Maps (as HTML) & extract it to a format (CSV) in which it can be used to do cold calls efficiently & effectively (without duplicates)
reads the plz column from `zuordnung_plz_ort.csv`
- helper script to initially get PLZs (was only relevant once - the relevant list of PLZs is now in `plzList.py` - may be used/extended later)
file containing the list of PLZs generated by `readRegions.py`
- set up once & always unedited (actually rather a database than a Python file)
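For illustration, `plzList.py` presumably holds a plain list along these lines (the PLZs below are made-up examples, not the generated content):

```python
# plzList.py -- hypothetical excerpt; acts more like a small database
# module than a program (the real content is generated by readRegions.py)
plzList = [
    "01067",  # Dresden
    "10115",  # Berlin
    "80331",  # München
]
```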
- spreadsheet to list potential contacts & document cold calls
- starts a Chrome browser & searches through a list of given search terms on Google Maps (after denying cookies)
- works only for the German Google Maps so far (selectors given in German, because classes are defined dynamically)
- super unstable
3 different versions for 3 different systems:
- `ccallmaps.py` (Windows)
- `ccallmaps-slow.py` (Raspberry Pi)
- `ccallmaps-slow-systemd.py` (systemd service on Raspberry Pi)
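The search flow described above could be sketched roughly like this with Playwright's sync API; the consent-button text and selectors are assumptions (Google generates its class names dynamically), and this is not the actual `ccallmaps.py`:

```python
# Rough sketch of the Maps search flow (NOT the actual ccallmaps.py);
# selectors & the German consent-button text are assumptions.

def build_query(term: str, plz: str) -> str:
    """Combine one search term with one zip code into a Maps query."""
    return f"{term} {plz}"

def run_searches(terms, plzs):
    # Heavy import kept local so build_query stays importable without Playwright
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # non-headless, as described
        page = browser.new_page()
        page.goto("https://www.google.de/maps")
        # Deny cookies on the German consent page (button name is a guess)
        page.get_by_role("button", name="Alle ablehnen").click()
        for term in terms:
            for plz in plzs:
                page.get_by_role("searchbox").fill(build_query(term, plz))
                page.keyboard.press("Enter")
                page.wait_for_load_state("networkidle")
                # Save the rendered HTML for later extraction
                with open(f"{plz}_{term}.html", "w", encoding="utf-8") as f:
                    f.write(page.content())
        browser.close()
```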
systemd service, which consistently starts & restarts `ccallmaps.py` on Linux when it fails
- saved in `/etc/systemd/user/ccallmaps.service` (see usage > systemd > setup & start below)
- meant to run indefinitely (no auto-restart after reboot; only restart after a crash of `ccallmaps.py`)
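For orientation, a user unit matching that description might look like the following; the venv path and `ExecStart` line are assumptions, not the actual content of `ccallmaps.service`:

```ini
[Unit]
Description=ccallmaps - Google Maps search via Playwright

[Service]
# Paths assume the repo is cloned to /home/pi/Dokumente/pwmaps (see setup notes)
WorkingDirectory=/home/pi/Dokumente/pwmaps
ExecStart=/home/pi/Dokumente/pwmaps/venv/bin/python ccallmaps.py
# Restart after a crash; there is deliberately no [Install]/WantedBy section,
# so the service does not start automatically after a reboot
Restart=on-failure
```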
finds target elements in the HTML files from `ccallmaps.py` searches & writes them to CSV files with pre-sorted data & prepares them to be concatenated & filtered (by `pdMergeOnlyUnique.py`) to amend the existing contacts in the searches sheet of `data_ColdCalls.ods`
- finds Google Business Profile, website (or no website), job
- finds job category (within the medical sector: physician, alternative practitioner, healer, physio, ergo, dentist, yoga, animal, psychologist) based on text in the article & writes it into the same row as the respective Google Business Profile link in the CSV
format of result sheet: `firstSheetOfListStem-lastSheetOfListStem.csv`
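The extraction idea can be sketched minimally like this (not the actual `findFilterInfo.py`); the `/maps/place/` pattern and both function names are assumptions, since Google's real class names are generated dynamically:

```python
# Hypothetical sketch: pull Google Business Profile links out of a saved
# search-result HTML page and write rows to a CSV file.
import csv
import re
from bs4 import BeautifulSoup

def extract_links(html: str):
    """Collect hrefs that look like Google Business Profile entries."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: profile entries link to /maps/place/... URLs
    return [a["href"] for a in soup.find_all("a", href=re.compile(r"/maps/place/"))]

def write_csv(rows, path):
    """Write pre-sorted rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```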
- move one or more search-result HTMLs to the same directory as `findFilterInfo.py`
- delete search results from single searches eventually!
- open the directory in the Windows file explorer
- sort the files by name (ascending)
- highlight one or more HTML files
- click on the HTML file with the biggest PLZ first (bottom), then scroll to the top, press+hold Shift & click the HTML file with the smallest PLZ (to highlight all files)
Do it that way so that the smallest PLZ is at the beginning & the biggest PLZ at the end of the list in `findFilterInfo.py` later - so that the output CSV is named smallestPlz_targetGroup-biggestPlz_targetGroup.csv
- press CTRL+C to copy the filenames
- open `findFilterInfo.py` in VSCode
- create a `fileInput` list at the beginning of `findFilterInfo.py` & paste the copied HTML filenames between the brackets
- delete the .html extension from all filenames (can be done by highlighting all occurrences using CTRL+D (several times))
- surround all file stems with quotes & place a comma behind each to create a valid Python list
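After those steps, the `fileInput` list might look like this (the filenames below are made-up examples):

```python
# Hypothetical fileInput list after pasting filenames, stripping ".html"
# and quoting each stem:
fileInput = [
    "01067_physio",
    "01069_physio",
    "01097_physio",
]
```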
- needs bs4 installed in the Python environment
- currently only works if `findFilterInfo.py` is in the same directory as the HTML files to be extracted
- extracts to same-named CSV files in the same directory (format of the resulting CSV files: originalFileStem.csv)
- doesn't accept input; filenames (stems!) or a list of them must be written to the `fileInput` variable at the beginning of the `findFilterInfo.py` script

merges CSV sheets into a new sheet while discarding duplicate entries AND then only keeps the new ones, which haven't been in the note-taking df before
format of result sheet: YY-MM-DD_uniqueNew_activeSheetStem+newDataSheet
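The merge-and-keep-only-new idea can be sketched with pandas like this; the `link` column name and the function name are assumptions, not the actual `pdMergeOnlyUnique.py` code:

```python
# Sketch: drop duplicates inside the new data, then keep only rows
# whose key is not already present in the existing (note-taking) sheet.
import pandas as pd

def unique_new(existing: pd.DataFrame, new: pd.DataFrame, key: str) -> pd.DataFrame:
    new = new.drop_duplicates(subset=key)
    return new[~new[key].isin(existing[key])]
```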
- prepare contact-info CSVs in the same directory as `pdMergeOnlyUnique.py`
- save the results sheet from `data_ColdCalls.ods` as CSV (e.g. with format YY-MM-DD_data_ColdCalls.csv)
- copy firstSheetOfListStem-lastSheetOfListStem.csv (containing filtered search results; created by `findFilterInfo.py` above) to the directory of `pdMergeOnlyUnique.py`
- create a file which only contains the unique new contacts (which haven't been in the results sheet of `data_ColdCalls.ods` before)
- navigate to the directory of the saved files & `pdMergeOnlyUnique.py`
- run `py pdMergeOnlyUnique.py data_ColdCalls.ods firstSheetOfListStem-lastSheetOfListStem.csv` on the command line (on Windows)
- open the resulting CSV file (saved in the same directory)
- highlight the rows & columns containing data (CTRL+Shift+arrow keys) & copy the data
- paste the copied data from the new file at the end of the results sheet in the active `data_ColdCalls.ods`
- zuordnung_plz_ort (CSV of regions & zip codes in Germany)
`requirements.txt` (ensures all dependencies are installed correctly in the Python virtual environment)
- install playwright
- create venv & install requirements (🔗Linux; 🔗Windows)
- once the systemd service is set up & running correctly -> it runs until shutdown
- the following steps follow the recommended Python systemd tutorial in a reply to a Reddit question
- 🛑❗basic assumption: repository cloned into `/home/pi/Dokumente/pwmaps` on Linux (Raspberry Pi)
- otherwise adjustments of paths in `ccallmaps.py` & `ccallmaps.service` are needed
- using the `--user` flag on the command line (instead of `User=` in `ccallmaps.service`) to run a user service & not a system service
- because setting up the user service as in the python-systemd-tutorial failed (because of a wrong user definition or so)
- probably not a clean solution (difference between the `--user` flag & `User=`)
- service: `ccallmaps.service`
- systemd usage similar to runVenv commands
- (`sudo apt-get install -y systemd`) (may be pre-installed on Linux already)
- `sudo nano /etc/systemd/user/ccallmaps.service` - to create the user service (not a system service as in runVenv) (copy+paste content from `ccallmaps.service`)
- `systemctl --user daemon-reload`
- (`systemctl --user enable ccallmaps.service` - not relevant here because the service is not configured to automatically start on reboot)
- `systemctl --user start ccallmaps.service`
- `systemctl --user status ccallmaps.service`
- `systemctl --user stop ccallmaps.service`
- `systemctl --user restart ccallmaps.service`
- `journalctl --user-unit ccallmaps` (debug info)
- `journalctl _PID=?????` (to debug failed services)
- (`journalctl --vacuum-time=10min`) (delete systemd log -> docs)
- not sure if that works for the user's systemd log or only for the system's
- use `sudo journalctl --rotate` instead for now (probably only archives the user journalctl & eventually fills up storage, but results in a clean journalctl again)
using `findFilterInfo.py` (see above)

search-result CSVs to a non-duplicate CSV (YY-MM-DD_targetGroup_oldestPlz-latestPlz.csv; for amendment of the results sheet in `data_ColdCalls.ods`) using `pdMergeOnlyUnique.py` (see above)
`.log` file created by `ccallmaps.py` in the same directory (& with the same name)
- doesn't contain `print()` statements from `fileOperations.py`
- doesn't contain error messages from errors that actually make the script fail (only the expected, caught & described ones in the script)
- `journalctl --user-unit ccallmaps` contains info about the actual errors but may need to be cleaned from time to time (see usage > systemd > debug & stop above)
- Google Maps Search in Playwright (non-headless Chrome Browser on Linux Raspberry Pi)
- based on the recommended Python systemd tutorial in a reply to a Reddit question
- searches for a certain target group along a list of German zip codes & saves the search results as HTMLs in the main directory
- Python programs are used to extract, filter & transform the relevant data (Google Business Profile link, website, guess of job type) into CSV format with no duplicates across different searches
- further Python programs eventually compare the existing contact data with the new data & filter duplicates
- result: an ongoing collection of non-duplicate contacts for e.g. cold calling
- playwright
- bs4
- pandas
- Python's basic logging
- replace all print statements with logging (maybe good practice in general) ☑
- would actually like to catch all errors by default in the log file (as soon as an error stops the Python program) - didn't find a simple enough solution yet
- future idea: use command-line syntax to log to a file automatically - tried - doesn't work with the service normally - but may work when explicitly called via bash
- build an ascending list from pathlib.Path.glob() (Python sorting basics)
- use IGNORECASE (py-docs 1, py-docs 2) to find results independent of case
- use bs4's `find_all(string=)` along with re.compile
- use `isinstance()` to check for type (conditionals)
- use the `*args` parameter to accept single & multiple elements as function arguments
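A small sketch tying several of the notes above together: `find_all(string=re.compile(...))` with IGNORECASE, an `isinstance()` check, and `*args` for one-or-many inputs (the HTML and keywords are made-up examples):

```python
# Find text nodes matching one or many keywords, case-insensitively.
import re
from bs4 import BeautifulSoup

def find_keywords(html, *keywords):
    """*args lets callers pass one keyword or several."""
    soup = BeautifulSoup(html, "html.parser")
    hits = []
    for word in keywords:
        if not isinstance(word, str):  # type check via isinstance()
            raise TypeError("keywords must be strings")
        # string= with a compiled, case-insensitive pattern
        hits.extend(soup.find_all(string=re.compile(word, re.IGNORECASE)))
    return [str(h) for h in hits]
```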
- slow
- region of search only determined by zip code (PLZ)
`find1stfilePart()` in `fileOperations.py` just prints (& doesn't log) (so its information won't appear in the logs of `ccallmaps.py`)
- potential solution: put all functions in `ccallmaps.py` (and make them all part of one class)
- don't know how to delete the systemd log for the user properly (the Raspi is expected to run out of storage sooner or later, because logs are probably only archived when using `--rotate`)
- docs: create video examples/walk-throughs of programs & workflows
- make `ccallmaps.py` run headless on the Raspberry Pi (or any other server)
- automate further parts of the workflow (find & store HTML search results with `ccallmaps.py` & immediately convert them into a non-duplicate CSV of search results)
- move the step of comparing the non-duplicate search-result CSV with the active coldCall CSV from `pdMergeOnlyUnique.py` as an optional(?) function to `findFilterInfo.py`
- !(easy & time-saving:) make `findFilterInfo.py` automatically create CSVs from all HTML files in the directory (which match a certain naming pattern)
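The "easy & time-saving" idea above could be sketched like this: build `fileInput` from the directory instead of pasting names by hand (the `*.html` pattern and function name are placeholder assumptions):

```python
# Collect the stems of all matching HTML files, sorted ascending by name
# (glob() yields files in arbitrary order, so sorting is required).
from pathlib import Path

def collect_file_stems(directory=".", pattern="*.html"):
    return sorted(p.stem for p in Path(directory).glob(pattern))
```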
- use geo-coordinates of plz center instead of searching for PLZs
- use the googlemaps API Python package (docs for the gmaps Python package) instead of Playwright
- or use the actual Google Maps Nearby Places API
- exemplary nearby business search on Google Maps