Sammeeey / ccallmaps

framework (programs & workflows) to collect data from Google Maps (as HTML) & extract it into a format (CSV) in which it can be used for efficient & effective cold calls (without duplicates)


callmaps aka. pwmaps

built & tested using Python 3.9.7


Elements of Framework

readRegions.py

reads plz-column from zuordnung_plz_ort.csv

  • helper script used once to initially get the PLZ's (German postal codes) - the relevant list of PLZ's now lives in plzList.py - may be reused/extended later
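
The core of such a helper can be sketched in a few lines (a hypothetical sketch: it assumes zuordnung_plz_ort.csv has a plz column; the actual readRegions.py may differ):

```python
import csv
import io

def read_plz_column(csv_file, column="plz"):
    """Read the PLZ column from an open CSV file & return the unique PLZ's in order."""
    seen = []
    for row in csv.DictReader(csv_file):
        plz = row[column].strip()
        if plz not in seen:
            seen.append(plz)
    return seen

# demo on inline sample data instead of the real zuordnung_plz_ort.csv
sample = io.StringIO("ort,plz\nDresden,01067\nDresden,01069\nDresden,01067\n")
print(read_plz_column(sample))  # ['01067', '01069']
```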

plzList.py

file containing list of plz's generated from readRegions.py

  • set up once & left unedited afterwards (actually rather a database than a Python file)
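
plzList.py is essentially data in Python clothing; it presumably looks something like this (the PLZ values shown are illustrative):

```python
# plzList.py - generated once by readRegions.py, then left unedited
# (effectively a database rather than a program)
plzList = [
    "01067",
    "01069",
    "01097",
    # ... many more German postal codes
]
```

Other scripts can then simply do `from plzList import plzList`.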

data_ColdCalls.ods

  • spreadsheet to list potential contacts & document cold calls

ccallmaps.py

  • starts chrome browser & searches through list of given search terms on Google Maps (after denying cookies)
  • works only for the German Google Maps UI so far (selectors target German text, because the CSS classes are generated dynamically)
  • super unstable

3 different versions for 3 different systems:

  • ccallmaps.py (Windows)
  • ccallmaps-slow.py (Raspberry Pi)
  • ccallmaps-slow-systemd.py (systemd service on Raspberry Pi)

systemd service (on Linux): ccallmaps.service

systemd service, which starts ccallmaps.py on Linux & restarts it whenever it fails

  • saved in /etc/systemd/user/ccallmaps.service (see usage > systemd > setup & start below)
  • meant to run infinitely (no auto-restart after reboot; only restart after crash of ccallmaps.py)

findFilterInfo.py

finds the target elements in the HTML files from ccallmaps.py searches & writes them to CSV files with pre-sorted data, preparing them to be concatenated & filtered (by pdMergeOnlyUnique.py) so they can amend the existing contacts in the searches sheet of data_ColdCalls.ods

  • finds Google Business Profile, website (or no website), job
  • finds the job category (within the medical sector: physician, alternative practitioner, healer, physio, ergo, dentist, yoga, animal, psychologist) based on the text in the article element & writes it into the same row as the respective Google Business Profile link in the CSV
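
The category guess can be approximated with simple keyword matching on the article text (a stdlib sketch with illustrative keyword sets; the real findFilterInfo.py parses the HTML with bs4 & uses its own keywords):

```python
# map medical-sector categories to (illustrative) German keywords
CATEGORIES = {
    "physio": ["physiotherapie", "krankengymnastik"],
    "dentist": ["zahnarzt", "zahnärztin"],
    "yoga": ["yoga"],
    "psychologist": ["psychologe", "psychotherapie"],
}

def guess_job_category(article_text):
    """Return the first category whose keyword appears in the text, else None."""
    text = article_text.lower()
    for category, keywords in CATEGORIES.items():
        if any(kw in text for kw in keywords):
            return category
    return None

print(guess_job_category("Praxis für Physiotherapie und Krankengymnastik"))  # physio
```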

How to findFilterInfo.py

format of result sheet: firstSheetOfListStem-lastSheetOfListStem.csv

  • move one or more search-result-HTMLs to same directory as findFilterInfo.py
    • delete leftover results from earlier single searches first, if necessary!
    • open directory in windows file explorer
      • sort files by name (ascending)
      • highlight one or more HTML files
        • click on the HTML file with the biggest plz first (bottom), then scroll to the top, press+hold Shift & click the HTML file with the smallest plz (to highlight all files)

          Do it that way so that the smallest plz is at the beginning & the biggest plz at the end of the list in findFilterInfo.py later, so that the output CSV is named smallestPlz_targetGroup-biggestPlz_targetGroup.csv

        • press CTRL+C to copy filenames
  • open findFilterInfo.py in VSCode
    • create fileInput list at beginning of findFilterInfo.py & paste copied HTML-filenames between brackets
    • delete the .html extension from all filenames (can be done by highlighting all occurrences using CTRL+D (several times))
    • surround all file stems with quotes & append a comma to create a valid Python list
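
After these edits, the fileInput list & the derived output name look roughly like this (the stems are illustrative; the naming follows the firstSheetOfListStem-lastSheetOfListStem.csv format described above):

```python
# finished fileInput list at the beginning of findFilterInfo.py (stems, no .html)
fileInput = [
    "01067_physio",
    "01069_physio",
    "04103_physio",
]

# combined output CSV is named after the first & last stem in the list
outName = f"{fileInput[0]}-{fileInput[-1]}.csv"
print(outName)  # 01067_physio-04103_physio.csv
```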

limitations/specifics of findFilterInfo.py

  • needs bs4 (BeautifulSoup 4) installed (pip install beautifulsoup4)
  • currently only works if findFilterInfo.py in same directory as HTML files to be extracted
    • extracts to same-named CSV-files in same directory (format of resulting csv files: originalFileStem.csv)
  • doesn't accept input; the filenames (stems!) or a list of them must be written to the fileInput variable at the beginning of findFilterInfo.py

pdMergeOnlyUnique.py

script - merges CSV sheets into a new sheet while discarding duplicate entries AND then keeps only the new rows which haven't been in the note-taking df before
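
The merge-and-keep-only-new idea can be sketched with the stdlib (the actual script apparently uses pandas, judging by its name; the key column used here is illustrative):

```python
import csv
import io

def unique_new_rows(existing_file, new_file, key="gbp_link"):
    """Return rows from new_file whose key value isn't already in existing_file."""
    known = {row[key] for row in csv.DictReader(existing_file)}
    result = []
    for row in csv.DictReader(new_file):
        if row[key] not in known:
            known.add(row[key])  # also drops duplicates within new_file itself
            result.append(row)
    return result

existing = io.StringIO("gbp_link,job\nhttps://a,physio\n")
new = io.StringIO("gbp_link,job\nhttps://a,physio\nhttps://b,dentist\nhttps://b,dentist\n")
print(unique_new_rows(existing, new))  # [{'gbp_link': 'https://b', 'job': 'dentist'}]
```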

How to pdMergeOnlyUnique.py

format of result sheet: YY-MM-DD_uniqueNew_activeSheetStem+newDataSheet

  • prepare contact-info-CSVs in same directory as pdMergeOnlyUnique.py
    • save results sheet from data_ColdCalls.ods as CSV (e.g. with format YY-MM-DD_data_ColdCalls.csv)
    • copy firstSheetOfListStem-lastSheetOfListStem.csv (containing filtered search results; created by findFilterInfo.py above) to the directory of pdMergeOnlyUnique.py
  • create file which only contains unique new contacts (which haven't been in results sheet of data_ColdCalls.ods before)
    • navigate to directory of saved files & pdMergeOnlyUnique.py
    • run py pdMergeOnlyUnique.py data_ColdCalls.ods firstSheetOfListStem-lastSheetOfListStem.csv on command line (on windows)
  • open resulting csv file (saved in same directory)
    • highlight rows & columns containing data (CTRL+Shift+arrow-keys) & copy data
    • paste the copied data from the new file at the end of the result sheet in the active data_ColdCalls.ods

tools & sources

installation

usage

systemd

setup & start

  1. (sudo apt-get install -y systemd) (may be pre-installed on linux already)
  2. sudo nano /etc/systemd/user/ccallmaps.service - to create user service (not system service as in runVenv) (copy+paste content from ccallmaps.service)
  3. systemctl --user daemon-reload
  4. (systemctl --user enable ccallmaps.service - not relevant here because service not configured to automatically start on reboot)
  5. systemctl --user start ccallmaps.service
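
The unit file copied in step 2 presumably looks something like this (a hedged sketch: the Restart policy & the missing [Install]-triggered autostart follow the description above, the script path & description are assumptions):

```ini
# /etc/systemd/user/ccallmaps.service
[Unit]
Description=ccallmaps Google Maps scraper

[Service]
# path to the script is illustrative
ExecStart=/usr/bin/python3 /home/pi/ccallmaps/ccallmaps-slow-systemd.py
Restart=on-failure

# no [Install] section: service isn't enabled to start automatically on reboot
```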

debug & stop

  1. systemctl --user status ccallmaps.service
  2. systemctl --user stop ccallmaps.service

search-result-HTML's to search-result-CSV

using findFilterInfo.py (see above)

search-result-CSV's to non-duplicate-CSV (YY-MM-DD_targetGroup_oldestPlz-latestPlz.csv; for amendment of results-sheet in data_ColdCalls.ods)

using pdMergeOnlyUnique.py (see above)

debug info

  • .log-file created by ccallmaps.py in same directory (& with same name)
    • doesn't contain print()-statements from fileOperations.py
    • doesn't contain error messages from the errors that actually make the script fail (only the expected, caught & described ones in the script)
      • journalctl --user-unit ccallmaps contains info about the actual errors but may need to be cleaned from time to time (see usage > systemd > debug & stop above)

what actually happens

  • Google Maps Search in Playwright (non-headless Chrome Browser on Linux Raspberry Pi)
  • Python programs extract, filter & transform the relevant data (Google Business Profile link, website, job-type guess) from different searches into a duplicate-free CSV format
    • further Python programs then compare existing contact data with new data & filter out duplicates
  • result: ongoing collection of non-duplicate contacts for e.g. cold calling

resources

libraries/frameworks

code

ccallmaps.py

fileOperations.py

pdMergeOnlyUnique.py

findFilterInfo.py

approaches

limitations

  • slow
  • region of search only determined by zip code (PLZ)

known issues

logging/debugging

  • find1stfilePart() in fileOperations.py just prints (& doesn't log), so its information won't appear in the logs of ccallmaps.py
    • potential solution: put all functions into ccallmaps.py (and make them all part of one class)
  • don't know how to properly delete the systemd log for the user (Raspi expected to run out of storage sooner or later, because logs are probably only archived, not deleted, by --rotate)

potential improvements

results/attempts
