- Open a terminal and install the dependencies:
  pip3 install -r requirements.txt
- Then run:
  python3 app.py
- The results are written to the scraped_content folder.
- Folders are named <person_id>_<person_name>, e.g. 4252_Stephen_Kinnock.
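
The folder naming convention above can be sketched in a few lines of Python. This is only an illustration, not the code in app.py: the helper names `folder_name` and `save_page`, the default file name `index.html`, and the choice to replace whitespace with underscores are all assumptions.

```python
import re
from pathlib import Path


def folder_name(person_id: int, person_name: str) -> str:
    """Build '<person_id>_<person_name>' (hypothetical helper).

    Assumes runs of whitespace in the name become single underscores.
    """
    safe_name = re.sub(r"\s+", "_", person_name.strip())
    return f"{person_id}_{safe_name}"


def save_page(base_dir: str, person_id: int, person_name: str,
              html: str, filename: str = "index.html") -> Path:
    """Write one page of HTML into the candidate's folder and return its path."""
    target_dir = Path(base_dir) / folder_name(person_id, person_name)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = target_dir / filename
    target.write_text(html, encoding="utf-8")
    return target
```

For example, `save_page("scraped_content", 4252, "Stephen Kinnock", html)` would write to scraped_content/4252_Stephen_Kinnock/index.html under these assumptions.
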
We want to create an archive of candidates’ websites over the General Election so that we can analyse which issues they focused on and the language, branding, etc. that they used.
Write a crawler that archives the HTML of candidates’ websites. Suggested skills: web scraping.
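
One possible shape for such a crawler is sketched below, using only the Python standard library. It is a sketch under stated assumptions, not the project's existing code: `LinkParser`, `crawl_site`, `fetch_with_urllib`, and the `max_pages` limit are hypothetical names, and the crawl is restricted to pages on the same host as the starting URL. The fetch function is passed in as a parameter so the crawl logic can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_with_urllib(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it (assumes UTF-8 if no charset is given)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl_site(start_url: str, fetch: Callable[[str], str],
               max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl of pages on the same host as start_url.

    Returns a mapping of URL to HTML for every page fetched successfully.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Calling `crawl_site(url, fetch_with_urllib)` would return the HTML of up to `max_pages` pages, which could then be written into the candidate's folder. A real crawl would probably also want to respect robots.txt and pause between requests; neither is handled here.
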
Len
We have planned a workflow and written code for the initial scrape. We need help to complete this code. We are also open to better ways of archiving the websites and to other data sources to archive (e.g. social media posts).