lenmetson/scraping-candidate-websites

Scraping candidate websites

Get started:

Run pip3 install -r requirements.txt to install the dependencies.
Open a terminal and run python3 app.py.
The results are written to the scraped_content folder.

File structure

folder naming: <person_id>_<person_name> e.g. 4252_Stephen_Kinnock
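
As a rough illustration of the naming scheme above, here is a minimal Python sketch that builds a folder name from an ID and a name. The helper names `candidate_folder_name` and `candidate_folder` are hypothetical and not taken from app.py. Replacing whitespace with underscores is an assumption inferred from the 4252_Stephen_Kinnock example.

```python
import re
from pathlib import Path


def candidate_folder_name(person_id: int, person_name: str) -> str:
    """Build '<person_id>_<person_name>' (hypothetical helper, not from app.py)."""
    # Assumption: any run of whitespace in the name becomes a single underscore.
    safe_name = re.sub(r"\s+", "_", person_name.strip())
    return f"{person_id}_{safe_name}"


def candidate_folder(base_dir: str, person_id: int, person_name: str) -> Path:
    """Create the candidate's folder under base_dir if needed and return its path."""
    folder = Path(base_dir) / candidate_folder_name(person_id, person_name)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


print(candidate_folder_name(4252, "Stephen Kinnock"))  # 4252_Stephen_Kinnock
```

Calling `candidate_folder("scraped_content", 4252, "Stephen Kinnock")` would then create scraped_content/4252_Stephen_Kinnock if it does not already exist.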

Content copied from the master spreadsheet

Background:

We want to create an archive of candidates’ websites over the General Election so we can analyse which issues they focused on and the language, branding, etc. that they used.

Challenge:

Write a crawler which archives the HTML of candidates’ websites. Suggested skills: web scraping
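
One possible shape for the archiving step is sketched below, using only the Python standard library. It is not the code in app.py. The function names, the User-Agent string, and the timestamped file naming are all assumptions made for illustration. It fetches a single page without rendering JavaScript or following links.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes of one page (no JavaScript rendering)."""
    # Assumption: a descriptive User-Agent; the value here is made up.
    request = Request(url, headers={"User-Agent": "candidate-archive-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def save_snapshot(html: bytes, folder: Path,
                  timestamp: Optional[datetime] = None) -> Path:
    """Write one HTML snapshot into folder, named by its capture time."""
    # Default to the current UTC time so repeated scrapes do not overwrite each other.
    timestamp = timestamp or datetime.now(timezone.utc)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{timestamp:%Y%m%dT%H%M%SZ}.html"
    path.write_bytes(html)
    return path


def archive_site(url: str, folder: Path) -> Path:
    """Fetch url and store its HTML as a new snapshot in folder."""
    return save_snapshot(fetch_html(url), folder)
```

For example, `archive_site("https://example.org", Path("scraped_content/4252_Stephen_Kinnock"))` would write one timestamped .html file into that candidate's folder (the URL is a placeholder). Keeping the download and the write in separate functions makes the storage step testable without network access.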

Team Members:

Len

Useful links:

https://democracyclub.org.uk/blog/2024/05/27/2024-general-election-data-and-resources-for-campaigners-journalists-and-researchers/

Recorded Progress:

We have planned a workflow and written code for the initial scrape. We need help to complete this code. We are also open to better ways of archiving the websites and to other data sources worth archiving (e.g. social media posts).
