lenmetson / scraping-candidate-websites

Scraping candidate websites

Get started:

  1. Run pip3 install -r requirements.txt to install the dependencies.
  2. Open a terminal and run python3 app.py
  3. The results will be written to the scraped_content folder.

File structure

Folder naming: <person_id>_<person_name>, e.g. 4252_Stephen_Kinnock
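
As an illustration of the naming scheme, here is a minimal sketch of how such a folder name could be built. The function name `folder_name` is hypothetical and is not taken from app.py; the sketch also assumes that spaces in the candidate's name are replaced with underscores, as the example above suggests.

```python
import re


def folder_name(person_id: int, person_name: str) -> str:
    """Build '<person_id>_<person_name>' (hypothetical helper, not from app.py)."""
    # Assumption: runs of whitespace in the name become single underscores.
    safe_name = re.sub(r"\s+", "_", person_name.strip())
    return f"{person_id}_{safe_name}"


print(folder_name(4252, "Stephen Kinnock"))  # 4252_Stephen_Kinnock
```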

Content copied from the master spreadsheet

Background:

We want to create an archive of candidates’ websites over the course of the General Election so that we can analyse which issues they focused on and the language, branding, etc. that they use.

Challenge:

Write a crawler that archives the HTML of candidates’ websites. Suggested skills: web scraping
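
One possible shape for the core archiving step is sketched below. It is not the code in app.py. The names `fetch_html` and `archive_page`, the `index.html` file name, and the User-Agent string are all assumptions made for illustration. Only the scraped_content output folder and the per-candidate folder name come from this README.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes of a page (hypothetical helper)."""
    # Assumption: an identifying User-Agent; the value here is made up.
    request = Request(url, headers={"User-Agent": "candidate-archive-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def archive_page(url: str, folder: str, out_root: str = "scraped_content",
                 fetch=fetch_html) -> Path:
    """Save one page's HTML under <out_root>/<folder>/index.html."""
    target_dir = Path(out_root) / folder
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "index.html"  # assumed file name
    # Write the bytes unchanged so the archived copy matches what was served.
    target.write_bytes(fetch(url))
    return target
```

A call such as `archive_page("https://example.org", "4252_Stephen_Kinnock")` would save that page's HTML into the candidate's folder. Passing the `fetch` function in as a parameter keeps the saving logic testable without network access. This sketch saves a single page only: it does not follow links, so a full crawler would need to add that.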

Team Members:

Len

Useful links:

https://democracyclub.org.uk/blog/2024/05/27/2024-general-election-data-and-resources-for-campaigners-journalists-and-researchers/

Recorded Progress:

We have planned a workflow and written code for the initial scrape. We need help to complete this code. We are also open to better ways of archiving the websites and to other data sources to archive (e.g. social media posts).
