njoubert / motorcycle-specs-scraper

Data retriever, parser, and cleaner to extract motorcyclespecs.co.za to a spreadsheet

Motorcycle Specs Scraper

I love the data on the website motorcyclespecs.co.za from my South African compatriots. It's my go-to place to find motorcycle info, even if it's a little clunky and sometimes wrong.

I also love spreadsheets. Google Sheets is the most useful cognitive tool I know of. I'd love to have a giant spreadsheet with specs of all the motorcycles I'm interested in.

This project scrapes data from motorcyclespecs.co.za into a nice spreadsheet format.

Sources of Motorcycle Data

There's more than just motorcyclespecs.co.za. Here are a few useful websites. They don't always agree, which is just infuriating:

Useful Resources

Python Code Layout and Structure:

Code Organization ideas:

Need to use some sort of virtual environment. Dropping everything into the global environment surely isn't the best idea.
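One way to get an isolated environment is Python's standard-library venv module. The sketch below is only an illustration, not something this repo ships: the function name create_project_env and the ".venv" directory name are assumptions.

```python
import venv
from pathlib import Path


def create_project_env(env_dir=".venv", with_pip=True):
    """Create an isolated virtual environment and return its path.

    Both the function name and the ".venv" default are hypothetical;
    pick whatever suits the project.
    """
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip inside the environment, so packages
    # installed with that pip stay out of the global site-packages.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling create_project_env() from the project root would create a .venv directory there. It still has to be activated in the shell before installing the scraper's dependencies.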

TODO

  • For each page, record the transformation applied from raw data to cleaned data, as a way to log all the errors I find to generate a report for the site authors.
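A minimal sketch of how such a transformation log might look, assuming each cleaning step is a plain function from a raw string to a cleaned string. Everything here (Transformation, TransformationLog, the page name, and the field name) is hypothetical and not taken from the existing notebooks.

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass
class Transformation:
    """One change made to a raw value while cleaning a page."""
    page_url: str
    field: str
    raw: str
    cleaned: str
    reason: str


class TransformationLog:
    """Collects every raw-to-cleaned change so it can be reported later."""

    def __init__(self):
        self.entries = []

    def clean(self, page_url, field, raw, cleaner, reason):
        """Apply cleaner to raw; record an entry only if the value changed."""
        cleaned = cleaner(raw)
        if cleaned != raw:
            self.entries.append(
                Transformation(page_url, field, raw, cleaned, reason)
            )
        return cleaned

    def report_csv(self):
        """Return all recorded changes as CSV text, one row per change."""
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=["page_url", "field", "raw", "cleaned", "reason"]
        )
        writer.writeheader()
        for entry in self.entries:
            writer.writerow(asdict(entry))
        return buf.getvalue()


# Hypothetical usage: the page name and field are made up for illustration.
log = TransformationLog()
value = log.clean(
    "example-page.html", "Dry Weight", " 189 kg  ", str.strip,
    "stripped surrounding whitespace",
)
```

Because the report is CSV, it opens directly in a spreadsheet, and grouping rows by page gives a per-page list of corrections to send to the site authors.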

Languages

Jupyter Notebook 100.0%, Python 0.0%