trumpfheller / datacollection_car

scraping data from cargurus, carmax, craigslist, edmunds

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

datacollection_car

My goal is to do more data analysis. So far my database was based on existing data collections publicly available. Which comes with its pros and cons:

  • saving time
  • not finding what I really want

In an effort to see what is possible I started to flirt with the idea, to collect my own data.

scraping data from

  • cargurus,
  • carmax,
  • craigslist,
  • edmunds

using

  • python
  • selenium
  • request and beautifulsoup

Result:

  • CAPTCHA prevented the data collection
  • for craigslist, since I focused on the craigslist title which was not standarized, I had to use regular expressions to create at least two useful columns

Next Step:

  • have to learn how to circumvent mechanisms that prevent automated scraping OR
  • I go back to public available datasets.

About

scraping data from cargurus, carmax, craigslist, edmunds


Languages

Language:Python 100.0%