Carlist.my Web Scraping with Scrapy

1.0 Project Background

Carlist.my is a website that lists used cars for sale in Malaysia.

This project uses Scrapy to extract car listing data from carlist.my.

pip install scrapy

pip install lxml

The Chrome Extension

XPath Helper

Remove columns that are not relevant, such as 'type', 'position', 'item_type', 'item_additionalType', 'item_url', 'item_image', 'item_offers_type', 'item_offers_priceCurrency', 'item_offers_itemCondition', and 'item_offers_seller_url'.
Extract the car model year and engine capacity (cc) from the 'item_name' column using regular expressions (RegEx).
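The two cleaning steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the `clean_row` helper, the `year` and `engine_cc` output names, and the sample title format ("2014 Toyota Vios 1.5 J Sedan", with capacity written in litres and converted to cc) are hypothetical, and the real 'item_name' values may need different patterns.

```python
import re

# Columns the text names as not relevant to the analysis.
IRRELEVANT_COLUMNS = {
    "type", "position", "item_type", "item_additionalType", "item_url",
    "item_image", "item_offers_type", "item_offers_priceCurrency",
    "item_offers_itemCondition", "item_offers_seller_url",
}

# Assumed patterns: a four-digit year starting with 19 or 20, and an
# engine size written in litres with one decimal place (e.g. "1.5").
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
LITRES_RE = re.compile(r"(?<!\d)(\d\.\d)(?!\d)")


def clean_row(row):
    """Drop irrelevant keys, then add 'year' and 'engine_cc' from item_name."""
    cleaned = {k: v for k, v in row.items() if k not in IRRELEVANT_COLUMNS}
    name = cleaned.get("item_name", "")

    year = YEAR_RE.search(name)
    litres = LITRES_RE.search(name)

    cleaned["year"] = int(year.group(0)) if year else None
    # Convert litres to cubic centimetres (1.5 L -> 1500 cc)
    cleaned["engine_cc"] = round(float(litres.group(1)) * 1000) if litres else None
    return cleaned


sample = {
    "type": "ListItem",
    "position": 1,
    "item_name": "2014 Toyota Vios 1.5 J Sedan",  # made-up example title
    "item_url": "https://example.com/listing/1",
}
print(clean_row(sample))
```

If the data is held in a pandas DataFrame instead, the same idea maps onto `DataFrame.drop(columns=...)` for the first step and `Series.str.extract` with the same patterns for the second.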

5.1 Toyota

The top listing model: Vios

The top listing model year: 2014

The top listing body type: Sedan, followed by MPV

5.2 Perodua

The top listing model: Myvi

The top listing model year: 2015

The top listing body type: Hatchback

