Web-Scraping using Scrapy

Pre-requisites: pip install scrapy

Remember before scraping:

Most websites publish a robots.txt file that states which paths crawlers may access. Check it before you scrape so that you don't request endpoints that are disallowed.
If an API provides the same information, use it instead of scraping.
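
A minimal sketch of the robots.txt check using Python's standard library. The robots.txt content, the user agent name "my-bot" and the example.com URLs are made up for illustration, and the helper is_allowed is a hypothetical name, not part of Scrapy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_public = is_allowed(ROBOTS_TXT, "my-bot", "https://example.com/products")
allowed_private = is_allowed(ROBOTS_TXT, "my-bot", "https://example.com/private/data")
print(allowed_public, allowed_private)
```

Against a live site, set_url() followed by read() on the same parser downloads the real file instead of parsing an inline string. Inside a Scrapy project, the ROBOTSTXT_OBEY setting in settings.py makes the crawler perform this check itself.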

Performed Activities:

Learnt to use the Scrapy shell, started with: scrapy shell
- Fetch a page (pass the URL as a string): fetch("<put_your_scraping_url_here/endpoint>")
- View the response returned by the crawler: view(response). This opens the downloaded HTML in the default browser.
- Print the response body: print(response.text)
- Extract elements using a CSS selector: response.css(".value::text").extract()
- Extract elements using XPath: response.xpath("//div").extract()
Creating a Scrapy project and a custom spider
- Create the project: scrapy startproject aliexpress
- Create a spider inside the project: scrapy genspider aliexpress_tablets <url>
