Beautyscraper
Scraping top 10 beauty e-commerce sites.
How do I get set up?
pip install requirements.txt
scrapy crawl <site_name>
Example of scraping:
scrapy crawl sephora -o sephora.json
View Sample Data
- Sample Data for this can be found on all_scraped_data folders.
List of sites to crawl
- maccosmetics.com
- beautybay.com
- cultbeauty.co.uk
- sephora.com
- maybelline.co.uk
- selfridges.com
- Polyvore.com
- net-a-porter.com
- shopstyle.co.uk
- beautylish.com
Libraries used:
The project runs on Python 2.7.
1. Scrapy
2. Pillow for saving images.
3. scrapy-fake-useragent for rotating browser headers.