- Install Vagrant
- Clone our repository
git clone https://github.com/bagiks/property_crawler
cd property_crawler
- Install Berkshelf
- Download ChefDK
https://downloads.chef.io/chef-dk/
- Install the Berkshelf plugin
vagrant plugin install vagrant-berkshelf
- If there is no cookbooks/dev directory inside the folder, create it:
mkdir -p cookbooks/dev
cd cookbooks
berks cookbook dev
- Add the cookbooks to cookbooks/dev/Berksfile:
source "https://supermarket.chef.io"
metadata
cookbook 'poise-python', '~> 1.4.0'
cookbook 'vim', '~> 2.0.1'
cookbook 'mongodb', '~> 0.16.2'
cookbook 'apt', '~> 4.0.0'
- Run vagrant
vagrant up
- SSH into the virtual machine
vagrant ssh
- Go to the project folder:
cd /vagrant_data
Scrapy, Python, and MongoDB are installed. Enjoy!
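To confirm that provisioning finished, a quick check like the one below can be run inside the VM. It is only a sketch: the executable names `python`, `scrapy`, and `mongod` are assumptions about what the cookbooks put on the PATH, and `missing_tools` is a hypothetical helper, not part of this repository.

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    # Python 2.7 fallback
    from distutils.spawn import find_executable as which

# Assumed executable names; adjust them if the VM uses different ones.
EXPECTED_TOOLS = ["python", "scrapy", "mongod"]


def missing_tools(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All expected tools were found.")
```

If something is reported as missing, re-running the provisioners with `vagrant provision` from the host is usually the first thing to try.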
- Run the demo from the command line
scrapy crawl Flats-Property-Crawler
- Run with PyCharm
- Run "Vagrant Up" from Tools -> Vagrant -> ...
- Go to property_crawler/spiders/9flats.py and uncomment this block:
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
})
process.crawl(FlatsPropertyCrawlSpider)
process.start()
- Run 9flats.py
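The block above starts a crawl as soon as the file is executed, which is presumably why it is kept commented out for `scrapy crawl`. One possible alternative, shown here as a sketch and not as what the repository currently does, is to wrap it in a function and call it behind a `__main__` guard so the same file works in both modes. `crawler_settings` and `run_spider` are hypothetical names; `CrawlerProcess` and the user agent come from the block above.

```python
# User agent copied from the block in 9flats.py.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/50.0.2661.102 Safari/537.36"
)


def crawler_settings():
    """Settings passed to CrawlerProcess when running outside `scrapy crawl`."""
    return {"USER_AGENT": USER_AGENT}


def run_spider(spider_cls):
    """Run one spider class in-process, e.g. from a PyCharm run configuration."""
    # Imported here so the module can still be imported without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(crawler_settings())
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl finishes


# At the bottom of 9flats.py the call could then be guarded like this:
# if __name__ == "__main__":
#     run_spider(FlatsPropertyCrawlSpider)
```

With the guard in place, Scrapy can import the module for `scrapy crawl` without triggering a second crawl, while running the file directly in PyCharm still starts one.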
- Install Anaconda (Python 2.7)
- Create a new environment, scraping, with the Scrapy package
conda create --name scraping Scrapy
- Activate / deactivate the environment
source activate scraping
source deactivate
- Clone the property_crawler repository
git clone https://github.com/bagiks/property_crawler
cd property_crawler
- Install the Python packages
pip install -r requirements.txt
- Install MongoDB and start the service
- Create db bagiks, collection property
- Run `scrapy crawl Flats-Property-Crawler`
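The database and collection from the step above can also be created from Python before the first crawl. The sketch below rests on several assumptions: the collection is read as being named `property`, MongoDB listens on the default `localhost:27017`, and PyMongo 3.6 or later is available (it may or may not be listed in requirements.txt). `ensure_collection` is a hypothetical helper, not part of this repository.

```python
# Names taken from the step above; the URI is an assumption (default local mongod).
MONGO_URI = "mongodb://localhost:27017"
DB_NAME = "bagiks"
COLLECTION_NAME = "property"


def ensure_collection(client, db_name=DB_NAME, collection_name=COLLECTION_NAME):
    """Create the collection if it is missing. Return True if it was created."""
    db = client[db_name]
    if collection_name in db.list_collection_names():
        return False
    # MongoDB creates the database itself once its first collection exists.
    db.create_collection(collection_name)
    return True


def main():
    # Imported here so the helper above can be used without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(MONGO_URI)
    created = ensure_collection(client)
    print("created" if created else "already present")
```

Calling `main()` once after the MongoDB service is up should be enough. MongoDB also creates missing databases and collections on the first insert, so this step mainly makes the setup explicit.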
- Develop with PyCharm
- Change the Python interpreter to the scraping environment [https://www.jetbrains.com/help/pycharm/2016.1/configuring-python-interpreter-for-a-project.html]
- Go to property_crawler/spiders/9flats.py and uncomment this block:
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
})
process.crawl(FlatsPropertyCrawlSpider)
process.start()
- Run 9flats.py