gmanroney / scrapyCraigslistMongo

Uses scrapy to read section of craigslist site and store results in a mongo database

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This code written in python uses scrapy to read section of craigslist site and store results in a mongo database

This code is an amalgamation of code and concepts described in a number of webpages

https://www.analyticsvidhya.com/blog/2017/07/web-scraping-in-python-using-scrapy/

This code uses craigslist as the source and ammends records to the MongoDB collection each time it is run; in later versions the record will be checked to avoid occurence of duplicate records in the database

The code needs python to run; the specific requirements can be found in the file settings.txt

The code also needs the latest version of scrapy (1.4.0) and mongo (3.4.7) installed; for this implementation no username or password should be set in mongo. Again, something for improvement in the future

Once the above pre-requisites are met the scraper can be run using the command: scrapy crawl jobsIE

About

Uses scrapy to read section of craigslist site and store results in a mongo database

License:Apache License 2.0


Languages

Language:Python 100.0%