raphapassini / pyjobs

A crawler that extracts Python job offers from websites, mostly Brazilian ones.

pyjobs

How to install

  1. Check that libxml2-dev, libffi-dev, libssl-dev, libxslt-dev and mongodb are installed. If they are not, install them:

sudo apt-get install libxml2-dev libffi-dev libssl-dev libxslt-dev mongodb

  2. Install the project requirements:

pip install -r requirements.txt

Please be kind to yourself and install it in a virtualenv! :)
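If you are unsure whether a virtualenv is active before running pip, a small check like the one below can tell you. It is only a sketch and not part of pyjobs: the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older `virtualenv` releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # The stdlib `venv` module and recent virtualenv releases make sys.prefix
    # differ from sys.base_prefix inside an environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)
```

Running `print(in_virtualenv())` with the interpreter you plan to use for `pip install` shows whether the requirements would land in an isolated environment.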

How to run it

scrapy crawl ceviu
scrapy crawl catho
scrapy crawl vagas
scrapy crawl empregos
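To run every spider in one go, the commands above could be wrapped in a short script. This is a minimal sketch, not something shipped with the project: `build_command` and `run_all` are hypothetical helpers, and it assumes the `scrapy` executable is on your PATH and that you call it from the project root.

```python
import subprocess

# Spider names taken from the commands above.
SPIDERS = ["ceviu", "catho", "vagas", "empregos"]


def build_command(spider):
    """Return the argv list equivalent to `scrapy crawl <spider>`."""
    return ["scrapy", "crawl", spider]


def run_all(spiders=SPIDERS, runner=subprocess.call):
    """Run each spider in turn and return a {spider: exit_code} mapping.

    `runner` is injectable (an assumption of this sketch) so the loop can be
    exercised without Scrapy installed.
    """
    results = {}
    for spider in spiders:
        results[spider] = runner(build_command(spider))
    return results
```

Calling `run_all()` launches the four crawls sequentially, so a failure in one spider does not stop the others. A non-zero value in the returned mapping points at the spider that failed.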

ROADMAP

[x] - Iterate over CEVIU search pages

[x] - Store items in database, preferably a NoSQL database such as MongoDB

[x] - Implement Catho.com.br spider

[x] - Implement Empregos.com.br spider

[x] - Implement Vagas.com.br spider

[ ] - Build a web interface to search for jobs
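For readers curious about the "store items in database" step, here is a rough sketch of how a Scrapy item pipeline can write to MongoDB. It is not the project's actual pipeline. The class name, the `MONGO_URI` and `MONGO_DATABASE` setting names, the `pyjobs` and `jobs` database and collection names, and the `url` field used as a unique key are all assumptions made for illustration.

```python
class MongoPipeline:
    """Sketch of a Scrapy item pipeline that upserts job offers into MongoDB."""

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 mongo_db="pyjobs", collection_name="jobs"):
        # All three defaults are hypothetical, not taken from the project.
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DATABASE are assumed custom setting names.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "pyjobs"),
        )

    def open_spider(self, spider):
        # Imported lazily so the class can be loaded without pymongo installed.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert on the offer URL (an assumed field) so that crawling the same
        # offer twice updates the stored document instead of duplicating it.
        self.collection.update_one({"url": doc["url"]}, {"$set": doc}, upsert=True)
        return item
```

A pipeline like this only takes effect once it is registered in the project's `ITEM_PIPELINES` setting. Upserting rather than inserting is a design choice that keeps repeated crawls idempotent.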

License: MIT License


Languages

JavaScript 78.4%, Python 16.6%, HTML 4.5%, CSS 0.5%