fabiopiovam / realestate-scraper

A scraper that gathers data from real estate ads

Real Estate - Scraper

A scraper that gathers data from real estate ads.

Currently Supported Websites

Country    Website
Brazil     ZAP Imóveis

Installation

Requirements
Python 3.6
MongoDB

1. Clone this repository

Clone this repository using git and cd into the project folder:

git clone https://github.com/pauloromeira/realestate-scraper.git && \
cd realestate-scraper

2. Install Python requirements

Inside the project folder, install the Python requirements using pip:

pip install -r requirements.txt

Usage

First, start the MongoDB server:

mongod &
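Because mongod is started in the background, it can be useful to confirm that it is accepting connections before starting a crawl. The sketch below is not part of this project: the function name mongodb_is_reachable is hypothetical, and it assumes MongoDB is listening on its default address, localhost:27017. It only checks that the TCP port is open, not that the server is healthy.

```python
import socket


def mongodb_is_reachable(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Assumption: mongod uses its default port (27017) on this machine.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("MongoDB reachable:", mongodb_is_reachable())
```

If the check prints False, start mongod (or wait a moment for it to finish starting) before running the crawler.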

Then use the following command to start crawling:

scrapy crawl zap [-a url=<zapimoveis-url>] [-a start=n] [-a count=n] [-a seed=<seed>]

Currently, only ZAP Imóveis is supported.

Arguments:

  • count: maximum number of pages to crawl. By default, the crawler continues until the last page.

  • start: page number to start crawling from. The default is 1.

  • url: URL of the website search to crawl. The default searches properties in Pernambuco, Brazil.

  • seed: seed for the website search engine.
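Scrapy hands every -a name=value pair to the spider as a string, so numeric options such as start and count have to be converted before use. The sketch below illustrates how the two options could combine into a sequence of page numbers. It is an illustration only: pages_to_crawl is a hypothetical helper, and the spider in this repository may implement paging differently.

```python
import itertools


def pages_to_crawl(start="1", count=None):
    """Return an iterator of page numbers to visit.

    Values arrive as strings (as they do with `-a`), so they are
    converted to int here. With count=None the iterator is unbounded
    and the caller stops when the site has no more pages.
    """
    first = int(start)
    if first < 1:
        raise ValueError("start must be at least 1")
    pages = itertools.count(first)  # first, first + 1, first + 2, ...
    if count is None:
        return pages
    return itertools.islice(pages, int(count))


# Equivalent to `-a start=4 -a count=3`
print(list(pages_to_crawl(start="4", count="3")))  # [4, 5, 6]
```

Returning a lazy iterator keeps the "crawl till the end" default cheap: no page list is built up front, and the crawl simply stops requesting numbers once a page comes back empty.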

Examples

  • Default values - properties in Pernambuco, Brazil. Crawl all pages.

    scrapy crawl zap
    
  • Olinda-PE. Crawl the first 4 pages.

    scrapy crawl zap -a count=4 -a url="https://www.zapimoveis.com.br/venda/imoveis/pe+olinda/"
    
  • Rio de Janeiro-RJ - south zone. Starting at page 100, crawl till the end:

    scrapy crawl zap -a start=100 -a url="https://www.zapimoveis.com.br/venda/imoveis/agr+rj+rio-de-janeiro+zona-sul/"
    
  • All places. Starting from page 4, crawl 3 pages:

    scrapy crawl zap -a start=4 -a count=3 -a url="https://www.zapimoveis.com.br/venda/imoveis/"
    
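When several searches need to be crawled in a row, the commands above can be assembled from a script instead of typed by hand. The following is a minimal sketch, not something shipped with this project: build_crawl_command and run_crawl are hypothetical helpers, and run_crawl assumes that scrapy is on the PATH and that project_dir is the cloned project folder.

```python
import shlex
import subprocess


def build_crawl_command(spider="zap", **spider_args):
    """Return the argv list for `scrapy crawl <spider> -a name=value ...`."""
    command = ["scrapy", "crawl", spider]
    for name, value in spider_args.items():
        if value is not None:  # omitted options fall back to the spider defaults
            command += ["-a", "{}={}".format(name, value)]
    return command


def run_crawl(project_dir=".", **spider_args):
    """Run one crawl and raise CalledProcessError if scrapy exits non-zero."""
    return subprocess.run(
        build_crawl_command(**spider_args), cwd=project_dir, check=True
    )


cmd = build_crawl_command(
    count=4,
    url="https://www.zapimoveis.com.br/venda/imoveis/pe+olinda/",
)
# Show the shell-safe form of the command without running it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the command as a list avoids shell quoting problems with characters such as + in the search URLs.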

License: MIT License

