Ebay-kleinanzeigen-scrapy-elastic

An Ebay-kleinanzeigen Web scraper using Python and Scrapy to fetch data into an ElasticSearch cluster with Kibana

The aim is to extract data from https://www.ebay-kleinanzeigen.de/ automatically and rapidly, store it in an ElasticSearch cluster, and get fast insights with Kibana.

Requirements

Python 3
Elasticsearch 7.0.2
Scrapy 1.6.0

How to set it up:

  1. Start your ElasticSearch cluster with Kibana installed on it. If you don't have one, a fast way to get it is to install and run a Docker image with the following steps:
git clone https://github.com/deviantony/docker-elk.git
cd docker-elk
docker-compose up -d
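
Once the containers are up, a quick connectivity check with elasticsearch-py (the client cited in the References) can save debugging time later. This is only a sketch: the credentials elastic / changeme and port 9200 are the stock docker-elk defaults and may differ on your setup.

from elasticsearch import Elasticsearch

# Stock docker-elk defaults -- adjust if you changed the credentials or port.
es = Elasticsearch(["http://localhost:9200"], http_auth=("elastic", "changeme"))

# ping() returns True when the cluster answers, False otherwise.
print("Elasticsearch reachable:", es.ping())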
  2. Set the URLs you want to scrape (for example https://www.ebay-kleinanzeigen.de/s-berlin/l3331) in the JSON file start_urls.json, as sketched below.
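
For illustration only, start_urls.json could hold a plain list of search URLs like the one above; the exact structure expected by the spider is defined in the repository, so treat this as an assumed sketch.

[
  "https://www.ebay-kleinanzeigen.de/s-berlin/l3331"
]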
  3. Set the various configuration parameters you wish:
{
  "protocol": "http or https",
  "elastic_username": "the username used to connect to your ElasticSearch cluster",
  "elastic_password": "the password needed to connect to your ElasticSearch cluster",
  "elastic_address": "the bound IP address of your ElasticSearch cluster",
  "elastic_port": "the bound port of your ElasticSearch cluster",
  "elastic_index_name": "the index name of your ElasticSearch cluster",
  "elastic_connection_retry": "the number of attempts to reconnect to your ElasticSearch cluster in case of failure",
  "scrape_next_pages": "boolean indicating whether the web scraper should follow the next pages (1, 2, 3...) displayed at the bottom of the page"
}

The default login and server parameters of the ElasticSearch Docker image are already filled in.
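
For reference, a configuration matching the stock docker-elk defaults could look roughly like the following; the index name is a hypothetical placeholder, and the exact value types (strings vs. numbers/booleans) follow whatever the repository's configuration file expects.

{
  "protocol": "http",
  "elastic_username": "elastic",
  "elastic_password": "changeme",
  "elastic_address": "localhost",
  "elastic_port": "9200",
  "elastic_index_name": "ebay_kleinanzeigen",
  "elastic_connection_retry": "5",
  "scrape_next_pages": "true"
}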

  4. Change your current directory to the scraper's folder and start it with:
cd .../ebaykleinanzeigen
scrapy crawl ebay_kleinanzeigen
  5. The results are automatically pushed to ElasticSearch and Kibana as the data are scraped. Just enjoy the insights by connecting to your Kibana home page (by default in the Docker image: http://localhost:5601)!
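
Under the hood, this kind of live indexing is typically handled by a Scrapy item pipeline that pushes each scraped item to Elasticsearch through elasticsearch-py. The snippet below is only an illustrative sketch under that assumption, not the repository's actual pipeline; names such as ElasticSearchPipeline and the index name are hypothetical, and the connection details would come from the configuration above.

from elasticsearch import Elasticsearch

class ElasticSearchPipeline:
    """Hypothetical sketch: index every scraped item into Elasticsearch."""

    def open_spider(self, spider):
        # Connection details would normally be read from the JSON configuration.
        self.es = Elasticsearch(["http://localhost:9200"], http_auth=("elastic", "changeme"))

    def process_item(self, item, spider):
        # Each scraped item becomes one document in the configured index.
        self.es.index(index="ebay_kleinanzeigen", body=dict(item))
        return item

Such a pipeline is enabled through the ITEM_PIPELINES setting in settings.py.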

NB: The number of concurrent requests and the delay between them are set in settings.py to 20 and 0.8 respectively by default, in order to avoid putting too much load on Ebay-kleinanzeigen's servers.
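
In standard Scrapy terms these two defaults usually correspond to the CONCURRENT_REQUESTS and DOWNLOAD_DELAY settings; the lines below are a sketch of what the relevant part of settings.py would then contain, assuming the 0.8 figure is a delay in seconds.

# Throttling defaults mentioned above (sketch -- check settings.py for the actual names and values).
CONCURRENT_REQUESTS = 20  # maximum number of requests performed concurrently
DOWNLOAD_DELAY = 0.8      # seconds to wait between consecutive requests to the same site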

References

- Ebay-kleinanzeigen
- Elasticsearch-py

Credits

Copyright (c) 2019, HicBoux. Work released under Apache 2.0 License.

(Please contact me if you wish to use my work under specific conditions not automatically permitted by the Apache 2.0 License.)

Disclaimer

This solution has been made available for informational and educational purposes only. I hereby disclaim any and all liability to any party for any direct, indirect, implied, punitive, special, incidental or other consequential damages arising directly or indirectly from any use of this content, which is provided as is and without warranties. I also disclaim all responsibility for web scraping performed at a disruptive rate and for any damages caused by such use.
