dfdeshom / scrapy-kafka

Kafka-based components for Scrapy

scrapy-kafka

Kafka-based components for Scrapy. There are two components:

  • A custom Spider that waits for URLs to crawl on a Kafka topic. When the topic has no more messages to read, the Spider stays idle.
  • A custom ItemPipeline component that serializes each Item to JSON and writes it to another Kafka topic.
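
The two components above can be illustrated with a minimal sketch. This is not the project's actual code: the names `KafkaItemPipelineSketch`, `drain_urls`, and `RecordingProducer` are hypothetical, and the producer is assumed to be any object with a `send(topic, value)` method (kafka-python's `KafkaProducer` has that shape). The sketch avoids importing Scrapy or a Kafka client so that it runs on its own.

```python
import json
from itertools import islice


class KafkaItemPipelineSketch:
    """Hypothetical pipeline: serializes each item to JSON and sends it to a topic."""

    def __init__(self, producer, topic):
        # Assumption: `producer` exposes send(topic, value).
        self.producer = producer
        self.topic = topic

    def process_item(self, item, spider):
        # Scrapy calls process_item(item, spider) for every scraped item.
        payload = json.dumps(dict(item)).encode("utf-8")
        self.producer.send(self.topic, payload)
        return item  # hand the item on to any later pipeline


def drain_urls(messages, batch_size):
    """Take at most batch_size URLs from an iterator of raw message bytes.

    An empty result means there is nothing to crawl right now, which is
    the point at which the spider would stay idle.
    """
    return [raw.decode("utf-8").strip() for raw in islice(messages, batch_size)]


class RecordingProducer:
    """In-memory stand-in for a Kafka producer, used only for this demo."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


if __name__ == "__main__":
    producer = RecordingProducer()
    pipeline = KafkaItemPipelineSketch(producer, "scraped-items")
    pipeline.process_item({"url": "http://example.com/"}, spider=None)
    print(producer.sent)

    incoming = iter([b"http://example.com/a\n", b"http://example.com/b"])
    print(drain_urls(incoming, 10))
```

In a real Scrapy spider, staying idle instead of closing is commonly done by handling the `spider_idle` signal and raising `scrapy.exceptions.DontCloseSpider`; whether this project does exactly that is best checked in its source.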

See the example directory for how to use these components.

Contributors

Contributors to scrapy-kafka, listed alphabetically:

About

License: Apache License 2.0


Languages

Language: Python 100.0%