Scrapinghub's repositories
wappalyzer-python
UNMAINTAINED Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)
page_clustering
A simple algorithm for clustering web pages, suitable for crawlers
docker-devpi
PyPI caching service using devpi and Docker
kafka-scanner
High-level Kafka scanner
shub-image
Deprecated client-side tool for preparing Docker images to run crawlers on Scrapinghub; please use shub>=2.5.0 instead
custom-images-examples
Examples of custom images running on the Scrapinghub platform
hubstorage-frontera
Hubstorage crawl frontier backend for Frontera
xpathcsstutorial
[Work in progress] XPath & CSS for web scraping tutorial
scrapinghub-conda-recipes
Conda packages for the scrapinghub channel
docker-kibana
Balsamiq Kibana web app Docker container
scrapinghub-stack-hworker
[DEPRECATED] Software stack fully compatible with Scrapy Cloud 1.0
confd
Manage local application configuration files using templates and data from etcd or consul
confluent-kafka-python
Confluent's Apache Kafka Python client
docker-custodian
Keep Docker hosts tidy
drone-0.3-build-images
[DEPRECATED]
happybase
A developer-friendly Python library to interact with Apache HBase
kafka
Mirror of Apache Kafka
mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
newrelic-python-agent
Mirror of the New Relic Python agent source
scrapinghub-image-casperjs
Recommended base Docker image for CasperJS spiders at Scrapinghub