Scrapinghub (scrapinghub)

Turn web content into useful data

Location: The Internet

Home Page: https://scrapinghub.com

Twitter: @Scrapinghub

Scrapinghub's repositories

mdr

A Python library to detect and extract listing data from HTML pages.

wappalyzer-python

UNMAINTAINED Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)

page_clustering

A simple algorithm for clustering web pages, suitable for crawlers

Language: HTML | License: BSD-3-Clause | Stargazers: 35 | Issues: 0 | Issues: 0

scrapylib

Collection of Scrapy utilities (extensions, middlewares, pipelines, etc)

docker-devpi

PyPI caching service using devpi and Docker

kafka-scanner

High Level Kafka Scanner

Language: Python | License: BSD-3-Clause | Stargazers: 19 | Issues: 113 | Issues: 1

shubc

Go bindings for Scrapinghub HTTP API and a sweet command line tool for Scrapy Cloud

luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 79 | Issues: 0

shub-image

Deprecated client-side tool for preparing Docker images to run crawlers on Scrapinghub; please use shub>=2.5.0 instead

Language: Python | License: BSD-3-Clause | Stargazers: 4 | Issues: 0 | Issues: 0

custom-images-examples

Examples of custom images running on Scrapinghub platform

Language: Erlang | License: MIT | Stargazers: 3 | Issues: 12 | Issues: 0

hubstorage-frontera

Hubstorage crawl frontier backend for Frontera

Language: Python | License: BSD-3-Clause | Stargazers: 3 | Issues: 0 | Issues: 0

xpathcsstutorial

[Work in progress] XPath & CSS for web scraping tutorial

Language: Jupyter Notebook | Stargazers: 3 | Issues: 7 | Issues: 0

Zappa

Serverless Python Web Services

Language: Python | License: MIT | Stargazers: 3 | Issues: 0 | Issues: 0

pymesos

A pure Python implementation of Mesos scheduler and executor

Language: Python | License: NOASSERTION | Stargazers: 2 | Issues: 0 | Issues: 0

scrapinghub-conda-recipes

Conda packages for scrapinghub channel

Language: Shell | Stargazers: 2 | Issues: 0 | Issues: 0

docker-kibana

Balsamiq Kibana webapp Docker container

Language: Shell | License: MIT | Stargazers: 1 | Issues: 19 | Issues: 0

otp

Erlang/OTP

Language: Erlang | Stargazers: 1 | Issues: 0 | Issues: 0

scrapinghub-stack-hworker

[DEPRECATED] Software stack fully compatible with Scrapy Cloud 1.0

Language: Python | License: BSD-3-Clause | Stargazers: 1 | Issues: 15 | Issues: 1

spark

Mirror of Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 1 | Issues: 3 | Issues: 0

confd

Manage local application configuration files using templates and data from etcd or consul

Language: Go | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

confluent-kafka-python

Confluent's Apache Kafka Python client

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

docker-custodian

Keep Docker hosts tidy

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Stargazers: 0 | Issues: 0 | Issues: 0

happybase

A developer-friendly Python library to interact with Apache HBase

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

kafka

Mirror of Apache Kafka

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

newrelic-python-agent

Mirror of the New Relic Python agent source

Language: Python | Stargazers: 0 | Issues: 4 | Issues: 0
Language: Shell | License: GPL-3.0 | Stargazers: 0 | Issues: 78 | Issues: 0

scrapinghub-image-casperjs

Recommended base Docker image for CasperJS spiders at Scrapinghub

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0