Scrapinghub (scrapinghub)

Turn web content into useful data

Location: The Internet

Home Page: https://scrapinghub.com

Twitter: @Scrapinghub

Scrapinghub's repositories

portia

Visual scraping for Scrapy

Language: Python | License: BSD-3-Clause | Stargazers: 9227 | Issues: 503 | Issues: 451

splash

Lightweight, scriptable browser as a service with an HTTP API

Language: Python | License: BSD-3-Clause | Stargazers: 4049 | Issues: 213 | Issues: 856

extruct

Extract embedded metadata from HTML markup

Language: Python | License: BSD-3-Clause | Stargazers: 832 | Issues: 114 | Issues: 97

scrapyrt

HTTP API for Scrapy spiders

Language: Python | License: BSD-3-Clause | Stargazers: 826 | Issues: 43 | Issues: 95

python-crfsuite

A Python binding for CRFsuite

Language: Python | License: MIT | Stargazers: 768 | Issues: 32 | Issues: 98

spidermon

Scrapy extension for monitoring spider execution.

Language: Python | License: BSD-3-Clause | Stargazers: 524 | Issues: 76 | Issues: 168

scrapy-poet

Page Object pattern for Scrapy

Language: Python | License: BSD-3-Clause | Stargazers: 118 | Issues: 13 | Issues: 40

web-poet

Web scraping Page Objects core library

Language: Python | License: BSD-3-Clause | Stargazers: 93 | Issues: 9 | Issues: 37

scrapinghub-stack-scrapy

Software stack with the latest Scrapy and updated dependencies

Language: Dockerfile | License: BSD-3-Clause | Stargazers: 61 | Issues: 23 | Issues: 14

aduana

Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

Language: C | License: BSD-3-Clause | Stargazers: 54 | Issues: 118 | Issues: 19

exporters

Exporters is an extensible export pipeline library that supports filtering, transformation, and several sources and destinations

Language: Python | License: BSD-3-Clause | Stargazers: 40 | Issues: 101 | Issues: 20

scrapinghub-entrypoint-scrapy

Scrapy entrypoint for Scrapinghub job runner

Language: Python | License: BSD-3-Clause | Stargazers: 25 | Issues: 25 | Issues: 14

andi

Library for annotation-based dependency injection

Language: Python | License: BSD-3-Clause | Stargazers: 20 | Issues: 9 | Issues: 3

shublang

Pluggable DSL that uses pipes to perform a series of linear transformations to extract data

Language: Python | License: BSD-3-Clause | Stargazers: 15 | Issues: 80 | Issues: 37
Language: Python | License: BSD-3-Clause | Stargazers: 13 | Issues: 18 | Issues: 8

autologin

A project that attempts to automatically log in to a website given a single seed

Language: Python | License: Apache-2.0 | Stargazers: 9 | Issues: 5 | Issues: 0
Language: Python | Stargazers: 8 | Issues: 119 | Issues: 0

hcf-backend

Crawl Frontier HCF backend

Language: Python | License: BSD-3-Clause | Stargazers: 7 | Issues: 84 | Issues: 9

scrapy-monkeylearn

A Scrapy pipeline to categorize items using MonkeyLearn

Language: Python | Stargazers: 7 | Issues: 7 | Issues: 0

varanus

A command-line spider monitoring tool

Formasaurus

Formasaurus tells you the type of an HTML form and its fields using machine learning

Language: HTML | Stargazers: 5 | Issues: 0 | Issues: 0

luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 75 | Issues: 0

webstruct-demo

HTTP demo for https://github.com/scrapinghub/webstruct

Language: Python | License: MIT | Stargazers: 4 | Issues: 4 | Issues: 0

pgcontents

A Postgres-backed ContentsManager implementation for IPython

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 3 | Issues: 0

docker-registry

Registry server for Docker (hosting/delivering of repositories and images)

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 2 | Issues: 0

python-intercom

Python wrapper for the Intercom API.

Language: Python | License: NOASSERTION | Stargazers: 1 | Issues: 3 | Issues: 0

sklearn-crfsuite

scikit-learn inspired API for CRFsuite

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0

docker-custodian

Keep Docker hosts tidy

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 6 | Issues: 0

woodpecker

An opinionated fork of the Drone CI system

Language: Go | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0