Scrapinghub (scrapinghub)

Turn web content into useful data

Location: The Internet

Home Page: https://scrapinghub.com

Twitter: @Scrapinghub

Scrapinghub's repositories

article-extraction-benchmark

Article extraction benchmark: dataset and evaluation scripts

Language: Python · License: MIT · Stargazers: 255 · Issues: 20 · Issues: 2

webstruct

NER toolkit for HTML data

js2xml

Convert Javascript code to an XML document

Language: Python · License: MIT · Stargazers: 185 · Issues: 10 · Issues: 15

sample-projects

Sample projects showcasing Scrapinghub tech

scrapy-autoextract

Zyte Automatic Extraction integration for Scrapy

Language: Python · License: BSD-3-Clause · Stargazers: 55 · Issues: 10 · Issues: 5

scrapy-autounit

Automatic unit test generation for Scrapy.

Language: Python · License: BSD-3-Clause · Stargazers: 55 · Issues: 10 · Issues: 52

exporters

Exporters is an extensible export pipeline library that supports filtering, transformation, and several sources and destinations

Language: Python · License: BSD-3-Clause · Stargazers: 40 · Issues: 103 · Issues: 20

autoextract-spiders

Pre-built Scrapy spiders for AutoExtract

Language: Python · License: BSD-3-Clause · Stargazers: 20 · Issues: 22 · Issues: 5

autopager

Detect and classify pagination links

Language: HTML · Stargazers: 15 · Issues: 5 · Issues: 0

shublang

Pluggable DSL that uses pipes to perform a series of linear transformations to extract data

Language: Python · License: BSD-3-Clause · Stargazers: 15 · Issues: 85 · Issues: 37

varanus

A command-line spider monitoring tool

autoextract-poet

web-poet definitions for AutoExtract

Language: Python · License: BSD-3-Clause · Stargazers: 6 · Issues: 6 · Issues: 2

collection-scanner

HubStorage collection scanner library

Language: Python · License: BSD-3-Clause · Stargazers: 5 · Issues: 113 · Issues: 0

hadoop-jmx-exporter

Prometheus exporter for HDFS and YARN JMX metrics

Stargazers: 5 · Issues: 0 · Issues: 0

pgcontents

A Postgres-backed ContentsManager implementation for IPython

Language: Python · License: Apache-2.0 · Stargazers: 2 · Issues: 3 · Issues: 0

baseimage-docker

A minimal Ubuntu base image modified for Docker-friendliness

Language: Shell · License: MIT · Stargazers: 1 · Issues: 0 · Issues: 0

dockerfiles-stunnel

Secure services with stunnel

Language: Shell · License: MIT · Stargazers: 1 · Issues: 0 · Issues: 0

kafka-consumer-group-exporter

Prometheus Kafka Consumer Group Exporter

Language: Python · License: MIT · Stargazers: 1 · Issues: 0 · Issues: 0

docker-erlang-otp

The official Erlang OTP image on Docker Hub

Language: Dockerfile · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

hbase-thirdparty

Mirror of Apache HBase Third Party Libs

License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

jira

Python JIRA Library is the easiest way to automate JIRA. Support for py27 was dropped on 2019-10-14; do not raise bugs related to it.

Language: Python · License: BSD-2-Clause · Stargazers: 0 · Issues: 1 · Issues: 0

mochiweb

MochiWeb is an Erlang library for building lightweight HTTP servers.

Language: Erlang · License: NOASSERTION · Stargazers: 0 · Issues: 88 · Issues: 0

woodpecker

An opinionated fork of the Drone CI system

Language: Go · License: NOASSERTION · Stargazers: 0 · Issues: 0 · Issues: 0