DataHenHQ

DataHen provides services and a platform for scalable web scraping, data processing, and ETL.

Home Page: https://www.datahen.com

DataHenHQ's repositories

till

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

Language: Go | License: Apache-2.0 | Stargazers: 813 | Issues: 6 | Issues: 6

useragent

The DataHen useragent tool is a Golang package and standalone tool that generates a random combination of millions of user-agent strings. Currently used in production at DataHen to crawl and scrape billions of pages.

Language: Go | License: Apache-2.0 | Stargazers: 10 | Issues: 3 | Issues: 0

datahen-ruby

DataHen Client for Ruby

Language: Ruby | License: MIT | Stargazers: 2 | Issues: 2 | Issues: 0

datahen-python

DataHen Python Library

Language: Python | License: MIT | Stargazers: 1 | Issues: 4 | Issues: 0

henqa

HenQA is a standalone tool for validating massive amounts of data using JSON Schema.

Language: Go | License: Apache-2.0 | Stargazers: 1 | Issues: 3 | Issues: 0

license

The license package signs and verifies responses using a public/private key pair and a timestamp.

Language: Go | License: Apache-2.0 | Stargazers: 1 | Issues: 3 | Issues: 0

afero

A FileSystem Abstraction System for Go

Language: Go | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

cookie_store

An implementation of RFC 6265 (HTTP State Management Mechanism, i.e. cookies)

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

dh_easy-qa

QA library that runs on Fetch

Language: Ruby | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

go-envparse

Minimal environment variable parser for Go

Language: Go | License: MPL-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0
Language: Go | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

henqa_shared

HenQA shared components

Language: Go | Stargazers: 0 | Issues: 2 | Issues: 0

proxy_benchmark

Proxy benchmark script

Language: Ruby | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Ruby | Stargazers: 0 | Issues: 3 | Issues: 0

ujson

The ujson package marshals data like json, but without escaping HTML.

Language: Go | License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 0
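
To illustrate what "without escaping HTML" means in the ujson entry above, here is a minimal sketch using only Go's standard library. It does not use or reproduce ujson's own API; `marshalNoEscape` is a hypothetical helper name chosen for this example. By default, `encoding/json` rewrites `<`, `>`, and `&` as `\u003c`, `\u003e`, and `\u0026`; an encoder with `SetEscapeHTML(false)` leaves them as-is.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalNoEscape is a hypothetical helper (not part of ujson) that
// encodes v as JSON while leaving <, >, and & unescaped.
func marshalNoEscape(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // disable the default HTML-safe escaping
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a trailing newline; trim it to match json.Marshal.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

func main() {
	v := map[string]string{"tag": "<b>A&B</b>"}

	// Default behaviour: HTML-sensitive characters are escaped.
	escaped, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}

	// Sketch of the alternative: the same value with escaping disabled.
	plain, err := marshalNoEscape(v)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(escaped))
	fmt.Println(string(plain))
}
```

Unescaped output is convenient when scraped HTML fragments are stored as JSON and read by people or by tools that do not decode `\u003c`-style escapes; it should not be embedded directly into an HTML page without separate sanitisation.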

datahen-api-doc

DataHen API Documentation

Language: HTML | Stargazers: 0 | Issues: 3 | Issues: 0
License: MIT | Stargazers: 0 | Issues: 3 | Issues: 0

dh_easy-core

DataHen Easy Core Toolkit

Language: Ruby | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

dh_easy-router

DataHen Router Core Toolkit

Language: Ruby | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

docker-pgbouncer

Minimal PgBouncer image that is easy to configure

Language: Shell | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0
Language: HTML | Stargazers: 0 | Issues: 2 | Issues: 0

gid

The gid package is a Golang package that generates globally unique IDs (GIDs) for web pages (HTTP requests). Useful for troubleshooting web scrapers and reusing web page caches.

License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 0

hudsucker

Intercepting HTTP/S proxy

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

imgix-rails

A Rails gem for integrating imgix into Rails projects

Language: Ruby | License: BSD-2-Clause | Stargazers: 0 | Issues: 2 | Issues: 0

lightning-fs

A lean and fast 'fs' for the browser

Language: JavaScript | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

reqwest-actix-stream

A Stream that links two systems: Reqwest and Actix-web.

Language: Rust | License: BSD-2-Clause | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

test-scraper

Test Scraper

Language: Ruby | Stargazers: 0 | Issues: 3 | Issues: 0

useragentr

A Rust library that generates a random combination of millions of user-agent strings.

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

website-crawler

Crawls a website

Language: Ruby | License: MIT | Stargazers: 0 | Issues: 3 | Issues: 0