TeamHG-Memex's repositories
scrapy-rotating-proxies
Use multiple proxies with Scrapy
tensorboard_logger
Log TensorBoard events without touching TensorFlow
sklearn-crfsuite
scikit-learn inspired API for CRFsuite
Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
page-compare
Simple heuristic for measuring web page similarity (& data set)
undercrawler
A generic crawler
scrapy-crawl-once
Scrapy middleware that allows crawling only new content
autologin-middleware
Scrapy middleware for autologin
json-lines
Read JSON lines (jl) files, including gzipped and broken
scrapy-kafka-export
Scrapy extension which writes crawled items to Kafka
sitehound-frontend
Site Hound (previously THH) is a domain discovery tool
domain-discovery-crawler
Broad crawler for domain discovery
url-summary
Show summary of a large number of URLs in a Jupyter Notebook
docker-tor-rotator
A rotating socks proxy using Tor, Delegate and Haproxy
hh-page-classifier
Headless Horseman Page Classifier service
scrapy-cdr
Item definitions and utilities for storing Scrapy items in CDR format
scrash-lua-examples
A collection of example Lua scripts and JS utilities
sitehound-backend
Sitehound's backend
sshadduser
A simple tool to add a new user with OpenSSH keys