Bin Wang's repositories

hadoop_raspberrypi

Setting up Hadoop on a Raspberry Pi

Language: Shell | Stargazers: 1 | Issues: 2 | Issues: 0

docker-selenium-hub

Docker image for a Selenium server with headless Firefox

Language: Shell | Stargazers: 0 | Issues: 0 | Issues: 0

docker_scrapy

A bare-minimum Scrapy template for fetching the HTML of a list of URLs

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

getout

A Python library to extract the outlinks of a given URL

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

namemapping

A name-mapping library by Dan and Bin that clusters company names using the Yahoo BOSS API

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

nutch-selenium-grid-plugin

A Nutch 2.2.1 plugin that lets users hand off page retrieval to a Selenium hub/node system. Nutch relies on Selenium/Firefox to fetch pages and load JavaScript content, while staying in charge of what it does best: crawling and further parsing.

Stargazers: 0 | Issues: 0 | Issues: 0

rgetout

An R package to get all the outlinks of a given URL

Language: R | Stargazers: 0 | Issues: 0 | Issues: 0