makemefriendanshu / bbripper

Project to scrape/rip certain content from the web

bbripper

Project to scrape/rip certain content from the web

Requirements:

  • Jython
  • Java 7 JRE
  • Mozilla Firefox (current version)
  • Scrapy 0.16 or later
  • Sikuli 1.0.0 or later
  • ImageMagick + textcleaner
  • tesseract-ocr
  • pdfocr (modified) + option-modifier script
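
The requirements above can be checked before a run with a small shell helper. This is only a sketch: `check_tools` is a hypothetical helper, not part of bbripper, and the executable names passed to it (`jython`, `java`, `firefox`, `scrapy`, `convert` for ImageMagick, `tesseract`, `pdfocr`) are assumptions about how these tools are installed on your system. It does not check versions, Sikuli, or the textcleaner script.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not on PATH.
# Prints one "missing: NAME" line per absent command and returns non-zero
# if at least one command was missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed executable names; adjust them to match your installation.
check_tools jython java firefox scrapy convert tesseract pdfocr \
    || echo "install the tools listed above before running bbripper"
```

The helper only reports and never exits, so it is safe to paste into an interactive shell or source from another script.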

Hardware/OS requirements:

  • Initial support for Linux (Ubuntu/Unity); should also work on Windows and Mac
  • Approximately 200 GB of disk space (possibly more)
  • possible integration with VPS/cloud servers
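
Since the disk space requirement is large, it may be worth checking free space before starting. The sketch below assumes a POSIX `df` and `awk`; `has_free_gb` is a hypothetical helper, and the 200 GB threshold simply mirrors the estimate in the list above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# GB gigabytes available.
# $1: directory to check, $2: required free space in GB
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Check the current directory against the approximate 200 GB requirement.
if has_free_gb . 200; then
    echo "enough free disk space"
else
    echo "less than 200 GB free in $(pwd)"
fi
```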

Objective:

Running:

  • From ./sikuli_api/, run: ./sikuli-script -r ../workspace/bbripper/sikuli.sikuli
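
The run step can be wrapped in a function so it works from any directory. This is a minimal sketch: `run_bbripper` is a hypothetical name, and the default paths are taken from the command above, assuming `sikuli_api` sits in the current directory.

```shell
#!/bin/sh
# Hypothetical wrapper around the run command above.
# $1: path to the sikuli_api directory (default: ./sikuli_api)
# $2: path to the .sikuli script, relative to that directory
run_bbripper() {
    api_dir=${1:-./sikuli_api}
    script=${2:-../workspace/bbripper/sikuli.sikuli}
    if [ ! -x "$api_dir/sikuli-script" ]; then
        echo "sikuli-script not found or not executable in $api_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$api_dir" && ./sikuli-script -r "$script" )
}

# Usage: run_bbripper ./sikuli_api ../workspace/bbripper/sikuli.sikuli
```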

Languages

  • Shell 64.0%
  • Python 28.3%
  • HTML 7.6%