0rca / clj-scraper

A web-scraper for your personal enjoyment

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

clj-scraper

A web-scraper for personal enjoyment and experiments with core/async. Supports two websites for your scraping pleasure.

Requirements

  1. Leiningen
  2. JDK >= 1.6

Building

$ lein uberjar

Usage

java -jar target/scraper-0.3.1-standalone.jar

Options

-c, --cache [dir]           cache files directory
-o, --output [dir]          downloaded images directory
-w, --workers [num]         number of download workers
-d, --debug                 display debug info
-s, --source [ngo|vrotmne]  handle of website to scrape
-S, --skip [num]            skip first num posts of LJ
-L, --list-only             save image urls, but don't download
-x, --exit-on-exist         exit the process if downloaded file exists
-h, --help                  print this help

Examples

$ java jar target/scraper-0.3.1-standalone.jar -w 20 -s ngo

License

Copyright © 2013 FIXME

Distributed under the Eclipse Public License, the same as Clojure.

About

A web-scraper for your personal enjoyment


Languages

Language:Clojure 99.8%Language:Shell 0.2%