yhslai / ssscs

Sixty Seconds Science Crawler in Scala


Ssscs

Sixty Seconds Science Crawler in Scala

Feature

It's a crawler that downloads the awesome 60-Second Science podcast, including its transcripts and audio files, to your local machine. If you like, it can crawl 1000 episodes at once and output them as a single, well-formatted PDF transcript.

PDF output file

Usage

If you have Scala and sbt, just clone this project and run it via sbt.

$ sbt "run --help"
  -c, --count  <arg>              How many podcasts to crawl (default = 100)
  -f, --format  <arg>             Output format. Support 'txt', 'pdf' and
                                  'single-pdf' (default = text)
  -p, --only-podcast              Only crawl podcasts(mp3)
  -t, --only-transcript           Only crawl transcripts
  -d, --output-directory  <arg>   Where to store crawled files (default = output)
  -u, --until  <arg>              Only crawl the podcasts older than this date.
                                  Example: 2013/05/12
      --help                      Show help message
      --version                   Show version of this program
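The help text above lists each option together with its default. As a rough illustration only, here is a hand-rolled Scala sketch of how those options could be represented and parsed, including one reading of `--until` ("older than this date" taken as strictly before that date). `CrawlOptions`, `parse` and `shouldCrawl` are hypothetical names, and the 'txt' default is an assumption based on the help's "default = text"; the actual ssscs code may well use a command-line parsing library instead.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.annotation.tailrec
import scala.util.Try

// Hypothetical model of the options listed in the help text.
// Field names are illustrative, not taken from the ssscs source.
final case class CrawlOptions(
    count: Int = 100,
    format: String = "txt", // assumption: the help's "default = text" means 'txt'
    onlyPodcast: Boolean = false,
    onlyTranscript: Boolean = false,
    outputDirectory: String = "output",
    until: Option[LocalDate] = None
) {
  // "--until": keep only episodes published strictly before the given date.
  def shouldCrawl(published: LocalDate): Boolean =
    until.forall(limit => published.isBefore(limit))
}

object CrawlOptions {
  val SupportedFormats: Set[String] = Set("txt", "pdf", "single-pdf")

  // Same shape as the documented example "2013/05/12".
  private val UntilFormat = DateTimeFormatter.ofPattern("uuuu/MM/dd")

  // Parses the crawling options only; --help and --version are left out of this sketch.
  def parse(args: List[String]): Either[String, CrawlOptions] = {
    @tailrec
    def loop(rest: List[String], acc: CrawlOptions): Either[String, CrawlOptions] =
      rest match {
        case Nil => Right(acc)
        case ("-c" | "--count") :: v :: tail =>
          v.toIntOption match {
            case Some(n) if n > 0 => loop(tail, acc.copy(count = n))
            case _                => Left(s"invalid count: $v")
          }
        case ("-f" | "--format") :: v :: tail =>
          if (SupportedFormats.contains(v)) loop(tail, acc.copy(format = v))
          else Left(s"unsupported format: $v")
        case ("-p" | "--only-podcast") :: tail =>
          loop(tail, acc.copy(onlyPodcast = true))
        case ("-t" | "--only-transcript") :: tail =>
          loop(tail, acc.copy(onlyTranscript = true))
        case ("-d" | "--output-directory") :: v :: tail =>
          loop(tail, acc.copy(outputDirectory = v))
        case ("-u" | "--until") :: v :: tail =>
          Try(LocalDate.parse(v, UntilFormat)).toOption match {
            case Some(date) => loop(tail, acc.copy(until = Some(date)))
            case None       => Left(s"invalid date (expected yyyy/MM/dd): $v")
          }
        // Unknown flags, or a flag that is missing its <arg>, end up here.
        case other :: _ => Left(s"unknown or incomplete option: $other")
      }
    loop(args, CrawlOptions())
  }
}
```

For example, `CrawlOptions.parse(List("-c", "5", "-u", "2013/05/12"))` yields a `Right` with `count = 5`, and its `shouldCrawl` rejects an episode dated 2013/05/12 itself. Returning `Either` keeps invalid input (an unsupported format, a malformed date) as a value rather than an exception.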

Or, you can download the packaged jar.

$ java -jar ssscs.jar

License

Copyright (c) 2013-2013, Raincole Lai

Published under The MIT License, see LICENSE
