agilecreativity / hn-scrapper

Collect the last 20 pages of Hacker News into one page

Home Page: https://github.com/agilecreativity/hn-scrapper


hn-scrapper


Get all of the latest links from Hacker News into a single page.

Installation and basic usage as CLI

Pre-requisites

Installation

# Clone this repository locally
mkdir -p ~/projects

git clone https://github.com/agilecreativity/hn-scrapper.git ~/projects/hn-scrapper

cd ~/projects/hn-scrapper

# Create the `~/bin` folder to hold the executable
mkdir -p ~/bin

# Generate the standalone executable using `lein bin`
lein bin
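The following is a minimal sketch for checking the result of the steps above. It assumes that `lein bin` places the executable at `~/bin/hn-scrapper` (as the Usage section suggests) and that `git` and `lein` are the only tools the steps rely on. The `path_contains` helper is a hypothetical name introduced for illustration and is not part of this project.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory appears in a PATH-style string.
path_contains() {
  # $1 = directory to look for, $2 = colon-separated list of directories
  case ":$2:" in
    *":$1:"*) echo yes ;;
    *) echo no ;;
  esac
}

# Check the tools used by the installation steps.
for tool in git lein; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done

# Assumption: `lein bin` writes the executable to ~/bin/hn-scrapper.
if [ -x "$HOME/bin/hn-scrapper" ]; then
  echo "executable present: $HOME/bin/hn-scrapper"
else
  echo "executable not found in $HOME/bin"
fi

echo "~/bin on PATH: $(path_contains "$HOME/bin" "$PATH")"
```

If `~/bin` is not on your `PATH`, the executable can still be run by its full path, as the examples in the Usage section do.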

Usage

To see the help text, run the executable without any options:

~/bin/hn-scrapper

This should print help text like the following:

Extract the lastest Hacker News index to a single file

Usage: hn-scrapper [options]
  -p, --page-count PAGE-COUNT    20
  -o, --output-file OUTPUT-FILE  hacker-news.md
  -h, --help
Options:

--p PAGE-COUNT  the number of pages to be extracted default to 20
--o OUTPUT-FILE the output file name default to 'hacker-news.md'

Now fetch the list of stories from Hacker News:

# Get only the first page from the site
~/bin/hn-scrapper --page-count 1 --output-file hacker-news-front-page.md

# Get all of the news (20 pages) using the short options
~/bin/hn-scrapper -p 20 -o hacker-news-top-20-pages.md
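As a further sketch, the two options shown above can be combined with a dated file name so that repeated runs do not overwrite each other. The `dated_output_name` helper and the file name pattern are hypothetical choices made for illustration. Only the `--page-count` and `--output-file` options from the help text are used.

```shell
#!/bin/sh
# Hypothetical helper: build an output file name that contains a date.
dated_output_name() {
  # $1 = date string, for example 2016-05-01
  echo "hacker-news-$1.md"
}

today=$(date +%Y-%m-%d)
output=$(dated_output_name "$today")

# Run the scraper only if the executable exists; otherwise show the command.
if [ -x "$HOME/bin/hn-scrapper" ]; then
  "$HOME/bin/hn-scrapper" --page-count 1 --output-file "$output"
else
  echo "would run: ~/bin/hn-scrapper --page-count 1 --output-file $output"
fi
```

A script like this could be run once a day to keep one snapshot of the front page per date.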

Example Sessions and Outputs

Sample sessions

Sample Markdown Output

Sample Markdown output viewed in a GitHub Gist

The actual result in Markdown format


Feature ideas

  • Export or print the first-level content of Hacker News to PDF or EPUB
  • Group the results in some way (topics, keywords, links to YouTube?)
  • Persist the results to HTML pages and store each link just once
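The "store each link just once" idea is not implemented in this project. As a rough sketch of one possible approach, duplicate lines across several output files could be dropped with `awk`. This assumes each link occupies its own line in the generated Markdown, which has not been verified against the tool's actual output. The `unique_lines` helper is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct line once, keeping first-seen order.
# Reads the files given as arguments, or standard input when none are given.
unique_lines() {
  awk '!seen[$0]++' "$@"
}

# Small demonstration on inline data: the repeated line is printed only once.
printf 'link-a\nlink-b\nlink-a\n' | unique_lines
```

With real output files, this could be used as `unique_lines hacker-news-*.md > hacker-news-unique.md`.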

Useful Links

License

Copyright © 2016 Burin Choomnuan

Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.


