benfoxall / scrape

Git Scraping Hacker News

Home page: http://benjaminbenben.com/scrape/

Recent News

This repo saves the Hacker News front page to hacker-news.html, commits each fetch, and uses git log and git show to read back the history of changes.

See git scraping & Flat Data for more info about the approach.

Updating the data

export TARGET="hacker-news.html"

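# --fail: exit non-zero on HTTP errors instead of saving the error page
curl --silent --show-error --fail https://news.ycombinator.com > "$TARGET"
git add "$TARGET"
git commit -m ":robot: scraped to $TARGET"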

These steps run automatically in the GitHub Actions workflow .github/workflows/scrape.yml.
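If a fetch returns exactly the same page as the last committed version, `git add` stages nothing and `git commit` exits with an error, which would fail an automated run. Hacker News changes often, so this may rarely matter here, but a common git-scraping refinement is to commit only when something changed. The sketch below assumes that behaviour is wanted; `commit_if_changed` is a hypothetical helper, not part of this repo. It relies on `git diff --cached --quiet`, which exits 0 when nothing is staged.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from this repo): stage FILE, then create a commit
# only if the staged content differs from the last commit. Always returns 0.
commit_if_changed() {
  local file="$1"
  git add -- "$file"
  # --quiet implies --exit-code: exit status 0 means nothing is staged for FILE
  if git diff --cached --quiet -- "$file"; then
    echo "no change to $file"
  else
    git commit --quiet -m ":robot: scraped to $file"
    echo "committed $file"
  fi
}
```

With this helper, the `git add` and `git commit` lines in the update steps would collapse into a single `commit_if_changed "$TARGET"` call.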

Extracting file history

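# tformat (not format) ends every entry with a newline, so `read` also sees the oldest commit
git log --pretty=tformat:"%H %at" -- "$TARGET" | while read -r commit timestr
do
    git show "$commit:$TARGET" > "tmp_${timestr}_${commit}.html"
done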
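To look over the history without writing one file per version, a sketch like the following could print each committed version's author timestamp (`%at`, Unix seconds), short hash and size in bytes. The `list_versions` name is hypothetical; it uses `git cat-file -s`, which accepts the same `<commit>:<path>` syntax as `git show`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from this repo): print one line per committed
# version of FILE, newest first: "<author unix time> <short hash> <bytes>".
list_versions() {
  local target="$1"
  # tformat ends every entry with a newline, so `read` also sees the oldest commit
  git log --pretty=tformat:"%H %at" -- "$target" |
    while read -r commit timestr; do
      # "<commit>:<path>" names the blob for FILE as it was in that commit
      printf '%s %s %s\n' "$timestr" "${commit:0:7}" \
        "$(git cat-file -s "$commit:$target")"
    done
}
```

For example, `list_versions "$TARGET"` gives a quick view of how often, and by how much, the scraped page changed.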

About

License: MIT License


Languages

HTML 75.0%, JavaScript 21.8%, CSS 3.2%