sreejeet / one-line-scraper

A web scraper in a single line of shell commands.

one-line-scraper

No, it's not a full-fledged web scraper.

There is a lot more to a scraper. This is just a simple HTML response processor for a specific website.

Why?

I like complex text processing and this is a stepping stone. Plus, it's cool to be able to build something that requires little to no overhead or environment setup. This script can be run on a fresh installation of pretty much any flavour of Linux.

What programs are used here?

curl and awk. That's all!
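To give a feel for how the two programs fit together, here is a minimal sketch of the fetch-then-process shape. It is not the actual one-liner from this repo: the `extract_headings` name, the choice of `<h2>` tags and the `https://example.com/` URL are all assumptions made for illustration.

```shell
# Hypothetical extractor: print the text inside every <h2>...</h2> pair.
# It reads HTML on stdin, so it can be tried without touching the network.
extract_headings() {
  awk '{
    line = $0
    # Walk along the line, pulling out one <h2> element per iteration.
    while (match(line, /<h2[^>]*>[^<]*<\/h2>/)) {
      tag = substr(line, RSTART, RLENGTH)
      sub(/^<h2[^>]*>/, "", tag)   # strip the opening tag
      sub(/<\/h2>$/, "", tag)      # strip the closing tag
      print tag
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

# The one-line shape: fetch quietly, then pipe the response into awk.
# The URL below is a placeholder, not the site this repo targets.
# curl -s 'https://example.com/' | extract_headings
```

Matching tags with a regular expression only works while the markup stays simple and each element sits on one line, which is part of why this is a response processor and not a full scraper.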

How can this be made better?

Adding exception handling (for example, getting blocked after too many requests). Running it as a cron job. Appending to existing data. Removing anything older than the last n lines. Removing duplicates without changing order (sort -u changes the order). But then it probably won't stay in a single line, or would it?
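A few of these ideas can be sketched with standard tools. The snippet below is one possible approach, not something that exists in this repo: `update_store`, the `data.txt` file name and the default of 100 lines are hypothetical. It appends new lines to existing data, drops duplicates while keeping first-seen order (the `awk '!seen[$0]++'` idiom), and trims to the last n lines.

```shell
# Hypothetical helper: merge lines from stdin into a data file.
#   $1 = data file (created if missing), $2 = how many lines to keep (default 100)
update_store() {
  store=$1
  n=${2:-100}
  tmp=$(mktemp) || return 1
  # Old data first, then new lines from stdin.
  # awk keeps only the first occurrence of each line, so order is preserved.
  # tail then keeps the newest n lines.
  { [ -f "$store" ] && cat "$store"; cat; } \
    | awk '!seen[$0]++' \
    | tail -n "$n" > "$tmp" && mv "$tmp" "$store"
}

# Possible usage (placeholder URL and pattern). -f makes curl fail on HTTP
# errors, and --retry 3 retries transient failures a few times:
# curl -fsS --retry 3 'https://example.com/' | awk '/pattern/' | update_store data.txt 100
```

Chained with pipes this still fits on a single line, just a longer one. Running it on a schedule would be a separate crontab entry pointing at wherever the line is saved.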

Can I contribute?

Yes you can! You are free to add new features/improve existing code.

License: MIT License

Languages: Shell 100.0%