dracone / Archiver

Archives URLs

Given a single URL, or a JSON document in which one of the fields holds URLs, Archiver saves the HTML and a PDF of each page.

To archive multiple documents:
a = URLArchiver.new("multiple")  # or "multifull" to also get the full page text
a.multiarchive(json, "fieldname")
a.genOutput  # to get the input JSON with the paths to the saved HTML and PDF files
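The multiple-document mode reads URLs from one named field of a JSON input. The sketch below only illustrates that input shape. It assumes the JSON is an array of objects (the README does not say whether `multiarchive` takes a JSON string or parsed data), and `urls_from_field` and the `"link"` field are hypothetical names, not part of Archiver.

```ruby
require "json"

# Hypothetical helper (not part of Archiver): collect the URLs stored in one
# field of a JSON array of records, the kind of input multiarchive works on.
def urls_from_field(json_text, fieldname)
  records = JSON.parse(json_text)
  records = [records] if records.is_a?(Hash) # also accept a single object
  records
    .filter_map { |record| record[fieldname] }                   # skip records without the field
    .select { |value| value.is_a?(String) && value.start_with?("http") }
end

json = <<~JSON
  [
    {"title": "Example", "link": "https://example.com/"},
    {"title": "No link here"},
    {"title": "IANA", "link": "https://www.iana.org/"}
  ]
JSON

p urls_from_field(json, "link")
# prints ["https://example.com/", "https://www.iana.org/"]
```

Records that lack the field are skipped rather than raising, which is one reasonable choice for mixed input; Archiver's own handling of such records is not documented here.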

To archive a single document:
a = URLArchiver.new("single")
a.archiveone("url")
a.genOutput  # if you want a JSON with the page text and paths
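The generated output can include the page text. How Archiver extracts that text is not described, so the following is only a crude, hypothetical sketch of one way to derive plain text from saved HTML: drop `<script>`/`<style>` blocks, replace tags with spaces, decode a few common entities, and collapse whitespace. `naive_page_text` is an illustrative name, and a regex approach like this is only adequate for simple pages; a real HTML parser is the safer choice.

```ruby
# Hypothetical sketch (not Archiver's implementation): regex-based text
# extraction that handles only simple, well-formed HTML.
ENTITIES = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
  "&quot;" => '"', "&#39;" => "'", "&nbsp;" => " "
}.freeze

def naive_page_text(html)
  html
    .gsub(%r{<(script|style)\b[^>]*>.*?</\1>}mi, " ") # drop script/style bodies
    .gsub(/<[^>]+>/, " ")                            # replace every tag with a space
    .gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES) # decode a few common entities only
    .gsub(/\s+/, " ")                                # collapse whitespace runs
    .strip
end

html = "<html><head><style>p{color:red}</style></head>" \
       "<body><h1>Hello</h1><p>Fish &amp; chips</p></body></html>"
puts naive_page_text(html)
# prints Hello Fish & chips
```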

About

License: GNU General Public License v3.0


Languages

Language: Ruby 100.0%