okraskaj / twikiget

Get and archive TWiki pages in the WARC format

twikiget

About

twikiget is a tool that downloads TWiki pages and archives them in the WARC (.warc) format. It uses wget underneath, so all of wget's downloading features are available.

Features

  • download and archive a specific TWiki page and all its attachments
  • create WARC files for long-term preservation
  • save a local cache for faster, periodic reprocessing
  • (planned) extract specific metadata from TWiki document markup according to configurable templates
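
Since twikiget relies on wget, the core download-and-archive step can be pictured as a wget call with WARC output enabled. The following is only a minimal sketch of that idea, not twikiget's actual code: the function name `build_wget_command`, the `cache` directory and the example URL are hypothetical, while the wget options used (`--page-requisites`, `--directory-prefix`, `--warc-file`) are standard wget flags.

```python
import shlex


def build_wget_command(page_url, warc_name, cache_dir="cache"):
    """Build a wget argument list that archives one page into a WARC file.

    Hypothetical helper for illustration; twikiget's real invocation may differ.
    """
    return [
        "wget",
        "--page-requisites",                # also fetch images, CSS and other files the page needs
        f"--directory-prefix={cache_dir}",  # keep the downloaded files in a local directory
        f"--warc-file={warc_name}",         # wget appends the .warc.gz extension itself
        page_url,
    ]


if __name__ == "__main__":
    # Placeholder URL, not a real TWiki installation.
    cmd = build_wget_command(
        "https://twiki.example.org/bin/view/Main/WebHome", "webhome"
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmd))
```

To actually run the download, the list can be passed to `subprocess.run(cmd, check=True)`. Note that `--page-requisites` only covers resources embedded in the page; picking up linked attachments would need additional options that this sketch does not model.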

License: MIT License


Languages

Python 94.2%, Shell 5.8%