emmadickson / unwarcit

UNWARCIT: WARC (and WACZ) Unzipping Library

Background

This library provides a command-line interface for unzipping WARC and WACZ files.

It builds on the warcio library to read and validate WARC files, and on the py-wacz library to validate WACZ files.

Both libraries are provided by Webrecorder.
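To make "unzipping" a WACZ file more concrete: a WACZ file is a ZIP container that holds its WARC files under an archive/ directory, next to a datapackage.json and page and index data. The sketch below uses only the Python standard library to list those WARC members. It is an illustration under that layout assumption, not unwarcit's own code, and the names list_warc_members and build_demo_wacz are hypothetical.

```python
import io
import zipfile


def list_warc_members(wacz_file):
    """Return the names of WARC files stored under archive/ in a WACZ (ZIP) file.

    Hypothetical helper for illustration; unwarcit itself relies on
    warcio and py-wacz rather than on this function.
    """
    with zipfile.ZipFile(wacz_file) as zf:
        return [
            name
            for name in zf.namelist()
            if name.startswith("archive/")
            and name.endswith((".warc", ".warc.gz"))
        ]


def build_demo_wacz():
    """Build a tiny in-memory ZIP that mimics the WACZ layout (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("datapackage.json", "{}")
        zf.writestr("pages/pages.jsonl", "")
        zf.writestr("archive/data.warc.gz", b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(list_warc_members(build_demo_wacz()))
```

The demo archive contains empty placeholder members, so it only demonstrates the directory layout, not real capture data.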

Setup

Install by cloning the repo and then running: python3 setup.py install

You can now run the tool like so: unwarcit metro_capture2.wacz data.warc --output myfolder

You can pass a single file or a space-separated list of files, in WARC or WACZ format, by placing them after the unwarcit command. For example: unwarcit warcfile1.warc warcfile2.warc waczfile.wacz
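Because WARC and WACZ inputs can be mixed in one call, a tool like this has to tell them apart before processing. A minimal sketch of one way to do so, by file extension, follows. The helper group_by_type and the rule of treating .warc.gz as WARC are assumptions for illustration, not a description of how unwarcit decides.

```python
from pathlib import Path


def group_by_type(paths):
    """Split input paths into WARC, WACZ, and unrecognized groups by extension.

    Hypothetical helper; the extension rules here are an assumption.
    """
    groups = {"warc": [], "wacz": [], "unknown": []}
    for path in paths:
        name = Path(path).name.lower()
        if name.endswith((".warc", ".warc.gz")):
            groups["warc"].append(path)
        elif name.endswith(".wacz"):
            groups["wacz"].append(path)
        else:
            groups["unknown"].append(path)
    return groups


if __name__ == "__main__":
    print(group_by_type(["warcfile1.warc", "warcfile2.warc", "waczfile.wacz"]))
```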

Configuration Options

Unwarcit currently accepts the following parameters:
      --help                                Show help                              [str]
      --version                             Show version number                    [int]
      --output                              The folder to output the results to    [str]
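For readers who want to see how an interface of this shape can be declared, here is a minimal argparse sketch: one or more input files followed by an optional --output folder. It is not unwarcit's actual parser. The function name build_parser, the None default for --output, and the "0.0.0" version string are placeholders chosen for the example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="unwarcit",
        description="Unzip WARC and WACZ files.",
    )
    # One or more WARC/WACZ paths, separated by spaces on the command line.
    parser.add_argument("inputs", nargs="+", metavar="FILE",
                        help="WARC or WACZ file(s) to unzip")
    # Placeholder default; the real tool's default folder is not stated here.
    parser.add_argument("--output", default=None,
                        help="The folder to output the results to")
    # Placeholder version string for the sketch.
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.0.0")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["metro_capture2.wacz", "data.warc", "--output", "myfolder"]
    )
    print(args.inputs, args.output)
```

argparse supplies --help automatically, so only --output and --version need explicit declarations in this sketch.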

About

License:Apache License 2.0

