cloudtracer / WARCPinch

Create WARC files for use in signature creation (SHA1, SHA2, SSDEEP). Based on WARCreate.
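
As a rough illustration of the signature step, the sketch below computes SHA-1 and SHA-256 digests of a byte buffer with Node's built-in `crypto` module. It is not taken from this repository: the function name `signaturesFor` and the file name in the comment are hypothetical, and SSDEEP is left out because Node's standard library has no fuzzy-hash implementation.

```javascript
const crypto = require("crypto");

// Compute hex digests of a buffer with Node's built-in crypto module.
// SSDEEP (a fuzzy hash) is not available in the standard library, so it is omitted.
function signaturesFor(buffer) {
  const digest = (algorithm) =>
    crypto.createHash(algorithm).update(buffer).digest("hex");
  return {
    sha1: digest("sha1"),
    sha256: digest("sha256"),
  };
}

// Hypothetical usage: in practice the buffer would come from a saved WARC file,
// for example fs.readFileSync("capture.warc").
const sample = Buffer.from("abc", "utf8");
console.log(signaturesFor(sample));
```

Hashing the whole file is the simplest choice; hashing individual record payloads would be an alternative if per-resource signatures are needed.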

WARCreate

"Create WARC files from any webpage"

WARCreate is a Google Chrome extension that aims to "Create WARC files from any webpage".

WARC generation is normally limited to the Internet Archive's Heritrix archival crawler, so providing another way to generate these files from webpages opens the door to:

  • Preserving content not accessible to crawlers (e.g., deep web contents)
  • Avoiding the complication and overhead of setting up a Heritrix instance for an end user
  • Allowing a webpage to be interacted with (e.g., Facebook comments unrolled) prior to preservation, so that content not initially present in the page can be captured

...among many other use cases.
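
For readers unfamiliar with the format, a WARC file is a sequence of records, each made of a version line, named header fields, a blank line, the content block, and two trailing CRLFs. The following is a minimal sketch of assembling one `resource` record in Node.js, assuming WARC/1.0 field names. It is not WARCreate's implementation; the function name `buildResourceRecord` and the example URI and payload are hypothetical.

```javascript
const crypto = require("crypto");

const CRLF = "\r\n";

// Build a single WARC/1.0 "resource" record as a Buffer.
// Content-Length counts the bytes of the content block only.
function buildResourceRecord(targetUri, contentType, payload) {
  const block = Buffer.isBuffer(payload) ? payload : Buffer.from(payload, "utf8");
  const headerLines = [
    "WARC/1.0",
    "WARC-Type: resource",
    `WARC-Record-ID: <urn:uuid:${crypto.randomUUID()}>`,
    // WARC/1.0 dates are UTC timestamps without fractional seconds.
    `WARC-Date: ${new Date().toISOString().replace(/\.\d{3}Z$/, "Z")}`,
    `WARC-Target-URI: ${targetUri}`,
    `Content-Type: ${contentType}`,
    `Content-Length: ${block.length}`,
  ];
  // A blank line separates the header from the block; two CRLFs end the record.
  const header = Buffer.from(headerLines.join(CRLF) + CRLF + CRLF, "utf8");
  const trailer = Buffer.from(CRLF + CRLF, "utf8");
  return Buffer.concat([header, block, trailer]);
}

// Hypothetical example values.
const record = buildResourceRecord("http://example.com/", "text/html", "<p>hello</p>");
console.log(record.toString("utf8"));
```

A full capture of a page would typically also include `warcinfo`, `request`, and `response` records; the `resource` type is used here only because it keeps the example short.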

WARCreate is in active development, though it has gone through various release and retraction periods due to changes in the Google Chrome extension API and the rules controlling extension distribution.

The original idea and prototype were published in the Proceedings of the Joint Conference on Digital Libraries 2012 (JCDL '12).

Install

The latest stable binary can be downloaded from the Chrome Web Store.

Sample Usage

(TODO)

Contact

WARCreate is a project of the Web Science and Digital Libraries (WS-DL) research group at Old Dominion University (ODU), created by Mat Kelly.

For support, e-mail warcreate@matkelly.com or tweet to us at @machawk1 and/or @WebSciDL.

License

GPLv3

About

License: GNU Lesser General Public License v3.0

Languages

JavaScript 87.2%, HTML 12.2%, CSS 0.6%