datatogether / cdxj

Golang package implementing the CDXJ file format used by OpenWayback 3.0.0+ to index web archive contents

CDXJ

Go package implementing the CDXJ file format, which OpenWayback 3.0.0 and later uses to index web archive contents (notably WARC and ARC files) and make them searchable via a resource resolution service. CDXJ builds on the CDX file format originally developed by the Internet Archive for the index behind the Wayback Machine. It simplifies the primary fields and adds a flexible JSON 'block' to each record, so that additional data can be included with a high degree of flexibility.
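To make the record layout concrete, here is a minimal sketch of splitting one CDXJ line into its primary fields and its JSON block, using only the Go standard library. It does not use or reflect this package's API. The `record` type, the `parseLine` helper, the sample line, and the assumed layout of four space-separated fields (searchable URI key, timestamp, record type, JSON block) are all assumptions made for illustration, so check them against the OpenWayback CDXJ specification before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// record is a hypothetical type for this sketch; it is NOT the type
// exported by github.com/datatogether/cdxj.
type record struct {
	Key        string                 // searchable (SURT-style) URI key
	Timestamp  string                 // kept as a string; no format is assumed
	RecordType string                 // e.g. "response"
	Block      map[string]interface{} // the flexible JSON block
}

// parseLine splits a line into three primary fields plus a JSON block.
// Assumption: fields are separated by single spaces, and everything
// after the third space belongs to the JSON block.
func parseLine(line string) (record, error) {
	parts := strings.SplitN(strings.TrimSpace(line), " ", 4)
	if len(parts) != 4 {
		return record{}, fmt.Errorf("expected 4 fields, got %d", len(parts))
	}
	var block map[string]interface{}
	if err := json.Unmarshal([]byte(parts[3]), &block); err != nil {
		return record{}, fmt.Errorf("invalid JSON block: %w", err)
	}
	return record{
		Key:        parts[0],
		Timestamp:  parts[1],
		RecordType: parts[2],
		Block:      block,
	}, nil
}

func main() {
	// Illustrative sample line; the values are made up for this sketch.
	line := `(com,example,)/ 2017-01-01T00:00:00Z response {"uri":"http://example.com/","ref":"warcfile:example.warc#1024"}`
	rec, err := parseLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(rec.Key, rec.Timestamp, rec.RecordType, rec.Block["uri"])
}
```

Decoding the block into a generic map rather than a fixed struct mirrors the point made above: the primary fields stay small and fixed, while any extra data lives in the JSON block.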

License & Copyright

Copyright (C) 2017 Data Together
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3.0.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the LICENSE file for details.

Getting Involved

We would love involvement from more people! If you notice any errors or would like to submit changes, please see our Contributing Guidelines.

We use GitHub issues for tracking bugs and feature requests, and Pull Requests (PRs) for submitting changes.

Installation

Use in any Go package with:

import "github.com/datatogether/cdxj"

Development

Coming Soon
