Awesome Data Preservation
A curated list of data preservation resources, inspired by awesome-deep-vision and awesome-computer-vision.
Maintainers - Arun kumar
Contributing
Please feel free to send pull requests to add resources.
Papers and Reports
- Digital Preservation Handbook, 2nd Edition.
- Ensuring the Longevity of Digital Documents. Author: Rothenberg, Jeff.
  Published in the January 1995 edition of Scientific American (Vol. 272, Number 1, pp. 42-47).
- Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Author: Rothenberg, Jeff.
  Published as a report in 1999, ISBN 1-887334-63-7.
- Long Term Preservation of Digital Information. Author: Raymond A. Lorie, IBM Almaden Research Center.
  Published in JCDL '01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 346-352.
- UVC: A Universal Virtual Computer for Long-term Preservation of Digital Information. Authors: Raymond A. Lorie; Raymond J. van Diessen.
  Published as IBM Research Report RJ10338 in 2005.
- Permanent Web Publishing - LOCKSS (Lots Of Copies Keep Stuff Safe). Authors: David S.H. Rosenthal and Vicky Reich.
  In Proceedings of the FREENIX Track: 2000 USENIX Annual Technical Conference, June 18-23, 2000, San Diego, California, pp. 129-140.
- Emulation for Digital Preservation in Practice: The Results. Authors: Jeffrey van der Hoeven and Bram Lohman.
  Published in the International Journal of Digital Curation, December 2007.
- Reference Model for an Open Archival Information System (OAIS).
  The Consultative Committee for Space Data Systems (CCSDS), Recommended Practice, Magenta Book, June 2012.
Conferences
- International Conference on Theory and Practice of Digital Libraries (TPDL)
- International Council for Science: Committee on Data for Science and Technology (CODATA)
Active Communities
Books
Software Tools
Archivematica
Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content.
POWRR
The Preserving digital Objects With Restricted Resources (Digital POWRR) project aims to make digital preservation more accessible to a wider range of professionals.
Format Identification for Digital Objects (fido)
fido is a command-line tool to identify the file formats of digital objects. It is designed for simple integration into automated workflows.
JHOVE
JHOVE (pronounced “jove”), the JSTOR/Harvard Object Validation Environment, is an extensible software framework for performing format identification, validation, and characterisation of digital objects. The OPF has assumed active responsibility for the project and software, creating a permanent, sustainable home for JHOVE.
Jpylyzer
Jpylyzer is a JP2 (JPEG 2000 Part 1) image validator and properties extractor. Its development was partially supported by the SCAPE Project, which is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
xcorrSound
xcorrSound consists of four tools:
- overlap-analysis detects overlap in two audio files
- waveform-compare compares two audio files and outputs the similarity
- sound-match detects occurrences of a smaller audio file (e.g. a jingle) within a larger audio file or an index of audio files
- sound-index builds an index
DROID
DROID is a software tool developed by The National Archives to perform automated batch identification of file formats. Developed by its Digital Preservation Department as part of its broader digital preservation activities, DROID is designed to meet the fundamental requirement of any digital repository to be able to identify the precise format of all stored digital objects, and to link that identification to a central registry of technical information about that format and its dependencies.
CSV Validator
A validation tool and APIs for validating CSV (Comma Separated Value) files against a CSV Schema.
UTF-8 Validator
A UTF-8 validation tool which may be used as either a command-line tool or as a library embedded in your own program.
format-corpus
An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.
Bitwiser
A small suite of tools for bitwise analysis of data and processes related to digital preservation.
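Several of the tools above (fido, DROID, JHOVE) identify file formats. As a rough illustration of the general idea only, the sketch below matches the first bytes of a file against a small table of well-known signatures ("magic numbers"). It is not how any of the listed tools is implemented: the table, the format labels, and the function names `identify_bytes` and `identify_file` are assumptions made for this example, and real tools draw on much larger signature registries and more elaborate matching rules.

```python
from typing import Optional

# Hypothetical, hand-picked signature table for illustration only.
# Each entry pairs the leading bytes of a file with a human-readable label.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JP2 (JPEG 2000 Part 1)"),
]

# Reading this many bytes is enough to test every signature in the table.
MAX_HEADER = max(len(magic) for magic, _ in SIGNATURES)


def identify_bytes(data: bytes) -> Optional[str]:
    """Return the label of the first signature that prefixes data, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def identify_file(path: str) -> Optional[str]:
    """Identify a file on disk by reading only its leading bytes."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(MAX_HEADER))
```

A prefix match like this only suggests a format; it does not validate the file. Checking that the content is well-formed is the job of validators such as JHOVE or Jpylyzer.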