dougnukem / vgp-tools

The One-Code Data Framework

One-Code is a data representation framework with a growing collection of associated software, initially designed in the context of the Vertebrate Genomes Project (VGP) to operate on all the forms of data involved in a DNA sequencing and assembly project. Data is represented in a very simple ASCII file format that is easy for both humans and programs to read and interpret. Each ASCII file also has a corresponding compressed and indexed binary version, so production tools built with the One-Code library are efficient in both running time and the size of the data files they manipulate. All fields are strongly typed, and the specific collection of data types constituting a schema is specified in a schema file, which is itself a One-Code data file. A generic converter moves data between the ASCII and binary representations, and another core tool validates files against a given schema.
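To make the idea of strongly typed fields checked against a schema more concrete, here is a minimal sketch in C. It is not the One-Code library API and does not reproduce the actual One-Code file syntax. The record layout (a one-character line type followed by whitespace-separated fields), the FieldType and LineSpec types, and the function line_matches_spec are all hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical field types; the real One-Code type set may differ. */
typedef enum { F_INT, F_REAL, F_STRING } FieldType;

/* Hypothetical schema entry: a one-character line type and its typed fields. */
typedef struct {
    char      line_type;
    int       n_fields;
    FieldType fields[8];
} LineSpec;

/* Return 1 if the whole token parses as the given type, else 0. */
static int token_matches(const char *tok, FieldType type)
{
    char *end;

    if (*tok == '\0') return 0;
    switch (type) {
    case F_INT:    (void)strtol(tok, &end, 10); return *end == '\0';
    case F_REAL:   (void)strtod(tok, &end);     return *end == '\0';
    case F_STRING: return 1;   /* assumption: strings are bare tokens */
    }
    return 0;
}

/* Return 1 if the line has the expected type character and exactly the
   fields listed in spec, each parsing as its declared type; else 0. */
int line_matches_spec(const char *line, const LineSpec *spec)
{
    char  buf[256];
    char *tok;
    int   i;

    assert(line != NULL && spec != NULL);
    if (strlen(line) >= sizeof buf) return 0;   /* sketch: reject long lines */
    strcpy(buf, line);

    /* First token must be the single-character line type. */
    tok = strtok(buf, " \t\n");
    if (tok == NULL || strlen(tok) != 1 || tok[0] != spec->line_type) return 0;

    /* Each following token must match the declared field type. */
    for (i = 0; i < spec->n_fields; i++) {
        tok = strtok(NULL, " \t\n");
        if (tok == NULL || !token_matches(tok, spec->fields[i])) return 0;
    }
    return strtok(NULL, " \t\n") == NULL;       /* no extra fields allowed */
}
```

With a hypothetical spec such as { 'S', 2, { F_INT, F_STRING } }, the line "S 4 acgt" is accepted, while "S four acgt" is rejected because its first field is not an integer. The validator shipped in the Core package should be treated as the authority on what a valid file is.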

There are currently two packages. Core contains the general One-Code command-line utilities described above and a C library supporting the development of One-Code applications. VGP contains the VGP-formats schema, designed specifically for DNA sequencing and assembly applications, and a growing number of VGP-tools for importing and operating on data in this schema.

To build all libraries and command-line tools, type make in this top-level directory. Neither package depends on any other software.

Further documentation is available as follows:

CORE

VGP

  • The VGP Schema: the VGP formats schema for sequence data and genome assembly
  • The VGP Tools: command-line tools for importing and operating on data in VGP-formats. Many of these are useful for general file conversions and other basic operations on arbitrary DNA sequence data sets.
  • Assembly Workflow Example: a hypothetical workflow for genome assembly that illustrates potential uses of VGP formats and tools. Note that not all of it is implemented; it is for illustrative purposes only.

In addition, specific technical documentation can be found within individual source files.

Authors: Gene Myers, Richard Durbin, and the Vertebrate Genomes Project Assembly Group

Date: February 2019
