tximeta: Import transcript abundances with automagic generation of metadata

Michael Love, Rob Patro

Idea

tximeta performs numerous annotation and metadata gathering tasks on behalf of users during the import of transcript quantifications from Salmon or Sailfish into R/Bioconductor. The goal is to provide something similar to the experience of GEOquery, which downloads microarray expression data from NCBI GEO and simultaneously brings along the associated metadata. Doing this automatically helps to prevent costly bioinformatic errors. To use tximeta, all one needs is the quant directory output from Salmon (version >= 0.8.1) or Sailfish.

The key idea within tximeta is to store a signature of the transcriptome sequence itself, computed with a hash function and stored by the index and quant functions of Salmon and Sailfish. This signature acts as the identifying information for later building out rich annotations and metadata in the background, on behalf of the user. This should greatly facilitate genomic workflows: because the data is embedded within an organism and genome context, including the proper genome version, the user can immediately begin overlapping their transcriptomic data with other genomic datasets, e.g. epigenetic tracks such as ChIP or methylation. We seek to reduce the time bioinformatic analysts waste, prevent costly bioinformatic mistakes, and promote computational reproducibility by avoiding annotation and metadata ambiguity, which arises when files are shared publicly or among collaborators but critical details go missing.
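To make the signature idea concrete, here is a minimal Python sketch. It is an illustration only, not the way Salmon, Sailfish, or tximeta compute or store their signature: the choice of SHA-256, the helper names `read_fasta_sequences` and `transcriptome_signature`, and the `KNOWN_TRANSCRIPTOMES` lookup table are all assumptions made for this example. It hashes only the sequences, so renaming transcripts leaves the signature unchanged while editing a single base changes it.

```python
import hashlib


def read_fasta_sequences(lines):
    """Yield each record's sequence (uppercased) from FASTA-formatted lines."""
    chunks = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                yield "".join(chunks).upper()
            chunks = []
            seen_header = True
        else:
            chunks.append(line)
    if seen_header:
        yield "".join(chunks).upper()


def transcriptome_signature(lines):
    """Hex digest over the sequences in file order; header names are ignored.

    SHA-256 is an assumption for this sketch, not necessarily what the
    quantification tools use.
    """
    digest = hashlib.sha256()
    for seq in read_fasta_sequences(lines):
        # A newline after each sequence keeps record boundaries in the hash.
        digest.update(seq.encode("ascii") + b"\n")
    return digest.hexdigest()


# Toy transcriptomes (made-up sequences, not real annotation).
original = [">tx1 some description", "ACGT", "acgt", ">tx2", "GGCC"]
renamed = [">other_name_1", "ACGTACGT", ">other_name_2", "GGCC"]
edited = [">tx1", "ACGTACGA", ">tx2", "GGCC"]

signature = transcriptome_signature(original)

# Hypothetical lookup table mapping a signature to its metadata.
KNOWN_TRANSCRIPTOMES = {
    signature: {"source": "toy-example", "release": "1"},
}

# Same sequences under different names resolve to the same metadata;
# an edited sequence is not recognised.
metadata_for_renamed = KNOWN_TRANSCRIPTOMES.get(transcriptome_signature(renamed))
metadata_for_edited = KNOWN_TRANSCRIPTOMES.get(transcriptome_signature(edited))
```

In this sketch `metadata_for_renamed` is the toy metadata record and `metadata_for_edited` is `None`, which mirrors the intended behaviour: annotation is attached only when the sequence content matches a known transcriptome.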

This package is in beta

Expect that this package will change a lot in the coming months. This is a prototype for how automatic generation of transcriptome metadata from a transcriptome sequence signature might work. Note that, as it is just a prototype, it only works for a single transcriptome (Gencode human v26), although the long-term goal is to automate signature generation for as many transcriptomes as possible, including different versions, sources, organisms, etc.

In addition, we are very interested in solving problem cases for this approach, such as derived transcriptomes (e.g. filtered, or edited after downloading from source) and de novo transcriptomes, such as those generated by StringTie, Trinity, Scripture, Oases, etc. We hope that for both of these cases tximeta might assist in the computational reproducibility of quantification, by encapsulating the steps needed to generate the transcriptome and providing a signature for checking equality.
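One way to picture "encapsulating the steps and checking equality" for a derived transcriptome is sketched below in Python. Nothing here is part of tximeta: the `signature` and `drop_short` helpers, the `record` layout, and the use of SHA-256 are hypothetical choices for illustration. The record stores the source signature, the filtering step with its parameters, and the signature of the result, so a collaborator can rerun the step and confirm they obtained the same transcriptome.

```python
import hashlib


def signature(transcripts):
    """Hex digest over the sequences, in insertion order (names are ignored)."""
    digest = hashlib.sha256()
    for seq in transcripts.values():
        digest.update(seq.upper().encode("ascii") + b"\n")
    return digest.hexdigest()


def drop_short(transcripts, min_length):
    """Hypothetical derivation step: keep transcripts of at least min_length bases."""
    return {name: seq for name, seq in transcripts.items() if len(seq) >= min_length}


# Toy source transcriptome (made-up sequences).
source = {"tx1": "ACGTACGTAC", "tx2": "GGC", "tx3": "TTTTAAAACC"}
derived = drop_short(source, min_length=5)

# Record of how the derived transcriptome was produced.
record = {
    "source_signature": signature(source),
    "steps": [("drop_short", {"min_length": 5})],
    "derived_signature": signature(derived),
}

# A collaborator replays the recorded step and checks the result.
step_name, step_args = record["steps"][0]
rebuilt = drop_short(source, **step_args)
matches = signature(rebuilt) == record["derived_signature"]
```

If the collaborator started from a different source, or used a different threshold, the final comparison would fail, flagging the mismatch before any quantification results are compared.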
