zavolanlab / htsinfer

Infer metadata for your downstream analysis straight from your RNA-seq data

HTSinfer

HTSinfer infers metadata from Illumina high-throughput sequencing (HTS) data.

Examples

Single-ended library

htsinfer tests/files/adapter_single.fastq

Paired-ended library

htsinfer tests/files/adapter_1.fastq tests/files/adapter_2.fastq

Output is written to STDOUT in JSON format. The log is written to STDERR.
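When HTSinfer is called from a script, the two streams can be captured separately. The following is a minimal Python sketch and not part of HTSinfer itself: the helper name run_and_parse is hypothetical, and it assumes only what is stated above, namely that the command prints a single JSON document to STDOUT and its log to STDERR. It also assumes that a successful run exits with status 0.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple


def run_and_parse(command: List[str]) -> Tuple[Dict[str, Any], str]:
    """Run a command, parse its STDOUT as JSON, and return (results, log).

    Hypothetical helper: assumes the command writes exactly one JSON
    document to STDOUT and free-form log text to STDERR.
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # capture STDOUT and STDERR separately
        text=True,            # decode both streams to str
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return json.loads(completed.stdout), completed.stderr
```

For the paired-ended example, this could be called as run_and_parse(["htsinfer", "tests/files/adapter_1.fastq", "tests/files/adapter_2.fastq"]), provided that htsinfer is available on the PATH.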

Example output

This is the output (STDOUT) of the above-mentioned call on a paired-ended example library:

{
   "library_stats": {
      "file_1": {
         "read_length": {
            "min": 75,
            "max": 75,
            "mean": 75.0,
            "median": 75,
            "mode": 75
         }
      },
      "file_2": {
         "read_length": {
            "min": 75,
            "max": 75,
            "mean": 75.0,
            "median": 75,
            "mode": 75
         }
      }
   },
   "library_source": {
      "file_1": {
         "short_name": "hsapiens",
         "taxon_id": "9606"
      },
      "file_2": {
         "short_name": "hsapiens",
         "taxon_id": "9606"
      }
   },
   "library_type": {
      "file_1": "first_mate",
      "file_2": "second_mate",
      "relationship": "split_mates"
   },
   "read_orientation": {
      "file_1": "SF",
      "file_2": "SR",
      "relationship": "ISF"
   },
   "read_layout": {
      "file_1": {
         "adapt_3": "AATGATACGGCGACC",
         "polyA_frac": 10.0
      },
      "file_2": {
         "adapt_3": "AATGATACGGCGACC",
         "polyA_frac": 10.0
      }
   }
}

To better understand the output, please refer to the Results model in the API documentation. Note that the Results model has several nested child models, such as enumerators of possible outcomes. Follow the references in each parent model for detailed descriptions of each child model's attributes.
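As an illustration of how the nested structure can be consumed, the sketch below collects the per-file values for one input file. It relies only on keys visible in the example output (a trimmed copy is embedded as EXAMPLE_OUTPUT); the function name summarize and the keys of the returned dictionary are hypothetical. A real run may contain values not shown in the example, which this sketch does not handle; the Results model remains the authoritative description.

```python
import json
from typing import Any, Dict

# Trimmed copy of the example output for a paired-ended library.
EXAMPLE_OUTPUT = """
{
   "library_source": {
      "file_1": {"short_name": "hsapiens", "taxon_id": "9606"},
      "file_2": {"short_name": "hsapiens", "taxon_id": "9606"}
   },
   "library_type": {
      "file_1": "first_mate",
      "file_2": "second_mate",
      "relationship": "split_mates"
   },
   "read_orientation": {
      "file_1": "SF",
      "file_2": "SR",
      "relationship": "ISF"
   },
   "read_layout": {
      "file_1": {"adapt_3": "AATGATACGGCGACC", "polyA_frac": 10.0},
      "file_2": {"adapt_3": "AATGATACGGCGACC", "polyA_frac": 10.0}
   }
}
"""


def summarize(results: Dict[str, Any], file_key: str) -> Dict[str, Any]:
    """Collect the per-file values for one input ("file_1" or "file_2")."""
    return {
        "organism": results["library_source"][file_key]["short_name"],
        "taxon_id": results["library_source"][file_key]["taxon_id"],
        "library_type": results["library_type"][file_key],
        "orientation": results["read_orientation"][file_key],
        "adapter_3p": results["read_layout"][file_key]["adapt_3"],
    }


results = json.loads(EXAMPLE_OUTPUT)
for key in ("file_1", "file_2"):
    print(key, summarize(results, key))
```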

General usage

htsinfer [--output-directory PATH]
         [--temporary-directory PATH]
         [--cleanup-regime {DEFAULT,KEEP_ALL,KEEP_NONE,KEEP_RESULTS}]
         [--records INT]
         [--threads INT]
         [--transcripts FASTA]
         [--read-layout-adapters PATH]
         [--read-layout-min-match-percentage FLOAT]
         [--read-layout-min-frequency-ratio FLOAT]
         [--library-source-min-match-percentage FLOAT]
         [--library-source-min-frequency-ratio FLOAT]
         [--library-type-max-distance INT]
         [--library-type-mates-cutoff FLOAT]
         [--read-orientation-min-mapped-reads INT]
         [--read-orientation-min-fraction FLOAT]
         [--tax-id INT]
         [--verbosity {DEBUG,INFO,WARN,ERROR,CRITICAL}]
         [-h] [--version]
         PATH [PATH]
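For scripted use, the synopsis above can be turned into an argument list. The sketch below is an illustration only: build_command is a hypothetical helper, it covers just three of the listed options (--output-directory, --records, --threads), and it checks only what the synopsis shows, namely that one or two input paths are given.

```python
from typing import List, Optional


def build_command(
    paths: List[str],
    records: Optional[int] = None,
    threads: Optional[int] = None,
    output_directory: Optional[str] = None,
) -> List[str]:
    """Assemble an htsinfer argument list following the synopsis.

    Hypothetical helper; options left as None are omitted so that
    the tool's own defaults apply.
    """
    if len(paths) not in (1, 2):
        raise ValueError("htsinfer takes one or two input paths")
    command = ["htsinfer"]
    if output_directory is not None:
        command += ["--output-directory", output_directory]
    if records is not None:
        command += ["--records", str(records)]
    if threads is not None:
        command += ["--threads", str(threads)]
    # Positional input paths go last, as in the synopsis.
    return command + list(paths)


print(build_command(["tests/files/adapter_single.fastq"], threads=4))
```

Returning a list rather than a single string means the result can be passed directly to subprocess.run without shell quoting.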

Installation

To use HTSinfer, clone the repository and install the dependencies via Conda:

git clone https://github.com/zavolanlab/htsinfer
cd htsinfer
conda env create --file environment.yml
# Alternatively, to install with development dependencies,
# run the following instead
conda env create --file environment-dev.yml

Note that creating the environment can take a long time. It is strongly recommended that you install Mamba and replace conda with mamba in the conda env create command above.

Then, activate the htsinfer Conda environment with:

conda activate htsinfer

If you have installed the development/testing dependencies, you may first want to verify that HTSinfer was installed correctly by executing the tests shipped with the package:

python -m pytest

Otherwise, go ahead and try one of the examples.

API documentation

Auto-built API documentation is hosted on ReadTheDocs.

Contributing

This project lives off your contributions, be it in the form of bug reports, feature requests, discussions, or fixes and other code changes. Please refer to the contributing guidelines if you are interested in contributing. Please mind the code of conduct for all interactions with the community.

Contact

For questions or suggestions regarding the code, please use the issue tracker. For any other inquiries, please contact us by email: zavolab-biozentrum@unibas.ch

(c) 2020 Zavolan lab, Biozentrum, University of Basel

License: Apache License 2.0

Languages: Python 99.3%, Dockerfile 0.7%