comprna / radian

Nanopore direct RNA basecaller

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

RADIAN

RNA lAnguage informeD decodIng of nAnopore sigNals

Overview

Nanopore direct RNA basecaller that utilises a model of mRNA language.

Since RNA is always sequenced from the 3' to 5' direction, nanopore signals implicitly encode the nucleotide biases in mRNA. This basecaller uses a probabilistic model of human mRNA language to guide basecalling when the signal prediction is ambiguous. The mRNA model is incorporated in a modified CTC beam search decoding algorithm.

Preprint: https://www.biorxiv.org/content/10.1101/2022.10.19.512968v1

RADIAN architecture

Installation

cd <path/to/radian>
pip install --upgrade pip
pip install -r requirements.txt
tar -xvzf radian/models/rnamodel_12mer_pc.tar.gz

Command structure

usage: basecall.py [-h] fast5_dir fasta_dir [--local] [--chunk-len] [--step-size]
                   [--batch-size] [--outlier-clip] [--rna-model]
                   [--sig-model] [--sig-config] [--beam-width]
                   [--decode-type] [--sig-threshold]
                   [--rna-threshold] [--context-len]

positional arguments:
  fast5_dir             Directory of single/multi fast5 files.
  fasta_dir             Directory to output fasta files.

optional arguments:
  -h, --help
  --local
  --chunk-len
  --step-size
  --batch-size
  --outlier-clip
  --rna-model
  --sig-model
  --sig-config
  --beam-width
  --decode-type {global,chunk}
  --sig-threshold
  --rna-threshold
  --context-len

Example usage

We provide a fast5 file containing 5 reads for testing in data/reads.fast5.

To basecall the single or multi-fast5 file(s) in and output fasta to <out_dir>:

cd radian
mkdir out_dir
python3 basecall.py data out_dir

About

Nanopore direct RNA basecaller

License:GNU General Public License v3.0


Languages

Language:Python 100.0%