Dylan-DPC / fibertools-rs

Tools for fiberseq data written in rust.

Home Page:https://fiberseq.github.io/fibertools-rs/



fibertools-rs

fibertools-rs is a CLI tool for creating and interacting with fiberseq bam files.

Install

fibertools-rs is available through bioconda and can be installed with the following command:

mamba install -c conda-forge -c bioconda fibertools-rs

However, the bioconda version currently does not support GPU acceleration. If you would like to use GPU acceleration, you will need to install using the directions in the INSTALL.md file.

Usage

ft --help

Help page for fibertools

Subcommands for fibertools-rs

ft predict-m6a

Predict m6A positions using HiFi kinetics data and encode the results in the MM and ML bam tags. Help page for predict-m6a.

ft add-nucleosomes

Add nucleosomes to a bam file that already contains m6A predictions. Note that this process also runs in the background during predict-m6a, so it is unnecessary to run it independently unless you want to try new parameters for nucleosome calling. Help page for add-nucleosomes.

ft extract

Extracts fiberseq data from a bam file into plain text. Help page for extract.

ft extract --all

The --all option is a special option that tries to extract all the fiberseq data into a tabular format. Column names will be preserved across software versions (unless otherwise noted); however, the column order may change and new columns may be added. Therefore, when loading the data (e.g. with pandas), be sure to refer to columns by name rather than by index. (Image: example output of ft extract --all.)
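As a minimal sketch of name-based access, the snippet below reads a small stand-in table and selects columns by name, failing loudly if an expected column is missing. The column names (fiber, fiber_length, total_m6a_bp), the tab delimiter, and the helper read_named_columns are assumptions made for illustration only; check the header of your own ft extract --all output for the real names.

```python
import csv
import io

# Stand-in for a file written by `ft extract --all`. The column names and the
# tab delimiter are illustrative assumptions, not a guaranteed schema.
EXAMPLE_TSV = (
    "fiber\tfiber_length\ttotal_m6a_bp\n"
    "read_1\t18000\t950\n"
    "read_2\t21000\t1200\n"
)


def read_named_columns(handle, wanted):
    """Return one dict per row containing only the columns named in `wanted`.

    Columns are looked up by header name, so the result does not depend on
    column order or on extra columns being added later.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]


rows = read_named_columns(io.StringIO(EXAMPLE_TSV), ["fiber", "total_m6a_bp"])
total_m6a = sum(int(row["total_m6a_bp"]) for row in rows)
print(total_m6a)
```

With pandas the same idea would be pd.read_csv(path, sep="\t", usecols=[...]) followed by df["column_name"] lookups, again under the assumption that the file is tab-separated.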

ft center

Center fiberseq reads (bam) around reference position(s). Help page for center.

Cite

Jha, A., Bohaczuk, S. C., Mao, Y., Ranchalis, J., Mallory, B. J., Min, A. T., Hamm, M. O., Swanson, E., Finkbeiner, C., Li, T., Whittington, D., Stergachis, A. B., & Vollger, M. R. (2023). Fibertools: fast and accurate DNA-m6A calling using single-molecule long-read sequencing. bioRxiv. https://doi.org/10.1101/2023.04.20.537673

Read the fibertools library docs

You can find the docs for the latest release here: https://docs.rs/fibertools-rs/latest/fibertools_rs/ or download the source and run:

cargo doc --open

and the docs will open in your browser.

TODO items

  • Use new iterator for ft extract and group writes to try and improve the speed
  • Set filters for ML depending on the model used
  • long format extract command
  • Add rustybam stats to ft all as an option
  • add option result to bamlift
  • Add more test cases, learn about test modules in folders
  • Test GPU support, see if I can simplify or statically link PyTorch.
  • Improve progress bar for predict-m6a.
    • Get size of bam, say how far we are through the bam in terms of MB/GB?
  • Add unaligned, secondary, and supplementary reads to the test bam.
  • Detect GPU memory to set batch size dynamically.
