colindaven / rustyread

A long read simulator based on badread idea

Rustyread

Rustyread is a drop-in replacement for badread simulate. It is heavily inspired by badread and reuses the same error and quality model files, but Rustyread is multi-threaded and benefits from other optimizations.

WARNING:

  • Rustyread has not yet been evaluated or even compared to any other long read generators
  • Rustyread is tested only on Linux
  • Rustyread is still in development; many things can change or break

Usage

If previously you called badread like this:

badread simulate --reference {reference path} --quantity {quantity} > {reads}.fastq

you can now replace badread with rustyread:

rustyread simulate --reference {reference path} --quantity {quantity} > {reads}.fastq

By default, rustyread uses all available cores. You can control this with the threads option:

rustyread --threads {number of threads} simulate --reference {reference path} --quantity {quantity} > {reads}.fastq

If badread is installed in your Python sys.path, rustyread can find the error and quality models automatically, but you can still use the --error_model and --qscore_model options.

Control memory usage

Rustyread's memory usage can be estimated with the formula 2 * reference bases + 2 * targeted bases + epsilon. To limit memory usage, use the number_base_store parameter, which takes either an absolute value or a relative depth. When this option is set, memory usage becomes 2 * reference bases + 2 * number_base_store + epsilon.
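As a rough worked example of the formula above, the sketch below ignores epsilon and treats the result as a count of stored bases. Two things here are assumptions, not statements from rustyread itself: that a relative depth of Nx means N times the reference length, and that one stored base costs about one byte. The function name estimate_memory and the 5 Mbase reference are hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate from the formula above, ignoring epsilon.
# Assumption: the result is in stored bases (roughly one byte each).

# estimate_memory REFERENCE_BASES STORED_BASES
# STORED_BASES is the targeted base count, or the number_base_store
# value when that option is set.
estimate_memory() {
    local reference_bases="$1"
    local stored_bases="$2"
    echo $(( 2 * reference_bases + 2 * stored_bases ))
}

reference_bases=5000000                      # hypothetical 5 Mbase reference
targeted_bases=$(( 50 * reference_bases ))   # as if --quantity 50x were used
capped_bases=$(( 10 * reference_bases ))     # as if --number_base_store 10x were used

echo "without number_base_store: $(estimate_memory "$reference_bases" "$targeted_bases")"
echo "with number_base_store:    $(estimate_memory "$reference_bases" "$capped_bases")"
```

With these hypothetical numbers the estimate drops from 510000000 to 110000000, since only the second term of the formula changes.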

Full usage

rustyread 0.4 Pidgeotto
Pierre Marijon <pierre.marijon@hhu.de>
A long read simulator based on badread idea and model

USAGE:
    rustyread [FLAGS] [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help         Prints help information
    -v, --verbosity    verbosity level also control by environment variable RUSTYREAD_LOG if flag is
                       set RUSTYREAD_LOG value is ignored
    -V, --version      Prints version information

OPTIONS:
    -t, --threads <threads>    Number of thread use by rustyread, 0 use all avaible core, default
                               value 0

SUBCOMMANDS:
    help        Prints this message or the help of the given subcommand(s)
    simulate    Generate fake long read

rustyread-simulate
Generate fake long read

USAGE:
    rustyread simulate [FLAGS] [OPTIONS] --reference <reference-path> --quantity <quantity>

FLAGS:
    -h, --help                  Prints help information
        --small_plasmid_bias    If set, then small circular plasmids are lost when the fragment
                                length is too high (default: small plasmids are included regardless
                                of fragment length)
    -V, --version               Prints version information

OPTIONS:
        --chimera <chimera>
            Percentage at which separate fragments join together [default: 1]

        --end_adapter <end-adapter>
            Adapter parameters for read ends (rate and amount) [default: 50,20]

        --end_adapter_seq <end-adapter-seq>
            Adapter parameters for read ends [default: GCAATACGTAACTGAACGAAGT]

        --error_model <error-model>
            Path to an error model file [default: nanopore2020]

        --glitches <glitches>
            Read glitch parameters (rate, size and skip) [default: 10000,25,25]

        --identity <identity>
            Sequencing identity distribution (mean, max and stdev) [default: 85,95,5]

        --junk_reads <junk>
            This percentage of reads wil be low complexity junk [default: 1]

        --length <length>
            Fragment length distribution (mean and stdev) [default: 15000,13000]

        --number_base_store <nb-base-store>
            Number of base, rustyread can store in ram before write in output in absolute value
            (e.g. 250M) or a relative depth (e.g. 25x)

        --output <output-path>                     Where read is write
        --qscore_model <qscore-model>
            Path to an quality score model file [default: nanopore2020]

        --quantity <quantity>
            Either an absolute value (e.g. 250M) or a relative depth (e.g. 25x)

        --random_reads <random>
            This percentage of reads wil be random sequence [default: 1]

        --reference <reference-path>               Reference fasta (can be gzipped, bzip2ped, xzped)
        --seed <seed>
            Random number generator seed for deterministic output (default: different output each
            time)

        --start_adapter <start-adapter>
            Adapter parameters for read starts (rate and amount) [default: 90,60]

        --start_adapter_seq <start-adapter-seq>
            Adapter parameters for read starts [default: AATGTACTTCGTTCAGTTACGTATTGCT]
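The --quantity and --number_base_store options both accept either an absolute value (e.g. 250M) or a relative depth (e.g. 25x). The sketch below shows one way such values could be turned into a base count. It is not rustyread's parser: the function name to_bases is hypothetical, it assumes that a depth of Nx means N times the reference length, and it only handles whole numbers with the two suffixes shown in the help text.

```shell
#!/usr/bin/env bash
# to_bases VALUE REFERENCE_BASES
# Convert a value such as 250M or 25x into a number of bases.
# Sketch only: whole numbers, and only the M and x suffixes.
to_bases() {
    local value="$1"
    local reference_bases="$2"
    local number="${value%?}"   # value without its last character
    case "$value" in
        *x) echo $(( number * reference_bases )) ;;   # relative depth
        *M) echo $(( number * 1000000 )) ;;           # absolute, in megabases
        *)  echo "$value" ;;                          # plain base count
    esac
}

reference_bases=4600000   # hypothetical reference length
to_bases 250M "$reference_bases"   # prints 250000000
to_bases 25x "$reference_bases"    # prints 115000000
```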

Installation

Bioconda

If you have not set up Bioconda yet, follow the Bioconda setup instructions first.

conda install rustyread

or, if you use mamba:

mamba install rustyread

With rust environment

If you do not have a Rust environment, you can install one with rustup or your package manager.

With cargo

cargo install --git https://github.com/natir/rustyread.git --tag 0.4

From source

git clone https://github.com/natir/rustyread.git
cd rustyread
git checkout 0.4
cargo install --path .

Minimum supported Rust version

Currently the minimum supported Rust version is 1.56.0.

Differences from badread

  • The small_plasmid_bias option is silently ignored, but small plasmids are still 'sequenced'

License: MIT License