GregorySchwartz / collapse-duplication

Process the output of heatitup in order to collapse sequences into clones by similar ITD mutations.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

collapse-duplication

Description

collapse-duplication collapses output from heatitup and heatitup-complete into clones by looking at the positions of their duplications and spacers.

There are three related tools for this program:

  • heatitup to categorize longest repeated substrings along with characterizing the “spacer” in-between substrings.
  • heatitup-complete to apply heatitup to BAM files along with additional options for preprocessing.
  • collapse-duplication to collapse annotated reads found by heatitup into clones with associated frequencies.

Installation

Install stack

See https://docs.haskellstack.org/en/stable/README/ for more details.

curl -sSL https://get.haskellstack.org/ | sh
stack setup

Install collapse-duplication

Online

stack install collapse-duplication

Source

stack install

Usage

Here, heatitup_output.csv must have a label field with the format SUBJECT_SAMPLE.

cat heatitup_output.csv | collapse-duplication --wiggle 5 --filterCloneFrequency 0.01 --collapseClone

Documentation

collapse-duplication, Gregory W. Schwartz. Collapse the duplication output into
clones and return their frequencies or clone IDs. Make sure format of the label
field is SUBJECT_SAMPLE

Usage: collapse-duplication [--output STRING] [--collapseClone]
                            [--wiggle DOUBLE] --filterCloneFrequency DOUBLE
                            [--filterReadFrequency DOUBLE] [--absolute]
                            [--filterType STRING] [--method STRING]

Available options:
  -h,--help                Show this help text
  --output STRING          (FILE) The output file.
  --collapseClone          Collapse the clone into a representative sequence
                           instead of appending clone IDs to the reads.
  --wiggle DOUBLE          ([0] | DOUBLE) Highly recommended to play around
                           with! The amount of wiggle room for defining clones.
                           Instead of grouping exactly by same duplication and
                           spacer location and length, allow for a position
                           distance of this much (so no two reads have a
                           difference of more than this number).
  --filterCloneFrequency DOUBLE
                           ([0.01] | DOUBLE) Filter reads (or clones) from
                           clones with too low a frequency. Default is 0.01
                           (1%).
  --filterReadFrequency DOUBLE
                           ([Nothing] | DOUBLE) Filter duplications with too
                           high a frequency (probably false positive if very
                           high, for instance if over half of reads or 0.5).
                           Converts these duplications to "Normal" sequences.
                           Frequencies and counts are taken place before
                           collapsing and filtering.
  --absolute               Whether to filter reads (or clones) from clones with
                           too low an absolute number for filterReadFrequency
                           instead frequency.
  --filterType STRING      ([Substring] | Position) Whether to filter reads with
                           filterReadFrequency using the dSubstring field or the
                           dLocations field.
  --method STRING          ([CompareAll] | Hierarchical) The method used to
                           group together wiggle room reads. Compare all
                           compares the current element with all elements in the
                           previous sublist. Hierarchical is for clustering, but
                           is most likely worse at this point in time.

About

Process the output of heatitup in order to collapse sequences into clones by similar ITD mutations.

License:GNU General Public License v3.0


Languages

Language:Haskell 100.0%