dbichko / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets

UMI-tools was published in Genome Research on 18 Jan '17 (early access)

Tools for dealing with Unique Molecular Identifiers

This repository contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs). Currently there are three tools:

  • extract:
    Flexible removal of UMI sequences from fastq reads.
    UMIs are removed and appended to the read name. Any other barcode, for example a library barcode, is left on the read.
  • dedup:
    Removes PCR duplicates.
    Implements a number of different UMI deduplication schemes. The recommended method is directional.
  • group:
    Groups PCR duplicates using the same methods available through `dedup`.
    This is useful when you want to interrogate the PCR duplicates themselves rather than discard them.
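The behaviour described for `extract` can be illustrated with a small Python sketch. This is not the umi_tools implementation: the function name `extract_umi`, the pattern convention (`N` marks a UMI base, `X` marks a base that stays on the read, such as a library barcode) and the underscore separator in the read name are assumptions made for this example.

```python
def extract_umi(name, seq, qual, pattern="NNNXXXXNN"):
    """Move UMI bases from the start of a read into the read name.

    Hypothetical helper for illustration only. `pattern` covers the first
    len(pattern) bases of the read: 'N' positions are treated as UMI bases
    and removed, any other character marks a base left on the read.
    """
    if len(seq) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    # Collect the UMI bases, and keep everything else in original order.
    umi = "".join(b for b, p in zip(seq, pattern) if p == "N")
    keep = [i for i, p in enumerate(pattern) if p != "N"]
    tail = len(pattern)
    new_seq = "".join(seq[i] for i in keep) + seq[tail:]
    new_qual = "".join(qual[i] for i in keep) + qual[tail:]

    # Append the UMI to the read identifier, leaving any header comment intact.
    read_id, sep, comment = name.partition(" ")
    return f"{read_id}_{umi}{sep}{comment}", new_seq, new_qual


if __name__ == "__main__":
    record = ("@read1 1:N:0", "AAACCCCGGTTTTT", "ABCDEFGHIJKLMN")
    print(extract_umi(*record))
```

In this sketch the four `X` bases (`CCCC`) remain at the start of the read, which mirrors the statement that other barcodes are left in place.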

See QUICK_START.md for a quick tutorial on the most common usage pattern.

The dedup and group commands make use of network-based methods to resolve similar UMIs with the same alignment coordinates. For a background regarding these methods see:

Genome Research Publication

Blog post discussing network-based methods.
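As a rough illustration of what a network-based method looks like, the following Python sketch groups UMIs observed at a single alignment position. It is not the umi_tools code. The function names are hypothetical, and the edge rule used here (connect UMI a to UMI b when they differ at one base and count(a) >= 2 * count(b) - 1) is stated as an assumption about the directional method; consult the publication for the authoritative definition.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(counts):
    """Group UMIs seen at one alignment position (illustrative sketch).

    `counts` maps UMI sequence -> number of reads. A directed edge a -> b
    is assumed when the UMIs differ by one base and
    counts[a] >= 2 * counts[b] - 1. Each group is everything reachable
    from the most abundant UMI that has not been assigned yet.
    """
    # Visit UMIs from most to least abundant; ties broken alphabetically.
    umis = sorted(counts, key=lambda u: (-counts[u], u))
    edges = {
        a: [b for b in umis
            if b != a and hamming(a, b) == 1
            and counts[a] >= 2 * counts[b] - 1]
        for a in umis
    }

    groups, seen = [], set()
    for root in umis:
        if root in seen:
            continue
        seen.add(root)
        group, queue = [root], deque([root])
        while queue:  # breadth-first walk along directed edges
            for nxt in edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    group.append(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


if __name__ == "__main__":
    example = {"ACGT": 100, "ACGA": 10, "ACCA": 2, "TTTT": 50, "TTTA": 40}
    print(directional_groups(example))
```

Under these assumptions, a deduplication step would keep one read per group, while a grouping step would label every read with its group instead of discarding any.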

Installation

If you're using Conda, you can use:

$ conda install -c https://conda.anaconda.org/toms umi_tools

Or pip:

$ pip install umi_tools

Or if you'd like to work directly from the git repository:

$ git clone https://github.com/CGATOxford/UMI-tools.git

Enter the repository and run:

$ python setup.py install

For more detail, see INSTALL.rst.

Help

To get detailed help on umi_tools, run:

$ umi_tools --help

To get help on umi_tools extract, run:

$ umi_tools extract --help

To get help on umi_tools dedup, run:

$ umi_tools dedup --help

Dependencies

umi_tools depends on numpy, pandas, cython, pysam and future.

License

MIT License