UMI-tools was published in Genome Research on 18 January 2017 (early access).
This repository contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs). Currently there are three tools:
- extract: Flexible removal of UMI sequences from fastq reads.
UMIs are removed and appended to the read name. Any other barcode, for example a library barcode, is left on the read.
- dedup: Removes PCR duplicates. Implements a number of different UMI deduplication schemes.
The recommended method is directional.
- group: Groups PCR duplicates using the same methods available through dedup.
This is useful when you want to interrogate the PCR duplicates themselves rather than just remove them.
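As a rough illustration of what extract does to a single read, here is a minimal Python sketch. It is not the umi_tools implementation: the function name `extract_umi` is hypothetical, and it assumes a barcode pattern in which `N` marks a UMI base (moved to the read name) and `X` marks a base of some other barcode (left on the read), with the UMI joined to the read identifier by an underscore.

```python
def extract_umi(read_name, sequence, quality, pattern):
    """Move UMI bases to the read name; leave other barcode bases on the read.

    Hypothetical helper for illustration only. `pattern` describes the 5' end
    of the read: 'N' = UMI base, 'X' = other barcode base kept on the read.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if len(sequence) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")

    umi, kept_seq, kept_qual = [], [], []
    for base, qual, symbol in zip(sequence, quality, pattern):
        if symbol == "N":        # UMI position: remove from the read
            umi.append(base)
        elif symbol == "X":      # other barcode position: keep on the read
            kept_seq.append(base)
            kept_qual.append(qual)
        else:
            raise ValueError("pattern may only contain 'N' and 'X'")

    n = len(pattern)
    new_seq = "".join(kept_seq) + sequence[n:]
    new_qual = "".join(kept_qual) + quality[n:]

    # Append the UMI to the identifier, before any comment after a space.
    identifier, sep, comment = read_name.partition(" ")
    new_name = identifier + "_" + "".join(umi) + sep + comment
    return new_name, new_seq, new_qual


name, seq, qual = extract_umi("read1", "ACGTTTGGCCAA", "ABCDEFGHIJKL", "NNNXXNN")
print(name, seq, qual)  # read1_ACGTG TTGCCAA DEHIJKL
```

Here the five `N` positions (ACG and TG) become the UMI in the read name, while the two `X` bases stay at the start of the read along with their quality scores.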
See QUICK_START.md for a quick tutorial on the most common usage pattern.
The dedup and group commands use network-based methods to resolve similar UMIs with the same alignment coordinates. For background on these methods, see:
Blog post discussing network-based methods.
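To give a flavour of the network-based approach, the following Python sketch groups the UMIs observed at one alignment position in the spirit of the directional method. It is a simplified illustration, not the code umi_tools runs. It assumes all UMIs have equal length, and it assumes a directed edge joins UMI a to UMI b when they differ by at most `threshold` bases and count(a) >= 2 * count(b) - 1. The function names and the example counts are made up.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts, threshold=1):
    """Group UMIs at a single alignment position (illustrative sketch).

    umi_counts maps each UMI sequence to its read count.
    Returns a list of groups; the first UMI in each group is its
    most abundant member and acts as the representative.
    """
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))

    # Directed edge a -> b: b is plausibly an error copy of a.
    adjacency = {u: [] for u in umis}
    for a in umis:
        for b in umis:
            if (a != b
                    and hamming(a, b) <= threshold
                    and umi_counts[a] >= 2 * umi_counts[b] - 1):
                adjacency[a].append(b)

    groups, assigned = [], set()
    for root in umis:
        if root in assigned:
            continue
        # Breadth-first search collects every unassigned UMI reachable from root.
        group, queue = [root], deque([root])
        assigned.add(root)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in assigned:
                    assigned.add(neighbour)
                    group.append(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


counts = {"ACGT": 100, "ACGG": 60, "TTTT": 50, "ACGA": 3, "ACTA": 1}
print(directional_groups(counts))
# [['ACGT', 'ACGA', 'ACTA'], ['ACGG'], ['TTTT']]
```

In this example ACGG stays separate from ACGT even though they differ by one base, because both are too abundant for either to be explained as an error copy of the other. Deduplication would keep one read per group, whereas grouping would label every read with its group.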
If you're using Conda, you can use:
$ conda install -c https://conda.anaconda.org/toms umi_tools
Or pip:
$ pip install umi_tools
Or if you'd like to work directly from the git repository:
$ git clone https://github.com/CGATOxford/UMI-tools.git
Enter the repository and run:
$ python setup.py install
For more detail, see INSTALL.rst.
To get detailed help on umi_tools, run:
$ umi_tools --help
To get help on umi_tools extract, run:
$ umi_tools extract --help
To get help on umi_tools dedup, run:
$ umi_tools dedup --help
umi_tools depends on numpy, pandas, cython, pysam and future.