power10dan / CellTics

Center for Integrated Diagnostics at Mass General Hospital NGS tools

CellTics

Installation

virtualenv --python=/path/to/python3 venv3
source venv3/bin/activate
python setup.py install

If you receive an error about lzma.h (seen on macOS Mojave), disable lzma support and run python setup.py install again:

export HTSLIB_CONFIGURE_OPTIONS=--disable-lzma
python setup.py install

macOS High Sierra introduced a fork-safety check in the Objective-C runtime that can crash worker processes created with fork(). If you see an error such as:

objc[49174]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[49174]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

set the following environment variable:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

VarGroup

Scans a vcf file and combines multiple nearby SNPs and indels into a single genomic event.

The pickle library does NOT work with the CLI library that allows calling celltics directly. To run with multiprocessing, you must call the script itself:

python /path/to/celltics/tools/vargroup.py -i <input> -o <output>

VarGrouper runs multithreaded by default. Use --debug or set threads to 1 (-t 1) to avoid multiprocessing.
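As a minimal sketch of the direct-script invocation described above, the helper below assembles the command line and environment for a multiprocess run. The function name build_vargroup_call is hypothetical and not part of celltics; it uses only the -i, -o and -t flags and the OBJC_DISABLE_INITIALIZE_FORK_SAFETY variable mentioned in this README, and it assumes -t is accepted when the script is called directly.

```python
import os
import shlex
import sys


def build_vargroup_call(script_path, input_vcf, output_vcf, threads=None):
    """Return (argv, env) for running vargroup.py directly as a script.

    Hypothetical helper for illustration; not part of celltics.
    """
    argv = [sys.executable, script_path, "-i", input_vcf, "-o", output_vcf]
    if threads is not None:
        # Assumption: -t sets the number of worker processes, as in the README.
        argv += ["-t", str(threads)]

    # Copy the current environment so the caller's os.environ is untouched.
    env = dict(os.environ)
    if sys.platform == "darwin":
        # Works around the macOS fork-safety crash described under Installation.
        env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    return argv, env


if __name__ == "__main__":
    argv, env = build_vargroup_call(
        "/path/to/celltics/tools/vargroup.py",
        "sorted_variants.vcf",
        "grouped_variants.vcf",
    )
    # Only prints the command; nothing is executed here.
    print(shlex.join(argv))
```

The returned pair can be passed to subprocess.run(argv, env=env, check=True) to start the run from another Python program.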

Simplest vargroup command

celltics vargroup --input-file sorted_variants.vcf --output-file grouped_variants.vcf --ref-seq hg19.fasta -t 1

Run vargroup with bam

celltics vargroup --input-file sorted_variants.vcf --output-file grouped_variants.vcf --bam-file sorted_alignment.bam --ref-seq hg19.fasta -t 1

If no reference sequence is supplied, the UCSC hg19 API (http://genome.ucsc.edu/) is queried instead. Variants are grouped if they lie within a certain distance of each other and occur on the same reads. For more advanced options, run celltics vargroup --help.
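To illustrate the distance part of that grouping rule, here is a small sketch that is not the VarGrouper implementation. The Variant type, the group_nearby function and the max_distance default of 10 bp are all assumptions made for this example, and the check that variants occur on the same reads is deliberately left out.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a celltics type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def group_nearby(variants: List[Variant], max_distance: int = 10) -> List[List[Variant]]:
    """Group position-sorted variants by proximity on the same chromosome.

    A variant joins the current group when its start position is at most
    max_distance bases after the previous variant's start position.
    Read-level support is NOT checked in this sketch.
    """
    groups: List[List[Variant]] = []
    for variant in variants:
        last = groups[-1][-1] if groups else None
        if (
            last is not None
            and last.chrom == variant.chrom
            and variant.pos - last.pos <= max_distance
        ):
            groups[-1].append(variant)
        else:
            groups.append([variant])
    return groups


if __name__ == "__main__":
    example = [
        Variant("chr1", 100, "A", "T"),
        Variant("chr1", 103, "G", "C"),
        Variant("chr1", 500, "T", "TA"),
        Variant("chr2", 501, "C", "G"),
    ]
    for group in group_nearby(example):
        print([f"{v.chrom}:{v.pos}" for v in group])
```

In this example only the first two variants end up in the same group: the third is too far away, and the fourth is on a different chromosome.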

Troubleshooting

Errors are not very informative when masked by the Python multiprocessing module. Run vargroup with --debug to get more informative error messages.
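The sketch below shows one common way a debug switch like this can work. It is an assumption about the general pattern, not the actual vargroup code. The names run_jobs and parse_position are hypothetical. When debug is set (or only one worker is requested), the work runs in the calling process, so an exception surfaces with its original traceback instead of being re-raised through the pool.

```python
import multiprocessing


def run_jobs(func, items, threads=2, debug=False):
    """Apply func to each item, serially when debugging, in a pool otherwise.

    Hypothetical helper for illustration; not part of celltics.
    """
    if debug or threads == 1:
        # Serial path: errors are raised directly where they occur.
        return [func(item) for item in items]
    # Parallel path: func must be picklable (defined at module level).
    with multiprocessing.Pool(processes=threads) as pool:
        return pool.map(func, items)


def parse_position(text):
    """Toy job: convert a position string to an integer."""
    return int(text)


if __name__ == "__main__":
    print(run_jobs(parse_position, ["100", "103"], debug=True))
```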

Algorithm

VarGrouper

Version 2.0

  • Added multiprocessing
  • Converted from Python 2.7 to Python 3

Contact

License:BSD 3-Clause "New" or "Revised" License

