Running piper on M.Kaller_14_06
vezzi opened this issue
Last Friday I tried to run Piper on the 6 samples recently generated using V4 technology.
All the data and analyses can be found here (the directory should be accessible to all members of group a2010002):
/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06
Qualimap fails every time. This is the error reported in the Piper log:
ERROR 14:15:58,665 FunctionEdge - Error: /proj/a2009002/piper_resources/programs/qualimap_v1.0/qualimap --java-mem-size=64G bamqc -bam /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/01_raw_alignments/P1171_104.AC41A2ANXX.P1171_104.5.bam --paint-chromosome-limits -outdir /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/02_preliminary_alignment_qc/P1171_104.AC41A2ANXX.P1171_104.5.qc/ -nt 8 &> /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/02_preliminary_alignment_qc/P1171_104.AC41A2ANXX.P1171_104.5.qc.log
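The Piper line above only echoes the failing command. That command ends with a bash `&>` redirect, so Qualimap's own stdout and stderr, and with them the actual reason for the failure, should be in the per-lane `.qc.log` file under `02_preliminary_alignment_qc/`. Below is a rough sketch of a hypothetical helper (`RedirectTarget.find`, not part of Piper or Qualimap) that extracts that log path from such an error line. It assumes the redirect target is the first token after the last `&>`:

```java
import java.util.Optional;

// Hypothetical helper (not part of Piper): given a FunctionEdge error line,
// return the file that the failing command's output was redirected to.
class RedirectTarget {
    static Optional<String> find(String errorLine) {
        // The Qualimap command in the log uses bash's "&>" redirect,
        // which sends both stdout and stderr to the same file.
        int idx = errorLine.lastIndexOf("&>");
        if (idx < 0) {
            return Optional.empty();
        }
        String rest = errorLine.substring(idx + 2).trim();
        // Keep only the first whitespace-delimited token, i.e. the path itself.
        String path = rest.split("\\s+")[0];
        return path.isEmpty() ? Optional.empty() : Optional.of(path);
    }

    public static void main(String[] args) {
        // Made-up line in the same shape as the Piper error above.
        String line = "ERROR 14:15:58,665 FunctionEdge - Error: qualimap bamqc -bam sample.bam "
                + "-outdir sample.qc/ -nt 8 &> /tmp/sample.qc.log";
        System.out.println(find(line).orElse("no redirect found"));
    }
}
```

Running `main` prints `/tmp/sample.qc.log`. The paths in it are invented for illustration. For the real run, the file to open is the `.qc.log` path at the end of the Piper error line.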
HaplotypeCaller subsequently fails as well. This is the Piper log for the first HaplotypeCaller error:
ERROR 18:22:42,455 FunctionEdge - Error: 'java' '-Xmx131072m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/tmp' '-cp'
'/proj/a2010002/software/piper_bin/Piper/current/lib/piper_2.10-v1.2.0-beta12.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/aopalliance-1.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/jcommander-1.7.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/scopt_2.10-3.2.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/guice-2.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/java-xmlbuilder-0.4.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-codec-1.3.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-httpclient-3.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-io-2.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-lang-2.5.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-logging-1.1.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/junit-3.8.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/log4j-1.2.16.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/jets3t-0.8.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/bsh-2.0b4.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/scala-library-2.10.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/simple-xml-2.0.4.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/testng-5.14.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/stax-1.2.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/stax-api-1.0.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/GenomeAnalysisTK.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/Queue.jar'
'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' '/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/04_processed_alignments/P1171_104.clean.dedup.recal.bam' '-L' '/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/scatter.intervals' '-R' '/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta' '-dcov' '250' '-nct' '16' '-variant_index_type' 'LINEAR' '-variant_index_parameter' '128000'
'-o' '/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/P1171_104.clean.dedup.recal.bam.genomic.vcf' '-D' '/proj/a2009002/piper_references/gatk_bundle/2.8/b37/dbsnp_138.b37.vcf' '-ERC' 'GVCF' '-stand_call_conf' '30.0' '-stand_emit_conf' '10.0' '-pairHMM' 'LOGLESS_CACHING' '-pcrModel' 'CONSERVATIVE'
ERROR 18:22:45,601 FunctionEdge - Contents of /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/P1171_104.clean.dedup.recal.bam.genomic.vcf.out:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta12/lib/GenomeAnalysisTK.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta12/lib/Queue.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
INFO 18:20:36,270 HelpFormatter - --------------------------------------------------------------------------------
INFO 18:20:36,341 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.2-0-g799071b, Compiled 2014/07/21 11:22:24
INFO 18:20:36,341 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 18:20:36,341 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 18:20:36,404 HelpFormatter - Program Args: -T HaplotypeCaller -I /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/04_processed_alignments/P1171_104.clean.dedup.recal.bam -L /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/scatter.intervals
-R /proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta -dcov 250 -nct 16 -variant_index_type LINEAR -variant_index_parameter 128000 -o /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/P1171_104.clean.dedup.recal.bam.genomic.vcf
-D /proj/a2009002/piper_references/gatk_bundle/2.8/b37/dbsnp_138.b37.vcf -ERC GVCF -stand_call_conf 30.0 -stand_emit_conf 10.0 -pairHMM LOGLESS_CACHING -pcrModel CONSERVATIVE
INFO 18:20:36,417 HelpFormatter - Executing as vezzi@n35.uppmax.uu.se on Linux 2.6.32-431.20.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 18:20:36,417 HelpFormatter - Date/Time: 2014/07/26 18:20:36
INFO 18:20:36,417 HelpFormatter - --------------------------------------------------------------------------------
INFO 18:20:36,417 HelpFormatter - --------------------------------------------------------------------------------
INFO 18:20:38,852 GenomeAnalysisEngine - Strictness is SILENT
INFO 18:20:40,735 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO 18:20:40,742 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 18:20:42,491 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.75
INFO 18:20:43,764 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 18:20:46,336 IntervalUtils - Processing 134861075 bp from intervals
INFO 18:20:46,348 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 16 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine
INFO 18:20:46,534 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 18:20:46,804 GenomeAnalysisEngine - Done preparing for traversal
INFO 18:20:46,804 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 18:20:46,804 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 18:20:46,805 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 18:20:46,805 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO 18:20:46,805 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
INFO 18:20:46,907 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 18:20:46,908 PairHMM - Performance profiling for PairHMM is disabled because HaplotypeCaller is being run with multiple threads (-nct>1) option
Profiling is enabled only when running in single thread mode
INFO 18:21:16,808 ProgressMeter - 6:151719912 0.0 30.0 s 49.6 w 0.4% 2.2 h 2.2 h
INFO 18:21:47,400 ProgressMeter - 6:152718291 0.0 60.0 s 100.2 w 1.1% 89.3 m 88.3 m
INFO 18:22:17,841 ProgressMeter - 6:153664816 0.0 91.0 s 150.5 w 1.8% 83.2 m 81.7 m
INFO 18:22:21,399 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.IndexOutOfBoundsException: Index: 28, Size: 6
at java.util.LinkedList.checkElementIndex(LinkedList.java:553)
at java.util.LinkedList.get(LinkedList.java:474)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.DanglingChainMergingGraph.mergeDanglingTail(DanglingChainMergingGraph.java:272)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.DanglingChainMergingGraph.recoverDanglingTail(DanglingChainMergingGraph.java:184)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.DanglingChainMergingGraph.recoverDanglingTails(DanglingChainMergingGraph.java:131)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:202)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:114)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:164)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:1022)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:882)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:218)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.2-0-g799071b):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Index: 28, Size: 6
##### ERROR ------------------------------------------------------------------------------------------
INFO 18:22:45,601 QGraph - Writing incremental jobs reports...
INFO 18:22:45,607 QGraph - 203 Pend, 63 Run, 33 Fail, 411 Done
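The HaplotypeCaller failure ends in `java.util.LinkedList.get`, which throws `IndexOutOfBoundsException` for any index outside `[0, size)`. In this run, `DanglingChainMergingGraph.mergeDanglingTail` requested element 28 of a 6-element list. The stand-alone snippet below only illustrates that exception and its message format on a plain `LinkedList`. It does not reproduce GATK's assembly code, the list contents are arbitrary, and `LinkedListIndexDemo.messageFor` is a hypothetical name:

```java
import java.util.LinkedList;
import java.util.List;

// Stand-alone illustration of the exception in the stack trace above; the
// list contents are arbitrary and unrelated to GATK's assembly graph.
class LinkedListIndexDemo {
    // Returns the exception message produced by get(index) on a LinkedList
    // of the given size, or null if the index was valid.
    static String messageFor(int size, int index) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        try {
            list.get(index); // valid only for 0 <= index < size
            return null;
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same numbers as in the log: index 28 requested from a list of size 6.
        System.out.println(messageFor(6, 28));
    }
}
```

On OpenJDK's `LinkedList`, the printed message should have the same `Index: 28, Size: 6` form as the GATK error message.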
There are 4 samples, not 6.
The above is related to a bug in the GATK version we are currently using. Issue #19 will solve this.
Good to know, I was expecting this.
I will close this as a test run for issue #19
@vezzi This should be fixed in the latest release. You can test this again. :)
First thing I will do tomorrow.
@vezzi Have you tested this yet? If so can you close the issue?
Tested! Everything worked out as expected.