zeeev / wham

Structural variant detection and association testing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

whamg trouble - finishing instantly with no results

afkoeppel opened this issue · comments

Trying to call SVs on some rat sequences and want to try whamg.

whamg -a $GENOME –f SL150028.bwamem.unfilt.rmdup.sort.bam > SL150028.vcf 2> SL150028.err

Strangely, it seems to be finishing instantaneously (despite >50Gb bam file), and producing no results.

cat SL150028.err
INFO: fasta file: /home/sdt5z/genomes/igenomes/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/WholeGenomeFasta/genome.fa
INFO: Loading discordant reads into forest.
INFO: Finished loading reads.
INFO: Gathering graphs from forest.
INFO: Matching breakpoints.
INFO: Printing.
INFO: done processing trees
INFO: WHAM finished normally, goodbye!

The output VCF file contains the VCF header lines but is otherwise empty.

I also tried with the -c option limiting to just a few chromosomes, but the results were the same.
Any advice about what I might be doing wrong would be appreciated.

Odd.
Can you post the full error message?
This is whole genome paired end sequencing data?
Also, can you post the header of the bam file?
Specifically, do you have the SQ tag?

Well that's what's so odd, there was no error message. Nothing at all was echoed to the screen, and according to the .err file everything went fine, but there was no output (except for the VCF header) and it took no time to run.

samtools view -H SL150028.bwamem.unfilt.rmdup.sort.bam
@hd VN:1.3 SO:coordinate
@sq SN:10 LN:112200500
@sq SN:11 LN:93518069
@sq SN:12 LN:54450796
@sq SN:13 LN:118718031
@sq SN:14 LN:115151701
@sq SN:15 LN:114627140
@sq SN:16 LN:90051983
@sq SN:17 LN:92503511
@sq SN:18 LN:87229863
@sq SN:19 LN:72914587
@sq SN:1 LN:290094216
@sq SN:20 LN:57791882
@sq SN:2 LN:285068071
@sq SN:3 LN:183740530
@sq SN:4 LN:248343840
@sq SN:5 LN:177180328
@sq SN:6 LN:156897508
@sq SN:7 LN:143501887
@sq SN:8 LN:132457389
@sq SN:9 LN:121549591
@sq SN:MT LN:16313
@sq SN:X LN:154597545
@rg ID:SL150028 SM:SL150028
@pg ID:bwa PN:bwa CL:bwa mem /home/sdt5z/genomes/igenomes/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/genome.fa SL150028_1.fastq.gz SL150028_2.fastq.gz -t 20 -M -R @rg\tID:SL150028\tSM:SL150028 VN:0.7.12-r1039

can you post the full SL150028.err error file?

There should be something like:

INFO: for Sample:xxx
STATS: UV940: mean depth ..............: 39.2523
STATS: UV940: sd depth ................: 10.7403
STATS: UV940: mean insert length: .....: 319.644
STATS: UV940: median insert length ....: 318
STATS: UV940: sd insert length ........: 69.2122
STATS: UV940: lower insert cutoff .....: 208.905
STATS: UV940: upper insert cutoff .....: 492.675
STATS: UV940: average base quality ....: 28.347
STATS: UV940: average mapping quality .: 59.0846
STATS: UV940: number of reads used ....: 100227

What I posted in the original message was the full file. I have only INFO fields, no STATS fields like in your example.

@afkoeppel thanks, I was expecting it to be longer, but maybe not if it is an options parsing error.

original:

"whamg -a $GENOME **–**f SL150028.bwamem.unfilt.rmdup.sort.bam > SL150028.vcf 2> SL150028.err"

try:

"whamg -a $GENOME -f SL150028.bwamem.unfilt.rmdup.sort.bam > SL150028.vcf 2> SL150028.err"

Note the difference in the dash for -f.

If this solves the problem I'll add better error handling for this case.

Well I'll be. I replaced the dash and now it seems to be running. I'll update as soon as I have output. I'm not even quite sure how I got that non-dash dash, but I think maybe from copy-pasting from the instructions here (I sometimes copy example commands and then just replace the file names with my own).

Shame on me! I'll make sure to check for m vs n dash.

http://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/

The wrong dash was in the README.

That did the trick. I have a vcf with SVs in it now. Thank you so much for your help.