zeeev / wham

Structural variant detection and association testing

Filtering the VCF file

abolia opened this issue

Hi Zev,

Do you have any recommendations on setting parameters for filtering of the VCF files generated by WHAM?

Thanks,
Ashini

Ashini,

Can you tell me a little more about your experimental design? There are many filtering strategies, so with a little more knowledge I can point you to the correct one.

--Zev

Hi Zev,

The aim of my project is to detect translocations. However, the problem we are currently facing with WHAM is far too many false positives (thousands), and we are trying to find ways to reduce them.

Moreover, I have also played with parameters controlling sensitivity and specificity in my WHAM runs:

For example:
m (min # of soft clips supporting SV start) = 15 (set pretty high)
p (exclude soft-clipped reads below mapping value) = 20
q (exclude soft-clipped reads with average base quality below Phred-scaled value) = 30

Therefore, I am trying to set filters on the VCF file that can reduce the number of these false positives.

So far I have played with Read Depth (RD) > 1500. It still detects the true positive, but the number of false positives remains far too high.

So, if you can provide any filtering strategy, that would be immensely helpful.

Thanks.
Ashini

Ashini,

Sorry for the late reply. I would suggest using the "AT" field as a set of filters, specifically the 4th value in the AT field. You would expect a true translocation to have a value greater than 0.05.
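A minimal sketch of that AT check, assuming the AT values are stored as a comma-separated list in the INFO column (the usual VCF encoding for multi-valued tags). The function names and the default threshold parameter are hypothetical and not part of WHAM.

```python
def at_fourth_value(info):
    """Return the 4th comma-separated AT value from an INFO string, or None."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "AT" and sep:
            parts = value.split(",")
            if len(parts) < 4:
                return None
            try:
                return float(parts[3])
            except ValueError:
                return None
    return None


def passes_at_filter(vcf_line, threshold=0.05):
    """True if the record's 4th AT value is strictly greater than threshold."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:  # header lines or malformed records
        return False
    value = at_fourth_value(fields[7])  # column 8 of a VCF record is INFO
    return value is not None and value > threshold
```

The threshold is left as a parameter because 0.05 is only a starting point and may need tuning per data set.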

If you have control data, that will also help substantially.

If you send me a snippet of your VCF file, I'd be happy to help provide a filtering program for you.

--Zev

Zev,

Thanks for the reply.
Attached is a snippet of my VCF file containing the translocation we are looking for. The translocation is on Chr22 (29,684,094-29,684,602) and Chr11 (32,415,739-32,416,247). I see that it's being called, but the values in the 4th AT datum are still not greater than 0.05.

sample.txt
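One way to pull out the calls that fall inside the two expected windows is sketched below. It only checks the CHROM and POS columns, so it would miss a record that reports the partner breakpoint elsewhere. The chromosome-name normalization (dropping a "chr" prefix) is an assumption about how the contigs are named in the file, and the function names are hypothetical.

```python
# Expected breakpoint windows from the message above (1-based, inclusive).
REGIONS = {
    "22": (29_684_094, 29_684_602),
    "11": (32_415_739, 32_416_247),
}


def normalize_chrom(name):
    """Drop an optional 'chr' prefix so 'Chr22', 'chr22' and '22' compare equal."""
    lowered = name.lower()
    return lowered[3:] if lowered.startswith("chr") else lowered


def in_expected_region(vcf_line, regions=REGIONS):
    """True if the record's POS lies inside the window for its chromosome."""
    if vcf_line.startswith("#"):
        return False
    fields = vcf_line.split("\t")
    if len(fields) < 2:
        return False
    chrom = normalize_chrom(fields[0])
    if chrom not in regions:
        return False
    try:
        pos = int(fields[1])
    except ValueError:
        return False
    start, end = regions[chrom]
    return start <= pos <= end
```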

Thanks for all your help.
Ashini.

@BrettKennedy @abolia

Here is a filter that should get you much closer.

SVLEN = 0 <- translocations
CF < 0.2 <- remove sites with excessive CIGAR operations
CU < 10 <- remove sites where there is excessive soft clipping nearby
MQ > 30 <- average mapping quality greater than 30
NC > 10 <- number of soft clips supporting the breakpoint

In the file Brett sent me, there were 868 calls.
There is one left after filtering.

/tools/vcflib/bin/vcffilter -f "SVLEN = 0" -f "CF < 0.2" -f "CU < 10" -f "MQ > 30" XXXX-ALK.wham.raw.vcf | perl -lane 'if(/^#/){print}else{print if /NC=(.*?);/ && $1 > 10}'
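For anyone without vcflib or perl at hand, the same five thresholds can be sketched in plain Python. This is an approximation of the pipeline above, not a drop-in replacement: it assumes each of the five INFO tags holds a single numeric value, and it drops any record where a tag is missing or non-numeric, which may differ from how vcffilter handles such records. The function names are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of key -> raw string value."""
    tags = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep:  # skip flag-style entries that have no '='
            tags[key] = value
    return tags


def keep_record(info):
    """Apply the five thresholds listed above to one INFO string."""
    tags = parse_info(info)
    try:
        return (
            float(tags["SVLEN"]) == 0    # translocations
            and float(tags["CF"]) < 0.2  # few CIGAR operations
            and float(tags["CU"]) < 10   # little soft clipping nearby
            and float(tags["MQ"]) > 30   # average mapping quality
            and float(tags["NC"]) > 10   # soft clips supporting the breakpoint
        )
    except (KeyError, ValueError):
        return False  # assumption: missing or non-numeric tags fail the filter


def filter_vcf(lines):
    """Yield header lines unchanged and only the records that pass keep_record."""
    for line in lines:
        if line.startswith("#"):
            yield line
        else:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 8 and keep_record(fields[7]):
                yield line
```

To use it as a stream filter, wrap it as `sys.stdout.writelines(filter_vcf(sys.stdin))` in a small script and redirect the raw VCF into it.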

Superb. Thanks Zev. I will try it on other samples too and let you know how it works.

Thanks again,
Ashini

Happy to help. I'm closing this issue, but feel free to open up another one.