Issue with reading mutect2 output on somalier (fileformat=VCFv4.2)
brain-discourse opened this issue · comments
I am having an issue with reading the mutect2 output (both, paired and tumor only) on somalier that follows VCFv4.2
I am using the following syntax:
./somalier extract -d extracted --sites /proj/humandb/sites.hg38.vcf -f /proj/ref/GRCh38.d1.vd1.fa /proj/test/4472_paired_filtered.vcf
and the output looks as follows:
somalier version: 0.2.12
[somalier] found 0 sites
My vcf files have all the relevant headers but I can't seem to extract sites for any of my files
chr1 17398654 . GCC G,GC,GCCC . germline;multiallelic;normal_artifact;panel_of_normals AS_FilterStatus=weak_evidence|SITE|weak_evidence;AS_SB_TABLE=0,20|0,3|0,22|0,3;DP=60;ECNT=1;GERMQ=1;MBQ=40,40,40,40;MFRL=382,357,350,373;MMQ=60,60,60,60;MPOS=46,50,57;NALOD=0.676,-3.010e+01,0.836;NLOD=1.89,-3.138e+01,6.42;PON;POPAF=4.51,4.51,4.51;ROQ=93;RPA=11,9,10,12;RU=C;STR;STRQ=93;TLOD=3.12,16.16,4.01 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:8,1,14,0:0.040,0.572,0.030:23:0,0,2,0:7,1,10,0:0,8,0,15 0/1/2/3:12,2,8,3:0.097,0.312,0.122:25:0,2,0,0:12,0,8,3:0,12,0,13
chr1 18841984 . GA G,GAA,GAAA . germline;multiallelic;normal_artifact;panel_of_normals;slippage AS_FilterStatus=weak_evidence|base_qual|base_qual;AS_SB_TABLE=1,70|0,12|1,55|1,23;DP=220;ECNT=1;GERMQ=1;MBQ=13,28,13,13;MFRL=355,381,336,370;MMQ=60,60,60,60;MPOS=43,40,26;NALOD=-2.318e+00,-4.064e+01,-6.058e+00;NLOD=13.80,-5.352e+01,-1.757e+01;PON;POPAF=4.51,4.51,4.51;ROQ=93;RPA=15,14,16,17;RU=A;STR;STRQ=1;TLOD=7.64,34.95,12.63 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:41,4,28,11:0.048,0.340,0.127:84:3,1,3,0:5,0,4,3:0,41,1,42 0/1/2/3:30,8,28,13:0.099,0.353,0.163:79:2,3,4,2:3,3,5,3:1,29,1,48
Any idea what might be causing this?
Thanks in advance!
I also tried using the vcf file from the GATK Haplotype caller that has the following format but am encountering the same issue:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT fcde191bl
chr1 1 . N <NON_REF> . . END=12972 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr1 12973 . A <NON_REF> . . END=13001 GT:DP:GQ:MIN_DP:PL 0/0:1:3:1:0,3,35
chr1 13002 . G <NON_REF> . . END=13014 GT:DP:GQ:MIN_DP:PL 0/0:2:6:2:0,6,66
chr1 13015 . G <NON_REF> . . END=13043 GT:DP:GQ:MIN_DP:PL 0/0:3:9:3:0,9,98
chr1 13044 . A <NON_REF> . . END=13057 GT:DP:GQ:MIN_DP:PL 0/0:4:12:4:0,12,133
chr1 13058 . C <NON_REF> . . END=13083 GT:DP:GQ:MIN_DP:PL 0/0:5:15:5:0,15,148
chr1 13084 . G <NON_REF> . . END=13085 GT:DP:GQ:MIN_DP:PL 0/0:6:18:6:0,18,192
chr1 13086 . C <NON_REF> . . END=13088 GT:DP:GQ:MIN_DP:PL 0/0:7:21:7:0,21,210
hi, somalier will work best when you extract from the bam files. extracting from a VCF with somatic calls is unlikely to work because few of those sites (none in your case) would be common variants.
if you are unable to extract from bam then we can debug the gvcf issue.
Thank you for your response. I tried extracting from the bam file but its generating an empty file with no error messages.
./somalier extract -d extracted/ --sites /proj/humandb/sites.hg38.vcf -f /proj/ref/GRCh38.d1.vd1.fa /proj/bams/3828.bwamem.sorted.marked.bam
this is there message I get when I run the code:
somalier version: 0.2.12
somalier version: 0.2.5 didn't work either.
Thank you in advance!
are you using these bams with other tools? most likely scenario I can think of is that something is wrong with them. can you re-index? are they hg38?
did you change the sites file? it does not come unzipped, maybe try re-downloading from https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz
Thank you so much! There was an issue with the sites file. This one worked perfectly with the bam.