brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Issue with reading mutect2 output on somalier (fileformat=VCFv4.2)

brain-discourse opened this issue · comments

I am having an issue with reading the mutect2 output (both, paired and tumor only) on somalier that follows VCFv4.2

I am using the following syntax:
./somalier extract -d extracted --sites /proj/humandb/sites.hg38.vcf -f /proj/ref/GRCh38.d1.vd1.fa /proj/test/4472_paired_filtered.vcf

and the output looks as follows:

somalier version: 0.2.12
[somalier] found 0 sites

My vcf files have all the relevant headers but I can't seem to extract sites for any of my files

chr1	17398654	.	GCC	G,GC,GCCC	.	germline;multiallelic;normal_artifact;panel_of_normals	AS_FilterStatus=weak_evidence|SITE|weak_evidence;AS_SB_TABLE=0,20|0,3|0,22|0,3;DP=60;ECNT=1;GERMQ=1;MBQ=40,40,40,40;MFRL=382,357,350,373;MMQ=60,60,60,60;MPOS=46,50,57;NALOD=0.676,-3.010e+01,0.836;NLOD=1.89,-3.138e+01,6.42;PON;POPAF=4.51,4.51,4.51;ROQ=93;RPA=11,9,10,12;RU=C;STR;STRQ=93;TLOD=3.12,16.16,4.01	GT:AD:AF:DP:F1R2:F2R1:SB	0/0:8,1,14,0:0.040,0.572,0.030:23:0,0,2,0:7,1,10,0:0,8,0,15	0/1/2/3:12,2,8,3:0.097,0.312,0.122:25:0,2,0,0:12,0,8,3:0,12,0,13
chr1	18841984	.	GA	G,GAA,GAAA	.	germline;multiallelic;normal_artifact;panel_of_normals;slippage	AS_FilterStatus=weak_evidence|base_qual|base_qual;AS_SB_TABLE=1,70|0,12|1,55|1,23;DP=220;ECNT=1;GERMQ=1;MBQ=13,28,13,13;MFRL=355,381,336,370;MMQ=60,60,60,60;MPOS=43,40,26;NALOD=-2.318e+00,-4.064e+01,-6.058e+00;NLOD=13.80,-5.352e+01,-1.757e+01;PON;POPAF=4.51,4.51,4.51;ROQ=93;RPA=15,14,16,17;RU=A;STR;STRQ=1;TLOD=7.64,34.95,12.63	GT:AD:AF:DP:F1R2:F2R1:SB	0/0:41,4,28,11:0.048,0.340,0.127:84:3,1,3,0:5,0,4,3:0,41,1,42	0/1/2/3:30,8,28,13:0.099,0.353,0.163:79:2,3,4,2:3,3,5,3:1,29,1,48

Any idea what might be causing this?

Thanks in advance!

I also tried using the vcf file from the GATK Haplotype caller that has the following format but am encountering the same issue:

CHROM   POS    ID     REF    ALT    QUAL   FILTER INFO   FORMAT fcde191bl
chr1   1      .      N      <NON_REF>     .      .      END=12972     GT:DP:GQ:MIN_DP:PL       0/0:0:0:0:0,0,0
chr1   12973  .      A      <NON_REF>     .      .      END=13001     GT:DP:GQ:MIN_DP:PL       0/0:1:3:1:0,3,35
chr1   13002  .      G      <NON_REF>     .      .      END=13014     GT:DP:GQ:MIN_DP:PL       0/0:2:6:2:0,6,66
chr1   13015  .      G      <NON_REF>     .      .      END=13043     GT:DP:GQ:MIN_DP:PL       0/0:3:9:3:0,9,98
chr1   13044  .      A      <NON_REF>     .      .      END=13057     GT:DP:GQ:MIN_DP:PL       0/0:4:12:4:0,12,133
chr1   13058  .      C      <NON_REF>     .      .      END=13083     GT:DP:GQ:MIN_DP:PL       0/0:5:15:5:0,15,148
chr1   13084  .      G      <NON_REF>     .      .      END=13085     GT:DP:GQ:MIN_DP:PL       0/0:6:18:6:0,18,192
chr1   13086  .      C      <NON_REF>     .      .      END=13088     GT:DP:GQ:MIN_DP:PL       0/0:7:21:7:0,21,210

hi, somalier will work best when you extract from the bam files. extracting from a VCF with somatic calls is unlikely to work because few of those sites (none in your case) would be common variants.
if you are unable to extract from bam then we can debug the gvcf issue.

Thank you for your response. I tried extracting from the bam file but its generating an empty file with no error messages.

./somalier extract -d extracted/ --sites /proj/humandb/sites.hg38.vcf -f /proj/ref/GRCh38.d1.vd1.fa /proj/bams/3828.bwamem.sorted.marked.bam

this is there message I get when I run the code:

somalier version: 0.2.12

somalier version: 0.2.5 didn't work either.

Thank you in advance!

are you using these bams with other tools? most likely scenario I can think of is that something is wrong with them. can you re-index? are they hg38?
did you change the sites file? it does not come unzipped, maybe try re-downloading from https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz

Thank you so much! There was an issue with the sites file. This one worked perfectly with the bam.