skipping 'AC' because it has Number=A
matthdsm opened this issue · comments
Hi,
When I try to load my VEP-annotated VCFs into a db, I get the error above. I traced this back to the code (line 656):
    if d['Number'] in "RA":
        print("skipping '%s' because it has Number=%s" % (d["ID"], d["Number"]),
              file=sys.stderr)
        continue
Would it be possible to explain why those columns get skipped? Is there a way to force include them? The AC, AF, MLEAC and MLEAF INFO fields are quite important, and the INFO fields are compliant with the VCF standard.
Thanks
M
There is a 1:1 mapping of INFO fields to database fields. When your AF value is something like 0.233,0.444 because there are multiple ALTs, we can't store that as a single float.
If you decompose your VCF and make sure the header is adjusted so that fields that had Number=A are declared with Number=1, those fields will no longer be skipped.
If you annotate with vcfanno, you could use a [[postannotation]] section and use max(AF) to get a new field and then save that in the database.
I would like to handle this more cleanly, but I haven't thought of anything that works well with RDBMS.
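To illustrate the max(AF) idea, here is a minimal Python sketch of collapsing a multi-valued INFO entry into a single float that fits one database column. The function name collapse_max is hypothetical; this is not how vcfanno or vcf2db implement it, only the same reduction written out by hand.

```python
def collapse_max(raw_value):
    """Reduce a comma-separated INFO value such as "0.233,0.444" to one float.

    Hypothetical helper for illustration; returns None when every entry is
    missing (VCF writes missing values as ".").
    """
    # Split on commas and drop missing entries before converting to float.
    values = [float(v) for v in raw_value.split(",") if v != "."]
    return max(values) if values else None


if __name__ == "__main__":
    # One value per ALT allele goes in, a single storable float comes out.
    print(collapse_max("0.233,0.444"))
```

Taking the maximum discards the per-allele values, so this only makes sense when a single summary number per record is enough.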
We always decompose and normalize with vt before we do our annotation. If I'm understanding you correctly, I can then safely edit the VCF header, replace the A with 1, and I should be good to go?
Nice, I thought it was going to be more complicated! Looking forward to the next release of vcf2db on bioconda!
Thanks,
M
Yes, that should work. Remember that for the GATK AD tag, you need to run sed 's/ID=AD,Number=./ID=AD,Number=R/' before decomposing.
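For the header edit discussed above, a small Python sketch of the Number=A to Number=1 rewrite might look like the following. It assumes the VCF has already been decomposed so each record carries exactly one ALT, and it only touches ##INFO header lines; the function names are hypothetical and this is not part of vcf2db.

```python
import re

# Matches "Number=A" inside an ##INFO header line, keeping the prefix in group 1.
_INFO_NUMBER_A = re.compile(r"^(##INFO=<ID=[^,>]+,Number=)A(?=[,>])")


def set_number_a_to_one(header_line):
    """Return the line with Number=A replaced by Number=1 (INFO lines only)."""
    return _INFO_NUMBER_A.sub(r"\g<1>1", header_line)


def rewrite_header(lines):
    """Yield VCF lines, rewriting meta-information lines and passing the rest through."""
    for line in lines:
        # Only "##" meta lines can declare Number=; records are left untouched.
        yield set_number_a_to_one(line) if line.startswith("##") else line


if __name__ == "__main__":
    example = '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">'
    print(set_number_a_to_one(example))
```

Number=R fields are deliberately left alone here, since after decomposition they still hold two values (reference and alternate) and would not fit a Number=1 declaration.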
thanks for the tip!
M