skipping 'AC' because it has Number=A

Question

matthdsm opened this issue 8 years ago · comments

Hi,
When I try to load my VEP-annotated VCFs into a database, I get the error above. I traced this back to the code (line 656):

            if d['Number'] in "RA":
                print("skipping '%s' because it has Number=%s" % (d["ID"], d["Number"]), 
                      file=sys.stderr)
                continue

Could you explain why those columns get skipped? Is there a way to force their inclusion? The AC, AF, MLEAC and MLEAF INFO fields are quite important, and they are compliant with the VCF standard.

Thanks
M

Brent Pedersen · Answer 1 · Wed Oct 12 2016 21:19:30 GMT+0800 (China Standard Time)

There is a 1:1 mapping of INFO fields to database fields. When your AF value is something like 0.233,0.444 because there are multiple ALTs, we can't store that as a single float.
If you decompose your VCF and adjust the header so that fields declared with Number=A are declared with Number=1 instead, those fields should load.
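As a rough illustration of the header adjustment, here is a minimal Python sketch. It assumes the VCF has already been decomposed so that every record has a single ALT, and that INFO header lines list ID before Number. The helper names `number_a_to_1` and `fix_header` are hypothetical and not part of vcf2db.

```python
import re

# Matches the Number=A attribute of an INFO header line.
# Assumption: ID is the first attribute and Number directly follows it.
_INFO_NUMBER_A = re.compile(r"^(##INFO=<ID=[^,>]+,Number=)A(?=[,>])")


def number_a_to_1(header_line):
    """Return the header line with Number=A changed to Number=1 (INFO lines only)."""
    return _INFO_NUMBER_A.sub(r"\g<1>1", header_line)


def fix_header(lines):
    """Rewrite ##INFO header lines; every other line passes through unchanged."""
    for line in lines:
        yield number_a_to_1(line) if line.startswith("##INFO=") else line
```

Only apply something like this after decomposition. On a VCF that still has multi-allelic records, the header would claim one value per field while the records carry several.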

If you annotate with vcfanno, you could use a [[postannotation]] section and use max(AF) to get a new field and then save that in the database.
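The reduction itself is simple. The sketch below shows it in plain Python, not in vcfanno configuration syntax; `max_af` is a hypothetical name, and skipping missing values written as "." is an assumption.

```python
def max_af(af_value):
    """Collapse a comma-separated AF string to its largest value.

    Missing entries written as "." are ignored; returns None if nothing is left.
    """
    values = [float(v) for v in af_value.split(",") if v != "."]
    return max(values) if values else None


print(max_af("0.233,0.444"))  # 0.444
```

The result is a single float per record, which fits the one-value-per-column constraint described above.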

I would like to handle this more cleanly, but I haven't thought of anything that works well with RDBMS.

Matthias De Smet · Answer 2 · Wed Oct 12 2016 21:23:51 GMT+0800 (China Standard Time)

We always decompose and normalize with vt before we do our annotation. If I'm understanding you correctly, I can then safely edit the VCF header, replace the A with 1, and I should be good to go?

Nice, I thought it was going to be more complicated! Looking forward to the next release of vcf2db on bioconda!

Thanks,
M

Brent Pedersen · Answer 3 · Wed Oct 12 2016 21:28:56 GMT+0800 (China Standard Time)

Yes, that should work. Remember that for the GATK AD tag, you need to run sed 's/ID=AD,Number=./ID=AD,Number=R/' before decomposing.
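For pipelines that already preprocess the header in Python, the same substitution could be sketched as follows. `fix_ad_number` is a hypothetical helper; it mirrors the sed expression, including the fact that the dot is a regex wildcard and that only the first match on a line is replaced.

```python
import re


def fix_ad_number(header_line):
    """Set Number=R on the AD header line, mirroring the sed expression above."""
    # As in sed, '.' matches any single character, so Number=. (or any other
    # one-character Number) becomes Number=R. count=1 matches sed without /g.
    return re.sub(r"ID=AD,Number=.", "ID=AD,Number=R", header_line, count=1)
```

Lines that do not contain ID=AD are returned unchanged, so the function can be applied to every header line.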

Matthias De Smet · Answer 4 · Wed Oct 12 2016 21:30:02 GMT+0800 (China Standard Time)

Thanks for the tip!

M