brentp / hts-nim

nim wrapper for htslib for parsing genomics data files

Home Page: https://brentp.github.io/hts-nim/

Get list of INFO and FORMAT keys from Header

edg1983 opened this issue

Hi Brent,

Is there a way to get the list of keys for INFO and FORMAT definitions from the header of a VCF using hts-nim?

The idea I'm working on is to get all the INFO and FORMAT keys defined in the headers of two VCFs, compute their intersection, and output a new VCF containing only the fields shared by both.

Thanks!

Hi Edoardo,
There's currently no nice way to do this. You could get the header string from each header, write your own code to compute the intersection of the INFO and FORMAT fields, and then look up each shared key with, e.g.:

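# inside a loop over the shared keys (hence the `continue`)
try:
  var hi = ivcf.header[key]
  # do something with hi (HeaderInfo)
except KeyError:
  # key is not defined in this header; skip it
  continue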

You can then merge the headers as done here: https://github.com/brentp/tnsv/blob/main/tnsv.nim#L38 (that part is left to htslib).
In short, it's possible, but it will take a fair amount of work and string parsing. If you give it a go and get stuck, I'll try to help.
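
For the string-parsing step, here is a minimal sketch that uses only the Nim standard library and takes the header text as a plain string. The names `HeaderKeys`, `idAfter`, `headerKeys` and `sharedKeys` are made up for illustration and are not part of hts-nim. It also assumes that `ID=` is the first attribute of every `##INFO` and `##FORMAT` line, which is how headers are normally written.

```nim
import std/[sets, strutils]

# Hypothetical helper type: the INFO and FORMAT IDs found in one header.
type HeaderKeys = tuple[infoKeys, formatKeys: HashSet[string]]

const
  infoPrefix = "##INFO=<ID="
  formatPrefix = "##FORMAT=<ID="

proc idAfter(line, prefix: string): string =
  ## Return the ID that follows `prefix`, or "" if `line` does not start with it.
  if not line.startsWith(prefix):
    return ""
  # The ID ends at the next ',' (more attributes follow) or '>' (end of line).
  var stop = line.find({',', '>'}, prefix.len)
  if stop < 0:
    stop = line.len
  result = line[prefix.len ..< stop]

proc headerKeys(headerText: string): HeaderKeys =
  ## Collect the IDs of all ##INFO and ##FORMAT lines in a VCF header string.
  result.infoKeys = initHashSet[string]()
  result.formatKeys = initHashSet[string]()
  for line in headerText.splitLines():
    let infoId = idAfter(line, infoPrefix)
    let formatId = idAfter(line, formatPrefix)
    if infoId.len > 0: result.infoKeys.incl(infoId)
    if formatId.len > 0: result.formatKeys.incl(formatId)

proc sharedKeys(a, b: string): HeaderKeys =
  ## Keys defined in both headers; `*` is set intersection in std/sets.
  let ka = headerKeys(a)
  let kb = headerKeys(b)
  result = (infoKeys: ka.infoKeys * kb.infoKeys,
            formatKeys: ka.formatKeys * kb.formatKeys)
```

With hts-nim, the two arguments to `sharedKeys` would be the header strings of the two VCFs. Stringifying each header with `$` is an assumption here, so check the hts-nim docs for the exact call. The resulting sets can then drive the per-key lookup shown above.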