rust-bio / rust-htslib

This library provides HTSlib bindings and a high level Rust API for reading and writing BAM files.

Merge VCF headers

wdesouza opened this issue · comments

Is it possible to merge VCF headers to combine records from different files? I'm trying to do something like this:

let mut out_header = Header::new();
for record in reader.header().header_records() {
    let header_line = todo!("convert record to string?");
    out_header.push_record(header_line);
}

Hi, a quick question: do you want to combine multiple files or just two?
If just two, you could initialize out_header from the header of the first file as a template instead of calling new:

let mut out_header = Header::from_template(vcf_a.header());

Then you can iterate over the header records of the second file and push them into the first, no?
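
For the "convert record to string" step, here is a minimal sketch of how a structured header record could be turned back into a header line before pushing it. It uses only the standard library: the `header_line` helper and the slice of pairs are hypothetical stand-ins for the `key` and `values` fields of a header record, and it assumes the values already carry any quotes they need (e.g. for Description).

```rust
/// Rebuild a structured header line such as `##contig=<ID=chr1,length=1000>`
/// from its key and its ordered key/value pairs.
/// `IDX` is skipped on the assumption that it is an internal index
/// and not part of the VCF text.
fn header_line(key: &str, values: &[(String, String)]) -> String {
    let fields: Vec<String> = values
        .iter()
        .filter(|(k, _)| k.as_str() != "IDX")
        .map(|(k, v)| format!("{}={}", k, v))
        .collect();
    format!("##{}=<{}>", key, fields.join(","))
}

fn main() {
    let values = vec![
        ("ID".to_string(), "chr1".to_string()),
        ("length".to_string(), "248956422".to_string()),
        ("IDX".to_string(), "0".to_string()),
    ];
    // prints "##contig=<ID=chr1,length=248956422>"
    println!("{}", header_line("contig", &values));
}
```

The resulting string could then be passed to push_record as bytes, the same way the contig lines are pushed further down in this thread.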

UPDATE:

Otherwise I would do something like this:

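use rust_htslib::bcf::header::Header;
use rustc_hash::FxHashMap;

    // Header that will receive the merged contig lines.
    let mut vcf_header = Header::new();
    // Contig ID -> key/value pairs of that contig's header line.
    let mut vcf_contigs: FxHashMap<String, linear_map::LinearMap<String, String>> = FxHashMap::default();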

    for entries in vcf_a.header().header_records() {
        match entries {
            rust_htslib::bcf::header::HeaderRecord::Contig { mut values, .. } => {
                // Drop the IDX entry so it is not carried into the merged header.
                values.remove("IDX").expect("ERROR: could not remove IDX entry!");
                vcf_contigs.insert(values.get("ID").expect("ERROR: could not get ID of contig!").to_string(), values);
            },
            _ => println!("Not interesting"),
        };
    };

    for entries in vcf_b.header().header_records() {
        match entries {
            rust_htslib::bcf::header::HeaderRecord::Contig { mut values, .. } => {
                values.remove("IDX").expect("ERROR: could not remove IDX entry!");
                let id = values.get("ID").expect("ERROR: could not get ID of contig!").to_string();
                let length = values.get("length").expect("ERROR: could not get length of contig!").to_string();
                if let Some(existing) = vcf_contigs.get(&id) {
                    // Same contig ID in both files: the lengths must agree.
                    if existing.get("length").expect("ERROR: could not get length of contig!") != &length {
                        panic!("ERROR: contig {} has different lengths in the two files!", id);
                    }
                } else {
                    vcf_contigs.insert(id, values);
                }
            },
            _ => println!("Not interesting"),
        };
    };

    // Note: FxHashMap iteration order is arbitrary, so the original contig order is not preserved.
    for value in vcf_contigs.values() {
        let id = value.get("ID").expect("ERROR: could not get ID of contig!");
        let length = value.get("length").expect("ERROR: could not get length of contig!");
        let assembly = value.get("assembly").expect("ERROR: could not get assembly of contig!");
        vcf_header.push_record(format!("##contig=<ID={},length={},assembly={}>", id, length, assembly).as_bytes());
    }

This only covers the contig entries of the header and ensures they are compatible between the two files.
Be aware that I am a Rust newbie, so there may well be better and more elegant ways to do this ;)
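
One possible variation on the contig check above, sketched with the standard library only: it keeps the contigs in first-seen order and returns an error instead of panicking when lengths disagree. The `Contig` alias and `merge_contigs` function are hypothetical names for illustration, not part of rust-htslib, and contigs are reduced to (ID, length) pairs.

```rust
use std::collections::HashMap;

/// A contig as (ID, length); a simplified stand-in for a `##contig` header record.
type Contig = (String, u64);

/// Merge two contig lists, keeping first-seen order.
/// Returns an error if the same ID appears with two different lengths.
fn merge_contigs(a: &[Contig], b: &[Contig]) -> Result<Vec<Contig>, String> {
    let mut merged: Vec<Contig> = Vec::new();
    let mut lengths: HashMap<String, u64> = HashMap::new();

    for (id, length) in a.iter().chain(b.iter()) {
        match lengths.get(id) {
            Some(known) if known != length => {
                return Err(format!(
                    "contig {} has conflicting lengths {} and {}",
                    id, known, length
                ));
            }
            Some(_) => {} // already recorded with the same length
            None => {
                lengths.insert(id.clone(), *length);
                merged.push((id.clone(), *length));
            }
        }
    }
    Ok(merged)
}

fn main() {
    let a: Vec<Contig> = vec![("chr1".to_string(), 1000), ("chr2".to_string(), 2000)];
    let b: Vec<Contig> = vec![("chr2".to_string(), 2000), ("chr3".to_string(), 3000)];
    for (id, length) in merge_contigs(&a, &b).expect("contigs are compatible") {
        println!("##contig=<ID={},length={}>", id, length);
    }
}
```

Keeping a Vec next to the map is what preserves the order; a hash map alone would emit the contig lines in arbitrary order.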

However, I now run into a strange problem when doing that.

  • VCF1_reader has an associated HeaderView
  • VCF2_reader has an associated HeaderView

In this process I generate a new header for the new comparison VCF writer.
If I take entries from FileA and simply call vcf.write(&record) on the comparison writer, this works well: I get the new header and the entry from FileA.
Based on the comparison, I would now like to add a new "INFO" field, e.g. "seen=2" if 2 samples contain a similar entry.
For that I would have to call record.push_info_string, but the field then also has to be defined in the header of FileA, and I don't see how to modify that header.
I can of course add the field to my new header for the comparison, but that is not sufficient: the push above still fails against the header of FileA.

Is there something similar to:

pub fn header(&self) -> &HeaderView

Return associated header.

that would let me modify or replace the associated header?

UPDATE:

One way to do it is to use vcf.empty_record() and populate it with the fields of the record from above.
Unfortunately, as far as I can see, there is no vcf.record_from() or anything similar.

UPDATE:
Actually, I was looking in the wrong place. This does it:

vcf.translate(record);

It translates a record to the new writer's header, after which additional information can be pushed.
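
For the "seen" value itself, a rough sketch of how the count could be computed before pushing it, using only the standard library. Everything here is an assumption for illustration: variants are identified by a hypothetical (chromosome, position, REF, ALT) tuple, `key` and `count_seen` are made-up helper names, and the INFO header line uses Type=Integer, whereas push_info_string as mentioned above would pair with Type=String.

```rust
use std::collections::HashMap;

/// Identity of a variant: (chromosome, position, REF, ALT).
type VariantKey = (String, u64, String, String);

/// Hypothetical header definition for the new INFO field.
const SEEN_INFO_LINE: &str =
    "##INFO=<ID=seen,Number=1,Type=Integer,Description=\"Number of input files containing this variant\">";

/// Convenience constructor for a variant key.
fn key(chrom: &str, pos: u64, reference: &str, alt: &str) -> VariantKey {
    (chrom.to_string(), pos, reference.to_string(), alt.to_string())
}

/// Count in how many files each variant occurs.
/// Each inner Vec holds the variants of one file; duplicates within a file count once.
fn count_seen(files: &[Vec<VariantKey>]) -> HashMap<VariantKey, usize> {
    let mut seen: HashMap<VariantKey, usize> = HashMap::new();
    for variants in files {
        let mut unique: Vec<&VariantKey> = variants.iter().collect();
        unique.sort();
        unique.dedup();
        for variant in unique {
            *seen.entry(variant.clone()).or_insert(0) += 1;
        }
    }
    seen
}

fn main() {
    let file_a = vec![key("chr1", 100, "A", "T"), key("chr1", 200, "G", "C")];
    let file_b = vec![key("chr1", 100, "A", "T")];
    let seen = count_seen(&[file_a, file_b]);

    println!("{}", SEEN_INFO_LINE);
    println!("seen={}", seen[&key("chr1", 100, "A", "T")]);
}
```

The header line would go into the new writer's header, and the count would be pushed onto each record after it has been translated.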