Merge VCF headers
wdesouza opened this issue · comments
Is it possible to merge VCF headers to combine records from different files? I'am trying to do something like this
let mut out_header = Header::new();
for record in reader.header().header_records() {
let header_line = todo!("convert record to string?");
out_header.push_record(header_line);
}
Hi stupid question but do you want to combine multiple or just 2 files ?
If the latter you could initialize the out_header
instead of new with template for 1st file:
let mut out_header = Header::from_template(vcfA.header());
Then you can iterate over header from file 2 and push into 1st, no ?
UPDATE:
Otherwise I would do something like that:
use rustc_hash::FxHashMap;
let mut vcf_contigs : FxHashMap<String,linear_map::LinearMap<String,String>> = FxHashMap::default();
for entries in vcf_a.header().header_records() {
match entries {
rust_htslib::bcf::header::HeaderRecord::Contig{key,mut values} => {
values.remove("IDX").expect("ERROR:could not remove ID entry!");
vcf_contigs.insert(values.get("ID").expect("ERROR: could not get ID of contig!").to_string(), values);
},
_ => println!("Not interesting"),
};
};
for entries in vcf_b.header().header_records() {
match entries {
rust_htslib::bcf::header::HeaderRecord::Contig{key,mut values} => {
values.remove("IDX").expect("ERROR:could not remove ID entry!");
let id = values.get("ID").expect("ERROR: could not get ID of contig!").to_string();
let length = values.get("length").expect("ERROR: could not get length of contig!").to_string();
if let Some(existant) = vcf_contigs.get(&id){
if existant.get("length").expect("ERROR: could not get length of contig!").to_string() != length {
panic!("ERROR: length of similar contig IDs differed!");
}else{
continue
}
}else{
vcf_contigs.insert(values.get("ID").expect("ERROR: could not get ID of contig!").to_string(), values);
}
},
_ => println!("Not interesting"),
};
};
for (_,value) in vcf_contigs.iter() {
let id = value.get("ID").expect("ERROR: could not get ID of contig!").to_string();
let length = value.get("length").expect("ERROR: could not get ID of contig!").to_string();
let assembly = value.get("assembly").expect("ERROR: could not get ID of contig!").to_string();
vcf_header.push_record(format!("##contig=<ID={},length={},assembly={}>", id, length, assembly).as_bytes());
}
This is only to match e.g. the header contig entries and to assure they are compatible.
Be aware, I am a rust noob though and there might be better and for sure more elegant ways to do that ;)
I encounter though now a weird problem doing that.
- VCF1_reader has an associated HeaderView
- VCF2_reader has an associated HeaderView
Now I am generating a new header in this process for the new comparison vcf writer.
If I visit now entries from FileA and do simply a vcf.write(&record)
into the comparison writer this works well and I get the new header and the entry from FileA.
Based on the comparison I would like to add now a new "INFO" field, e.g. "seen=2" if 2 samples contain a similar entry.
But for that I would have to do now a record.push_info_string
which is now weird because it has to be defined as well in the
header of the FileA, but how can I actually modify this now ???
I can obviously add this info in my new header for the comparison but this is not sufficient and the push above will fail into the header of File1.
Is there a way similar to :
pub fn header(&self) -> &HeaderView
Return associated header.
a way to modify the associated header or change the associated header ?
UPDATE:
Obviously one way to do it is then to use vcf.empty_record()
and populate it with the record from above.
Unfortunately there is no vcf.record_from()
or something similar as far as I can see
UPDATE:
Actually I looked into the wrong place, by doing:
vcf.translate(record);
It translates a record into the new writer and one can then push additional information