compressed VCF
Ballote opened this issue · comments
Hi,
When I convert CHP files to BCF, this is the command I use:
- bcftools +affy2vcf \
- --no-version -Ou \
- --csv "GenomeWideSNP_6.na35.annot.csv" \
- --fasta-ref "human_g1k_v37.fasta" \
- --chps /home/user/project/cc-chp/NAME \
- --snp /home/user/project/AxiomGT1.snp-posteriors.txt \
- --extra NAME.tsv | \
- bcftools sort -Ou -T ./bcftools-sort.XXXXXX | \
- bcftools norm --no-version -Ob -o NAME.vcf -c x -f "human_g1k_v37.fasta" && \
- bcftools index -f NAME.vcf
I was wondering: if I want the output in VCF format instead, do I need to change lines 2, 8 and 9 to "-Ov", "-Ov" and "-Oz", respectively? I ask because "-Ov" and "-Oz" are for VCF (uncompressed and compressed), whereas "-Ou" and "-Ob" are for BCF (uncompressed and compressed).
If this is correct, it would look like this:
- bcftools +affy2vcf \
- --no-version -Ov \
- --csv "GenomeWideSNP_6.na35.annot.csv" \
- --fasta-ref "human_g1k_v37.fasta" \
- --chps /home/user/project/cc-chp/NAME \
- --snp /home/user/project/AxiomGT1.snp-posteriors.txt \
- --extra NAME.tsv | \
- bcftools sort -Ov -T ./bcftools-sort.XXXXXX | \
- bcftools norm --no-version -Oz -o NAME.vcf -c x -f "human_g1k_v37.fasta" && \
- bcftools index -f NAME.vcf
When I run it this way, I get the VCF file at the end, but I also get this message:
index: "NAME.vcf" is in a format that cannot be usefully indexed
I just want to know whether the change is correct and, if it is, whether there is any way to index the file usefully.
You should still keep -Ou for bcftools +affy2vcf and bcftools sort, as uncompressed BCF (binary VCF) is the fastest format for piping (though this is not advertised well enough). You can keep -Oz for the last bcftools norm command.
Your command should work as is, though it would likely be more appropriate to use NAME.vcf.gz rather than NAME.vcf, as you are outputting a compressed VCF with bcftools norm.
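Putting that advice together, the pipeline could be sketched roughly as follows. This is only an illustration under stated assumptions: `ext_for_type` and `chp_to_vcf` are hypothetical helper names, not part of bcftools, and the sketch assumes NAME in the original command is a per-sample placeholder that can be passed as an argument. The intermediate steps keep -Ou, only the final bcftools norm uses the requested output type, and the output file name gets an extension that matches it. The index step is skipped for uncompressed output, since bcftools index needs a compressed file.

```shell
#!/bin/sh
# Hypothetical helper (not part of bcftools): map a bcftools -O output type
# to a conventional file extension.
ext_for_type() {
  case "$1" in
    v)   echo "vcf" ;;     # uncompressed VCF
    z)   echo "vcf.gz" ;;  # compressed VCF
    u|b) echo "bcf" ;;     # uncompressed / compressed BCF
    *)   echo "unknown output type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper around the pipeline from this thread.
# $1 = sample name (NAME in the thread), $2 = final output type (default: z).
chp_to_vcf() {
  name=$1
  out_type=${2:-z}
  out="$name.$(ext_for_type "$out_type")" || return 1

  # Intermediate steps stay on -Ou (uncompressed BCF) for fast piping;
  # only the last command writes the requested format.
  bcftools +affy2vcf \
    --no-version -Ou \
    --csv "GenomeWideSNP_6.na35.annot.csv" \
    --fasta-ref "human_g1k_v37.fasta" \
    --chps "/home/user/project/cc-chp/$name" \
    --snp /home/user/project/AxiomGT1.snp-posteriors.txt \
    --extra "$name.tsv" | \
  bcftools sort -Ou -T ./bcftools-sort.XXXXXX | \
  bcftools norm --no-version -O"$out_type" -o "$out" -c x -f "human_g1k_v37.fasta" || return 1

  # Only compressed output (z or b) is indexed.
  case "$out_type" in
    z|b) bcftools index -f "$out" ;;
  esac
}
```

After sourcing the script, calling `chp_to_vcf NAME z` would write and index NAME.vcf.gz, while `chp_to_vcf NAME b` would write and index NAME.bcf.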
I'm going to do that.
Thank you so much for your time.