luntergroup / octopus

Bayesian haplotype-based mutation calling

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Question: Calling non-variant sites

ja-Reeve opened this issue · comments

How can I get Octopus to output a VCF including non-variant sites?

Thanks. I tried that before on a short 600kb region of the genome, but it only returns variant sites.

My call:

octopus \
                 -R $PATH/ref_genome.fasta \
                 -i $PATH/bam_list.txt \
                 -o $PATH/Octopus_trial1.6b.vcf.gz \
                 -T Contig38698:0-608273 \
                 --very-fast --refcall POSITIONAL

Output (only showing 1st 10 calls & 3 samples):
Screen Shot 2022-11-04 at 16 51 41

Are you using the latest commit in the development branch?

Yes, version: 17a597d-20220708 is installed on the server I am using

What if you remove --very-fast ?

I tried a test run, but it timed out after 5days. I will give it another go with more time.

Maybe use samtools on a single sample to filter the bam file to just that region to test octopus without the --very-fast?

I tested this region with a single sample without `--very-fast``. I only returns indels and SNPs, no non-variant sites.
Screen Shot 2022-11-04 at 19 17 56

Maybe I do not understand, but these are non-variant sites as well in the output you show. For example Contig38698:1 has phased genotype 0|0 . The remaining sites in the interval Contig38698:2-13763 are presumably not passing filter? So, you might need to keep the raw calls with --keep-unfiltered-calls and perhaps do some post processing to include the lines from the unfiltered file to the filtered file? I guess last time I used this, I realized I only wanted the confidently called sites, whether variant or non-variant.

Calling again with --keep-unfiltered-calls worked. You're right, many sites were being removed due to the filters.
Thanks a lot for your help.

Great!