Values chosen in the --matrix output?
nigiord opened this issue · comments
Hi there,
While doing an all-to-all ANI comparison on a set of genomes, I noticed that the regular output displays different values when the genomes are switched:
1509405_PRJNA252589.fasta.gz 246200_PRJNA281.fasta.gz 76.3461 98 1679
246200_PRJNA281.fasta.gz 1509405_PRJNA252589.fasta.gz 76.9103 84 1369
When using the --matrix option there is only a single value for this pair, which is 76.628181 (looks like the mean).
I thus have two questions:
- Why do the ANI values change when the genomes are switched?
- Is there a particular reason to use the mean of the two values in the --matrix output?
Cheers,
Nils
@nigiord, the basic pipeline that we follow to estimate ANI is not symmetric (the same issue occurs with BLAST-based ANI, for example). This is mainly due to the heuristics involved; see the Methods section of the FastANI paper for more details.
That said, we expect the difference to be almost negligible if you change the order of the two genomes. You are right, we take the mean for the --matrix option, as I could display only one value there.
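For anyone who wants to reproduce the averaging from the regular output, here is a minimal Python sketch. It is not FastANI's own code: the function name `symmetrize` is hypothetical, and it only assumes what the lines quoted above show, namely whitespace-separated rows whose first three columns are query, reference, and ANI. Averaging a pair that was reported in only one direction is also an assumption of this sketch, not documented behaviour.

```python
from collections import defaultdict


def symmetrize(lines):
    """Average the ANI values reported for both directions of each genome pair.

    Hypothetical helper: assumes the first three whitespace-separated
    columns of each row are query, reference, and ANI.
    """
    values = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        query, ref, ani = fields[0], fields[1], float(fields[2])
        if query == ref:
            continue  # self-comparisons are not pairs
        # Sort the names so (A, B) and (B, A) share one key.
        key = tuple(sorted((query, ref)))
        values[key].append(ani)
    # Assumption: a pair seen in only one direction keeps that single value.
    return {pair: sum(v) / len(v) for pair, v in values.items()}


rows = [
    "1509405_PRJNA252589.fasta.gz 246200_PRJNA281.fasta.gz 76.3461 98 1679",
    "246200_PRJNA281.fasta.gz 1509405_PRJNA252589.fasta.gz 76.9103 84 1369",
]
means = symmetrize(rows)
for (a, b), mean_ani in means.items():
    print(a, b, round(mean_ani, 4))
```

For the two rows quoted in this issue, the mean comes out at about 76.6282, which agrees with the --matrix value of 76.628181 up to the precision of the rounded inputs.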
Make sense, thank you for your answer!