ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation

Values chosen in the --matrix output?

nigiord opened this issue

Hi there,

While doing an all-to-all ANI comparison on a set of genomes, I noticed that the regular output displays different values when the genomes are switched:

1509405_PRJNA252589.fasta.gz  246200_PRJNA281.fasta.gz      76.3461  98    1679
246200_PRJNA281.fasta.gz      1509405_PRJNA252589.fasta.gz  76.9103  84    1369

When using the --matrix option there is only a single value for this pair, which is 76.628181 (looks like the mean).

I thus have two questions:

  • Why do the ANI values change when the genomes are switched?
  • Is there a particular reason to use the mean of the two values in the --matrix output?

Cheers,
Nils

@nigiord, the basic pipeline that we follow to estimate ANI is not symmetric (the same issue occurs with BLAST-based ANI, for example). This is mainly due to the heuristics involved (see the Methods section of the FastANI paper for more details).
That said, we expect the difference to be almost negligible if you change the order of the two genomes. You are right, we take the mean for the --matrix option, as I could display only one value there.
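
For anyone who wants to reproduce the --matrix value from the regular output, here is a minimal Python sketch of the averaging described above. It is not FastANI's own code. The helper name `symmetrize_ani` is hypothetical, and the sketch assumes whitespace-separated rows with the query, the reference, and the ANI value in the first three columns, and file names without spaces.

```python
from collections import defaultdict


def symmetrize_ani(lines):
    """Average the directional ANI values for each unordered genome pair.

    Hypothetical helper: assumes each row is whitespace-separated with
    query, reference and ANI in the first three columns.
    """
    values = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        query, ref, ani = fields[0], fields[1], float(fields[2])
        # A frozenset key makes (A, B) and (B, A) land in the same bucket.
        values[frozenset((query, ref))].append(ani)
    return {pair: sum(v) / len(v) for pair, v in values.items()}


# The two rows quoted in the question above.
rows = [
    "1509405_PRJNA252589.fasta.gz  246200_PRJNA281.fasta.gz      76.3461  98    1679",
    "246200_PRJNA281.fasta.gz      1509405_PRJNA252589.fasta.gz  76.9103  84    1369",
]

for pair, mean_ani in symmetrize_ani(rows).items():
    print(sorted(pair), round(mean_ani, 4))
```

For the pair in question this gives a mean of about 76.6282, which agrees with the 76.628181 reported by --matrix to four decimal places. The small remaining difference is presumably because the regular output shows rounded ANI values.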

Makes sense, thank you for your answer!