veg / tn93

TN93 fast distance calculator

No mean distance calculated between samples

AndreaAguadoM opened this issue

Hello!
My name is Andrea and I am a bioinformatician from Spain. I have been using tn93 for a while and have included it in some of the pipelines I am developing to analyze HIV sequences more effectively. I am trying to generate a distance matrix from a large number of HIV samples, and while analyzing my results I found that some sequence pairs do not seem to have a distance assigned (the mean distance is reported as -nan). How is this possible if the t parameter in my pipeline is set to 1?

Thanks in advance!

Dear @AndreaAguadoM,

Can you please provide an example? nan will only arise if no comparisons were performed, i.e. something like this occurs (Actual comparisons performed = 0).

{
	"Actual comparisons performed" :0,
	"Comparisons accounting for copy numbers " :0,
	"Total comparisons possible" : 10,
	"Links found" : 0,
	"Maximum distance" : 0,
	"Sequences" : 5,
	"Mean distance" : nan
...

Make sure you specify the -L argument to compare sequences that overlap by fewer than the default 100 nucleotides as well (which is the case for the example above).
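For anyone who wants to catch this case automatically in a pipeline, here is a minimal Python sketch. It assumes the report shown above has already been captured into a string; the helper names are hypothetical and not part of tn93. It uses a regular expression rather than Python's json module because a bare nan (or -nan) value is not accepted by json.loads.

```python
import re

# Hypothetical helper: pull the number of comparisons out of a tn93-style
# report. The field name is copied from the excerpt above.
def comparisons_performed(report_text):
    match = re.search(r'"Actual comparisons performed"\s*:\s*(\d+)', report_text)
    return int(match.group(1)) if match else None

# Hypothetical helper: a nan mean is expected whenever nothing was compared,
# because a mean over zero comparisons is undefined.
def mean_is_undefined(report_text):
    return comparisons_performed(report_text) == 0

# Report text copied from the excerpt above (truncated in the same place).
SAMPLE_REPORT = '''{
	"Actual comparisons performed" :0,
	"Comparisons accounting for copy numbers " :0,
	"Total comparisons possible" : 10,
	"Links found" : 0,
	"Maximum distance" : 0,
	"Sequences" : 5,
	"Mean distance" : nan
'''

if __name__ == "__main__":
    if mean_is_undefined(SAMPLE_REPORT):
        print("No pairs were compared; check the overlap setting.")
```

A check like this lets the pipeline distinguish "no comparable pairs" from a genuine distance before the nan propagates into later steps.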

Best,
Sergei

Dear @AndreaAguadoM,

In default run mode, N means "match everything". A sequence that contains N will match any character at that position (distance 0).

If you want to treat N differently, you should adjust the -a command line argument, for example -a average.
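To make the difference concrete, here is a toy Python sketch of per-site mismatch under the two behaviours. It is an illustration under stated assumptions, not tn93's actual code: the function name, the mode label "resolve" for the default match-everything behaviour, and the idea of averaging uniformly over all possible resolutions are assumptions made for this example, and only a few ambiguity codes are included.

```python
# Nucleotides each symbol can stand for (a small subset of the IUPAC codes).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}

def site_mismatch(x, y, mode):
    """Return the mismatch contribution (0..1) of one aligned site.

    mode="resolve": ambiguities match anything they can stand for.
    mode="average": average the mismatch over all possible resolutions.
    Both mode names are hypothetical labels for this sketch.
    """
    xs, ys = IUPAC[x], IUPAC[y]
    if mode == "resolve":
        return 0.0 if set(xs) & set(ys) else 1.0
    if mode == "average":
        pairs = [(a, b) for a in xs for b in ys]
        return sum(a != b for a, b in pairs) / len(pairs)
    raise ValueError("unknown mode: " + mode)

if __name__ == "__main__":
    print(site_mismatch("N", "A", "resolve"))
    print(site_mismatch("N", "A", "average"))
```

Under the match-everything reading, N against A contributes nothing, whereas under averaging three of the four possible resolutions of N differ from A, so sequences with many N positions can end up with much larger distances.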

Best,
Sergei

Thank you so much! I have noticed that when I use this -a setting (-a average), I obtain 1000 as the resulting mean distance in some distance calculations. As far as I know, the Tamura-Nei distance has a range of values between 0 and 2. Why am I obtaining these results? Thanks in advance again!

Dear @AndreaAguadoM,

1000 is the upper bound that tn93 reports for all distances. Most genetic distances, including the TN93 distance, can range from 0 to ∞.

It requires some serious data pathology, but it could occur. In fact, tn93 will "downgrade" to a K2P distance if the input data do not contain one of the four characters. That's because TN93 may become undefined in this case.

if (useK2P) {
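As a rough illustration of why the distance is unbounded and why a missing base is a problem, here is a Python sketch of the textbook TN93 and K2P formulas. It is not the tn93 source: the function names, the way base frequencies are passed in, and the decision to return the cap when a logarithm argument is not positive are all assumptions made for this example. Only the 1000 cap and the fallback to K2P when a base is absent come from the explanation above.

```python
import math

MAX_DISTANCE = 1000.0  # upper bound mentioned above; its use here is a sketch

def k2p(p, q):
    """Kimura 2-parameter distance from the proportion of transitions (p)
    and transversions (q) among the compared sites."""
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:      # log undefined: treat as saturated
        return MAX_DISTANCE
    return min(-0.5 * math.log(a) - 0.25 * math.log(b), MAX_DISTANCE)

def tn93(p1, p2, q, freqs):
    """Textbook TN93 distance.

    p1: proportion of A<->G differences, p2: proportion of C<->T differences,
    q: proportion of transversions, freqs: dict of base frequencies.
    """
    fa, fc, fg, ft = (freqs[base] for base in "ACGT")
    if min(fa, fc, fg, ft) <= 0.0:
        # A missing base would put a zero in a denominator below,
        # so fall back to K2P, as described in the thread.
        return k2p(p1 + p2, q)
    fr, fy = fa + fg, fc + ft     # purine and pyrimidine frequencies
    k1 = 2.0 * fa * fg / fr
    k2 = 2.0 * ft * fc / fy
    k3 = 2.0 * (fr * fy - fa * fg * fy / fr - ft * fc * fr / fy)
    w1 = 1.0 - p1 / k1 - q / (2.0 * fr)
    w2 = 1.0 - p2 / k2 - q / (2.0 * fy)
    w3 = 1.0 - q / (2.0 * fr * fy)
    if min(w1, w2, w3) <= 0.0:    # log undefined: treat as saturated
        return MAX_DISTANCE
    d = -k1 * math.log(w1) - k2 * math.log(w2) - k3 * math.log(w3)
    return min(d, MAX_DISTANCE)

if __name__ == "__main__":
    equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(tn93(0.05, 0.05, 0.05, equal))   # moderately diverged pair
    print(tn93(0.30, 0.30, 0.30, equal))   # saturated pair hits the cap
```

Each logarithm argument shrinks toward zero as the sequences diverge, so the distance grows without bound; a very divergent or heavily ambiguous pair can therefore reach the cap even though typical values are far below it.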

Best,
Sergei

Okay, thank you very much! Your response has been very helpful.
Best,
Andrea.