emmanuelparadis / ape

analysis of phylogenetics and evolution

Home Page: http://ape-package.ird.fr/

biplot.pcoa() function produces different projections for the same variable when it is scaled.

daroari opened this issue

I am using the biplot.pcoa() function to project the original variables of my dataset (2 nominal and 1 continuous) onto a PCoA plot, and I get different projections for the continuous variable depending on whether I scale it or not. This looks suspicious to me: the only thing I do is scale the variable with the scale() function, and I have confirmed that the relationships between the observations in the distance matrix (the input of the pcoa() function) are preserved after scaling.
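The claim that scaling leaves the distances untouched can be checked with a small sketch. The data frame below is made up for illustration (the column names `grade`, `stage`, and `size` are hypothetical, not from the original dataset), and it assumes the ordinal variables are stored as ordered factors. Gower's metric rescales each numeric variable by its range, so a linear rescaling such as scale() should not change the distances.

```r
library(cluster)  # provides daisy()

set.seed(1)

# Hypothetical mixed dataset: two ordinal variables and one continuous variable
dat <- data.frame(
  grade = factor(sample(c("low", "mid", "high"), 20, replace = TRUE),
                 levels = c("low", "mid", "high"), ordered = TRUE),
  stage = factor(sample(c("early", "middle", "late"), 20, replace = TRUE),
                 levels = c("early", "middle", "late"), ordered = TRUE),
  size  = rnorm(20, mean = 50, sd = 10)
)

# Same data, with only the continuous variable centred and scaled
dat_scaled <- dat
dat_scaled$size <- as.numeric(scale(dat$size))

d_raw    <- daisy(dat, metric = "gower")
d_scaled <- daisy(dat_scaled, metric = "gower")

# Compare the two sets of pairwise distances (up to numerical tolerance)
same_distances <- isTRUE(all.equal(as.vector(d_raw), as.vector(d_scaled)))
print(same_distances)
```

If the distances match, the PCoA coordinates are the same in both cases, so any difference in the biplot has to come from the variables passed for projection, not from the ordination itself.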

I checked the source code of the biplot.pcoa() function and I noticed that the covariance matrix changes when I scale the variable.

This causes a change in U as well, and consequently, in the size of the vector projected onto the plot.
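The effect described here can be reproduced without ape, as a rough sketch. It assumes, as stated above, that the arrows are derived from the covariance between each variable and the standardized ordination axes. The `axes` matrix is random data standing in for PCoA coordinates, and `size` is a hypothetical continuous variable.

```r
set.seed(1)

# Stand-in for two standardized ordination axes (random, for illustration only)
axes <- scale(matrix(rnorm(40), ncol = 2))

# Hypothetical continuous variable, raw and scaled
size        <- rnorm(20, mean = 50, sd = 10)
size_scaled <- as.numeric(scale(size))

# Covariance of the variable with each axis
S_raw    <- cov(size, axes)
S_scaled <- cov(size_scaled, axes)

# Covariance is not scale-invariant: the raw version is sd(size) times larger
length_ratio_ok <- isTRUE(all.equal(S_raw, S_scaled * sd(size)))

# Correlation is scale-invariant, so the direction of the arrow is unchanged
same_direction <- isTRUE(all.equal(cor(size, axes), cor(size_scaled, axes)))

print(c(length_ratio_ok = length_ratio_ok, same_direction = same_direction))
```

Under that assumption, scaling the variable changes the length of its arrow by a constant factor but not its direction relative to the axes.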

Are projections expected to change when the scale of a variable changes? Note that the relations (distances between observations) are preserved.

I would appreciate any help with this.

Thank you!

Daniel

If you scale one variable and not the other two, it seems logical that the final output is altered.
How do you compute the distances input to ape::pcoa?

@emmanuelparadis The thing is that the other two variables are categorical (ordinal, specifically) so it would not make sense to scale them, even though they have an order (ranked).

I am calculating a distance matrix using the daisy() function (from the cluster package) in R with Gower's metric, since it works relatively well with mixed data (continuous and categorical variables):

daisy(data, metric = 'gower')

Then, I use that distance matrix (or distance object) as the input to ape::pcoa.

Indeed, you cannot scale both types of variables in the same way.
I think the question is rather whether scaling the continuous variable gives a "more accurate" distance. Scaling can help in some cases, for instance if there is some form of allometry. If you're not sure, it's probably best to compare the outputs of different analyses: with and without scaling the continuous variable, and perhaps even analysing the nominal variables separately. Since you don't have many variables, keep in mind that the number of dimensions of the output will be affected.