MEGA-GO / MegaGO

Calculate semantic distance for sets of Gene Ontology terms

Interpretation problem when comparing identical high-level terms

tivdnbos opened this issue

When identical high-level terms are compared, a low score is returned, e.g.:
GO:0030170 (pyridoxal phosphate binding) vs GO:0030170 gives 99% similarity
GO:0043167 (ion binding) vs GO:0043167 gives 55% similarity
GO:0003674 (molecular function) vs GO:0003674 gives 0% similarity

I also tested what happens if the same term appears multiple times in one list (e.g. 10x GO:0043167 vs 1x GO:0043167), but this gives the same result, 55% in this case.

According to the authors, the simrel method is aimed at comparing gene products rather than functional profiles. Thus, generic terms are penalized: “Generic terms do not have a high relevance for the comparison of the exact function of different gene products.” In my opinion, this does not make sense for comparing profiles. The simrel method without the penalty becomes the simLin method.
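To make the effect of the penalty concrete, here is a minimal sketch of both measures as I understand them from the literature: simLin = 2·IC(MICA) / (IC(a) + IC(b)) and simRel = simLin · (1 − p(MICA)), with IC(t) = −ln p(t) and MICA the most informative common ancestor. This is not MegaGO's actual code. The function names are made up for illustration, and the term frequencies are hypothetical values back-calculated so that the self-comparison reproduces the scores reported above. Returning 0 for the root-vs-root case in simLin is also an assumed convention, since the formula is 0/0 there.

```python
import math


def information_content(p):
    """IC of a term with annotation frequency p (0 < p <= 1)."""
    return -math.log(p)


def sim_lin(p_a, p_b, p_mica):
    """Lin similarity from the frequencies of both terms and their MICA."""
    denom = information_content(p_a) + information_content(p_b)
    if denom == 0:
        # Both terms are the root (IC = 0): the ratio is undefined (0/0).
        # Returning 0.0 is an assumed convention, not taken from MegaGO.
        return 0.0
    return 2 * information_content(p_mica) / denom


def sim_rel(p_a, p_b, p_mica):
    """Relevance similarity: Lin similarity scaled by (1 - p(MICA))."""
    return sim_lin(p_a, p_b, p_mica) * (1 - p_mica)


# Hypothetical frequencies, chosen only to mirror the scores in this issue.
examples = {
    "specific term (p = 0.01)": 0.01,
    "generic term (p = 0.45)": 0.45,
    "root term (p = 1.0)": 1.0,
}

for label, p in examples.items():
    # A term compared with itself is its own MICA.
    rel = sim_rel(p, p, p)
    lin = sim_lin(p, p, p)
    print(f"{label}: simRel = {rel:.2f}, simLin = {lin:.2f}")
```

For a term compared with itself, simLin is 1 for every non-root term, so simRel reduces to 1 − p(term). That is why the score drops as the term gets more generic, and why the root term always scores 0.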

I suggest creating a separate branch where we test it with simLin. What do you think @rababerladuseladim @pverscha?

I redid the analysis with the simLin metric; it can be found here: https://github.com/MEGA-GO/manuscript-data-analysis/tree/use_lin_metric
Sample clustering is not affected, but the similarity ranges shift slightly towards higher values.