Teichlab / SpatialDE

Test genes for Spatial Variation

Determining optimal C for AEH

dpcook opened this issue

Just wondering if there's a good way to determine an optimal C for AEH, or perhaps a metric to assess the quality (may be a bad word for it) of a given pattern.

+1

I guess it is like choosing the number of dimensions for tSNE / MDS, or even PCA (though PCA at least has the percentage of explained variance as a reference). So far my quick solution is to choose C = 5-7 and then run GO enrichment to see what biological meaning lies behind each "pattern".

It is a very hard problem, and solving it would be important (but challenging) for the scRNA-seq field in general. The GO enrichment strategy seems pretty reasonable, but you might miss novel things. There's also something weird about how findings are typically only reported when they have significant GO categories: those findings feed into newer GO annotations (since annotations are based on the literature), which creates a feedback loop.

I used to be excited about Dirichlet processes for this general problem: we investigated them in Lönnberg et al. for inferring the number of pseudotime trajectories, for example. It was also something we considered for a while with AEH. It didn't work well, and I've seen some papers describing that even when data are simulated from a Dirichlet process model, the same model cannot infer the correct number of clusters (unfortunately I can't find a reference right now).

This is why I made C an explicit parameter in this implementation so that it is clear it's a choice made by the researcher.

I'll keep this issue open so people can discuss and suggest things. Maybe eventually we can come up with a great idea.

I guess if we can find a metric describing the diversity of the C groups, we could try different values of C and take the elbow point as the optimal C. Not sure, but just a thought.
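
The elbow idea above could be sketched roughly as follows. This is only an illustration, not part of SpatialDE: the metric (within-pattern sum of squares on standardized expression) is an assumed stand-in for "diversity of the C groups", and the helper names `within_pattern_sse` and `elbow_point` are hypothetical. The elbow is taken as the point farthest from the straight line joining the first and last (C, score) pairs, which is one common heuristic among several.

```python
import numpy as np


def within_pattern_sse(expr, labels):
    """Sum of squared deviations of each gene from its pattern's mean profile.

    expr   : genes x spots array (assumed standardized per gene)
    labels : pattern assignment per gene, e.g. taken from an AEH run
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    sse = 0.0
    for k in np.unique(labels):
        members = expr[labels == k]
        sse += ((members - members.mean(axis=0)) ** 2).sum()
    return sse


def elbow_point(cs, scores):
    """Return the C whose (C, score) point lies farthest from the chord
    joining the first and last points of the curve.

    Assumes cs is sorted, has at least two distinct values, and that the
    scores are not all identical (otherwise the rescaling divides by zero).
    """
    cs = np.asarray(cs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Rescale both axes to [0, 1] so the result does not depend on units.
    x = (cs - cs[0]) / (cs[-1] - cs[0])
    y = (scores - scores.min()) / (scores.max() - scores.min())
    # Perpendicular distance of every point from the first-to-last chord.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(cs[np.argmax(dist)])
```

In use, one would run AEH once per candidate C, compute the score from each run's gene-to-pattern assignments, and pass the two lists to `elbow_point`. Any elbow found this way is still a heuristic, so checking the chosen patterns by eye (or with GO enrichment, as suggested earlier in the thread) remains worthwhile.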