AlexsLemonade / refinebio-examples

Example workflows for refine.bio data

Home Page: https://www.refine.bio

New Analysis Example: Finding co-expressed gene modules with CoGAPS

cansavvy opened this issue

What are the goals of this new example analysis?

People often want to find genes that are co-expressed. WGCNA is often the default method, and while it may be fine for some use cases, CoGAPS is a somewhat more sophisticated, albeit more computationally intensive, method for similar questions. CoGAPS also appears to have well-made vignettes and documentation, a quality we look for in tools that we recommend to users and trainees.

What kind of dataset will this need?

CoGAPS looks for latent spaces using non-negative matrix factorization (NMF). This means we want a dataset that is big enough (enough genes and enough samples) to be worth running this on, but not so large that it can't run locally. Finding the right size may take some trial and error.
The example in the CoGAPS vignette has 9 samples and 1363 genes, so we should use a dataset at least that big, and probably bigger.
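To make the NMF idea concrete, here is a minimal Python sketch of plain NMF using Lee and Seung's multiplicative updates. It is a stand-in for illustration, not CoGAPS: CoGAPS fits a Bayesian NMF model using Markov chain Monte Carlo (Gibbs) sampling. Both approaches factor a non-negative genes x samples matrix V into a genes x k weight matrix W and a k x samples pattern matrix H (CoGAPS calls these A and P). The toy matrix is built from two made-up patterns, and the loop over k mimics trying different numbers of patterns (the `nPatterns` setting in CoGAPS). The helpers `nmf` and `frobenius_error` are hypothetical names.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF (not CoGAPS): factor non-negative V (genes x samples) into
    W (genes x k) and H (k x samples) with Lee-Seung multiplicative updates
    for the squared Frobenius loss."""
    rng = random.Random(seed)
    n_genes, n_samples = len(V), len(V[0])
    # Positive random starts keep every update non-negative
    W = [[rng.random() for _ in range(k)] for _ in range(n_genes)]
    H = [[rng.random() for _ in range(n_samples)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of V - WH (how much the factorization misses)."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5


# Toy data: 6 genes x 5 samples built from two made-up patterns
W_true = [[1, 0], [2, 0], [0, 1], [0, 2], [1, 1], [3, 1]]
H_true = [[2, 1, 0, 1, 3], [0, 1, 2, 1, 0]]
V = matmul(W_true, H_true)

# Try several numbers of patterns and compare reconstruction error
errors = {}
for k in (1, 2, 3):
    W, H = nmf(V, k)
    errors[k] = frobenius_error(V, W, H)
    print(f"k={k}: reconstruction error {errors[k]:.4f}")
```

Reconstruction error generally drops as k grows, so a lower error alone does not mean more patterns are better. This is one reason the example should give guidance on choosing parameters rather than just minimizing error.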

What steps should be included in this analysis?

We can sort through the CoGAPS vignette and decide which steps we find most useful after running the main CoGAPS function. These aren't hard-and-fast steps, because I haven't run this myself yet; they are items we should explore:

  • CoGAPS has some parallel computing options that we will want to use so that the analysis finishes in a reasonable time on a user's local machine.

  • It's not yet clear to me, but the vignette seems to suggest that some parameter tuning may be needed, or at least should be explored (another reason this should be an "advanced topic"), so we should give some guidance about how to explore and choose parameters.

  • We usually like to leave our users with some nice visuals. The CoGAPS vignette has one plot, but we may want to make a more publication-ready visual that users would like to see (a heatmap, or something better).
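On the parallel computing point: our understanding of the CoGAPS documentation is that its distributed "genome-wide" mode (the `distributed` and `nSets` arguments) splits the genes into subsets, runs each subset in parallel, and then combines the patterns found across subsets. The Python sketch below shows only the splitting step, so we can reason about subset sizes; `split_genes` is a hypothetical helper, not part of CoGAPS, and the 1363 gene IDs simply mirror the vignette's dataset size.

```python
import random


def split_genes(gene_ids, n_sets, seed=42):
    """Hypothetical helper: randomly partition gene IDs into n_sets disjoint
    subsets whose sizes differ by at most one."""
    if not 1 <= n_sets <= len(gene_ids):
        raise ValueError("n_sets must be between 1 and the number of genes")
    shuffled = list(gene_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducible subsets
    # Deal genes round-robin so subset sizes stay balanced
    return [shuffled[i::n_sets] for i in range(n_sets)]


genes = [f"gene_{i}" for i in range(1363)]
subsets = split_genes(genes, n_sets=4)
print([len(s) for s in subsets])  # [341, 341, 341, 340]
```

Each subset would then go to a separate worker. Fewer genes per run means faster runs, but each subset must still be large enough for the patterns to be estimated reliably, which ties back to the dataset size question above.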

What packages/methods do you recommend using or looking into for this analysis?

CoGAPS is installed from Bioconductor. I haven't run the full analysis, but its documentation does warn that it takes a good amount of computing time because of the non-negative matrix factorization involved. We may need a RAM requirements warning or suggestion for users on this example, another reason for it to be in the "advanced topics" section.
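For a RAM warning, a rough lower bound is the size of the expression matrix itself. R stores numeric values as 8-byte doubles, so a dense genes x samples matrix needs genes x samples x 8 bytes. The Python sketch below does this arithmetic. The 20,000 x 500 size is a hypothetical example, not a chosen dataset, and real CoGAPS runs will use more memory than this because the algorithm keeps its own working matrices.

```python
def dense_matrix_bytes(n_genes, n_samples, bytes_per_value=8):
    """Bytes for one dense genes x samples matrix of 8-byte doubles."""
    return n_genes * n_samples * bytes_per_value


def describe(n_genes, n_samples):
    mib = dense_matrix_bytes(n_genes, n_samples) / 1024 ** 2
    return f"{n_genes} genes x {n_samples} samples: ~{mib:.1f} MiB for the input matrix alone"


# Vignette-sized data vs. a hypothetical larger refine.bio-style dataset
for genes, samples in [(1363, 9), (20000, 500)]:
    print(describe(genes, samples))
```

The input matrix is rarely the bottleneck. Still, the estimate scales linearly with both dimensions, which helps when suggesting a maximum dataset size for local runs.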