AlexsLemonade / refinebio-examples

Example workflows for refine.bio data

Home Page: https://www.refine.bio

[Investigation/Discussion] Methods/packages for identifying co-expressed gene modules

cansavvy opened this issue

Background

While we were discussing other use cases for ORA (over-representation analysis) with RNA-seq data, @jaclyn-taroni mentioned co-expressed gene modules as an idea: #344 (comment)

This issue is about which methods for finding co-expressed gene modules we may want to consider.

Problem

What methods are currently available and recommended for finding co-expressed gene modules?

I know of WGCNA, but there are probably newer methods out there, and maybe one that we should recommend to our users instead of WGCNA. This issue is about surveying the methods and packages that do what WGCNA does.
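For anyone new to the concept, here is a rough sketch of what "finding co-expressed gene modules" means. It is not WGCNA's actual algorithm (WGCNA builds on topological overlap and dynamic tree cutting); it only borrows the soft-thresholding idea of raising the absolute correlation to a power, then groups genes that end up connected. The function name, the `power` and `cutoff` values, and the toy data are all made up for illustration, and it is written in plain Python only to keep it dependency-free.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_modules(expr, power=6, cutoff=0.5):
    """Group genes whose soft-thresholded correlation meets `cutoff`.

    expr: dict mapping gene name -> list of expression values across samples.
    `power` and `cutoff` are illustrative values, not recommendations.
    """
    genes = sorted(expr)
    parent = {g: g for g in genes}  # union-find forest over genes

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            # Unsigned adjacency: |correlation| raised to a power.
            adjacency = abs(pearson(expr[g1], expr[g2])) ** power
            if adjacency >= cutoff:
                parent[find(g1)] = find(g2)  # merge the two groups

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(sorted(m) for m in modules.values())


# Hypothetical toy data: 5 genes measured in 5 samples.
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [5, 3, 4, 1, 2],
    "geneD": [10, 6.5, 8, 2, 4],
    "geneE": [3, 3, 3, 3, 3],
}
modules = coexpression_modules(expr)
print(modules)
```

Because the adjacency is unsigned, strongly anti-correlated genes could also land in the same module; whether that is desirable is one of the choices a real method has to make.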

What are the recommended next steps?

I'm going to do some investigating online and I will post what I find here for discussion, but if anyone has any leads or recommendations on methods/packages for identifying co-expressed genes, please post here and we can discuss!

Obviously a scientifically sound method is important, but in our context, R compatibility and good documentation are also key criteria in this evaluation.

Once we determine whether there is a method that warrants a new analysis example, and which one, I will create a new, more specific analysis issue based on that method.

This review paper seems relevant: https://www.sciencedirect.com/science/article/abs/pii/S0010482519302574?via%3Dihub. It compares WGCNA to THD-Module Extractor, DiffCoEx, and MODA.

Here's an idea that might be appealing to our users: https://github.com/hidelab/diffcoexp DiffCoExp tries to find groups of differentially co-expressed genes. It claims it works with RNA-seq, but it also sounds like it uses similar methods to WGCNA, so we'd need to look into whether the methods it borrows from WGCNA are the same ones that are problematic/suboptimal for RNA-seq. (I don't know at this time.)

Another contender I just found: petal is an R package that its authors describe as better suited for RNA-seq data specifically because it does not assume normality.
Paper: Petereit et al., 2016
GitHub: https://github.com/julipetal/petalNet

Edit: Looks like it hasn't been updated in ~5 years :(

Here's a brief summary of other ideas I've come across:

Ideas that won't work for this instance (but are interesting):

  • Generalized Singular Value Decomposition is a math technique people have used for co-expression analyses, and while there is an R function that can do this for us, there doesn't seem to be a standard way of doing this for RNA-seq (the examples I've found are microarray) and certainly not a lot of documentation we could point people to.

  • Cross-Conditions Cluster Detection (C3D) seems like it could be nice, but it uses MATLAB...

  • GCNA seemed interesting, but its documentation is negligible and, again, the authors only used microarray data.

Ideas we are left with as of now:

Twitter recommended CoGAPS, and while I'm interested, it requires quite a bit of computing power because of the non-negative matrix factorization, not to mention that conceptually it's probably a lot for users to digest.
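To make the non-negative matrix factorization idea a bit more concrete, here is a minimal sketch. Note that CoGAPS itself uses a Bayesian, MCMC-based factorization, so this swaps in a much simpler technique (Lee and Seung style multiplicative updates) purely to show the shape of the problem: a genes x samples matrix V is approximated by W (genes x patterns) times H (patterns x samples), with every entry kept non-negative. The function names, the rank, the iteration count, and the toy matrix are all assumptions made for illustration, and plain Python is used only to avoid dependencies.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (genes x samples) into W (genes x rank), H (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy matrix: 4 genes x 4 samples with two obvious expression patterns.
v = [
    [5, 5, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 6, 6],
]
w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

Each iteration involves several full matrix products, and the number of patterns has to be chosen up front, which hints at why this family of methods gets expensive on real expression matrices.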

See #344: for ORA, I think WGCNA is quicker and makes more sense to use.
However, I think CoGAPS could be good for its own example, so I'll make a separate issue for it that we can act on at a later date.

This discussion issue has been split into actionable issues/PRs #348 and #353 so it can now be closed.