[Investigation/Discussion] Methods/packages for identifying co-expressed gene modules

Question

cansavvy opened this issue 4 years ago · comments

Background

We were discussing other use cases for ORA with RNA-seq data, and co-expressed gene modules came up as an idea from @jaclyn-taroni: #344 (comment)

This issue is about which methods for finding co-expressed gene modules we may want to consider.

Problem

What methods are currently available and recommended for finding co-expressed gene modules?

I know of WGCNA, but there are probably newer methods out there, and maybe one of them is what we should recommend to our users instead of WGCNA. This issue is about surveying the methods and packages similar to WGCNA.

What are the recommended next steps?

I'm going to do some investigating online and will post what I find here for discussion, but if anyone has any leads or recommendations on methods/packages for identifying co-expressed genes, please post here and we can discuss!

Obviously a scientifically sound method is important, but for our context, R compatibility and good documentation are also key criteria in this evaluation.

Once we determine whether there's a method we should make a new analysis example for, and which one, I will create a new, more specific analysis issue based on the method we settle on here.

Candace Savonen · Answer 1 · Thu Nov 05 2020 01:46:05 GMT+0800 (China Standard Time)

This review paper seems relevant: https://www.sciencedirect.com/science/article/abs/pii/S0010482519302574?via%3Dihub. It compares WGCNA to THD-Module Extractor, DiffCoEx, and MODA.

Candace Savonen · Answer 2 · Thu Nov 05 2020 04:22:24 GMT+0800 (China Standard Time)

Here's an idea that might be appealing to our users: https://github.com/hidelab/diffcoexp. DiffCoExp tries to find groups of differentially co-expressed genes. It claims it works with RNA-seq, but it also sounds like it uses similar methods to WGCNA, so we'd need to look into whether the methods it borrows from WGCNA are the same ones that are problematic/suboptimal for RNA-seq. (I don't know at this time.)
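To make "differentially co-expressed" concrete, here is a minimal sketch of the general idea: compute the correlation between each pair of genes separately in each condition and look at how much it changes. This is not diffcoexp's actual algorithm (I haven't verified its internals), and it is written in plain Python purely for illustration, even though any example we ship would be in R. The gene names, the `control`/`treated` values, and the `cor_change` helper are all made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: one list of expression values per gene, per condition.
control = {
    "gene1": [1, 2, 3, 4, 5],
    "gene2": [2, 4, 6, 8, 10],
    "gene3": [5, 3, 4, 1, 2],
}
treated = {
    "gene1": [1, 2, 3, 4, 5],
    "gene2": [9, 7, 8, 3, 1],
    "gene3": [5, 3, 4, 1, 2],
}


def cor_change(gene_a, gene_b):
    """Correlation in the treated condition minus correlation in control."""
    return (pearson(treated[gene_a], treated[gene_b])
            - pearson(control[gene_a], control[gene_b]))


# A large absolute change flags a gene pair as a differential
# co-expression candidate; a change near zero means the relationship
# between the two genes looks the same in both conditions.
for gene_a, gene_b in combinations(sorted(control), 2):
    print(gene_a, gene_b, round(cor_change(gene_a, gene_b), 3))
```

In this toy data only the gene1/gene2 relationship flips between conditions; real packages add statistical testing and module detection on top of this kind of comparison.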

Candace Savonen · Answer 3 · Thu Nov 05 2020 21:52:34 GMT+0800 (China Standard Time)

Another contender I just found: petal is an R package that its authors describe as better suited for RNA-seq data specifically because it does not assume normality.
Petereit et al, 2016.pdf
GitHub: https://github.com/julipetal/petalNet

Edit: Looks like it hasn't been updated in ~5 years : (

Candace Savonen · Answer 4 · Fri Nov 06 2020 03:12:43 GMT+0800 (China Standard Time)

Here's a brief summary of other ideas I've come across:

Ideas that won't work for this instance (but are interesting):

- Generalized Singular Value Decomposition (GSVD) is a mathematical technique people have used for co-expression analyses. There is an R function that can do this for us, but there doesn't seem to be a standard way of applying it to RNA-seq data (the examples I've found use microarray data), and there certainly isn't much documentation we could point people to.
  - Alter et al, 2003.pdf
  - Schreiber et al, 2008.pdf
- Cross-Conditions Cluster Detection (C3D) seems like it could be nice, but it uses MATLAB.
  - Xiao et al, 2014.pdf
- GCNA seemed interesting, but its documentation is negligible and, again, the authors only used microarray data.
  - Wang2019_Article_GeneralizedGeneCo-expressionAn.pdf

Ideas we are left with as of now:

- k-means (or some other clustering strategy) is what some people use to find gene clusters: https://2-bitbio.com/2017/10/clustering-rnaseq-data-using-k-means.html. I don't know how standard this is.
- There's still WGCNA (and diffcoexp, which ultimately uses WGCNA).
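For reference, the k-means idea boils down to something like the sketch below: scale each gene's profile across samples so that clustering follows the shape of the pattern rather than the overall expression level, then run ordinary k-means on the genes. This is a plain-Python illustration with made-up data, not the linked tutorial's code; in R the base `kmeans()` function would do the clustering step. The farthest-first initialization is just a simple deterministic choice I made for the example.

```python
import math


def zscore(row):
    """Center and scale one gene's expression values across samples."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in row]


def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialization: start from the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: six genes measured in six samples.
# The first three rise in the last three samples; the other three fall.
expression = {
    "geneA": [1, 2, 3, 8, 9, 10],
    "geneB": [10, 20, 30, 80, 90, 100],
    "geneC": [5, 6, 5, 12, 13, 12],
    "geneD": [9, 8, 9, 2, 1, 2],
    "geneE": [50, 45, 55, 10, 12, 8],
    "geneF": [7, 7.5, 7, 3, 3.5, 3],
}

genes = list(expression)
scaled = [zscore(expression[g]) for g in genes]
labels = kmeans(scaled, k=2)
clusters = dict(zip(genes, labels))
print(clusters)
```

Note that k has to be chosen up front, which is one of the things we'd have to explain to users if we went this route.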

Candace Savonen · Answer 5 · Fri Nov 06 2020 03:52:58 GMT+0800 (China Standard Time)

Twitter recommended CoGAPS, and while I'm interested, it requires quite a bit of computing power because of the non-negative matrix factorization, not to mention that conceptually it's probably a lot for users to digest.
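To give a feel for what non-negative matrix factorization does, here is a rough sketch using the classic multiplicative-update rules: a genes x samples matrix `v` is approximated by the product of a genes x patterns matrix `w` and a patterns x samples matrix `h`, with every entry kept non-negative. To be clear, this is not what CoGAPS runs (as I understand it, CoGAPS takes a Bayesian sampling approach, which is where much of the compute cost comes from); it is only a plain-Python illustration of the factorization idea, and the data and function names are made up.

```python
import math
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Naive matrix product of two lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def frobenius_error(v, w, h):
    """Square root of the summed squared differences between v and w @ h."""
    approx = matmul(w, h)
    return math.sqrt(
        sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))
    )


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF; returns (w, h) with v approximately w @ h."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update h: h <- h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Hypothetical toy matrix: 4 genes x 5 samples built from two patterns.
v = [
    [1, 2, 3, 0, 0],
    [2, 4, 6, 0, 0],
    [0, 0, 0, 5, 4],
    [1, 2, 3, 5, 4],
]

w0, h0 = nmf(v, rank=2, n_iter=0)   # random starting point only
w, h = nmf(v, rank=2, n_iter=200)   # same start, after the updates
error_before = frobenius_error(v, w0, h0)
error_after = frobenius_error(v, w, h)
print(round(error_before, 3), round(error_after, 3))
```

Each row of `h` is a pattern across samples and each column of `w` says how strongly every gene follows that pattern. Even this toy version needs several matrix products per iteration, which hints at why the real thing gets expensive on full expression matrices.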

Candace Savonen · Answer 6 · Sat Nov 07 2020 02:28:24 GMT+0800 (China Standard Time)

See #344: for ORA, I think WGCNA is quicker and makes more sense to use.
However, I think CoGAPS could be good for its own example. I'll make an issue for that, which we can act on at a later date.

Candace Savonen · Answer 7 · Mon Nov 16 2020 21:01:50 GMT+0800 (China Standard Time)

This discussion issue has been split into actionable issues/PRs #348 and #353 so it can now be closed.