gseKEGG function running indefinitely with a gene list of over 3000 genes
Sophia409 opened this issue
Hello,
I am encountering an issue while using the gseKEGG function from the clusterProfiler package for GSEA enrichment analysis. I have provided a gene list containing just over 3000 genes, but the function has been running for two hours without completing. I manually stopped the process and attempted to modify the function parameters, but after several tries, the function still hangs.
Here are the details of my setup:
clusterProfiler version: 4.12.0 (latest version)
I would appreciate any insights into why this might be happening and how I can resolve this issue.
Thank you for your help!
Best regards,
> genelist <- genelist[names(genelist) %in% entrez[,1]]
> names(genelist) <- entrez[match(names(genelist),entrez[,1]),2]
> genelist <- sort(genelist, decreasing = T) # sort by log2FC, highest first
> length(genelist)
[1] 3786
> head(genelist)
  20304   20306   20296   16175   14825  117167
1112.53 1059.17 1018.66  651.99  608.66  603.55
> # 2) GSEA enrichment based on KEGG gene sets
> set.seed(123)
> KEGG_ges <- gseKEGG(
+ geneList = genelist,
+ organism = "mmu",
+ minGSSize = 10,
+ maxGSSize = 500,
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "BH",
+ verbose = FALSE,
+ eps = 0)
Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are ties in the preranked stats (80.9% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
There were 1 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 10000)
>
> KEGG_ges <- gseKEGG(
+ geneList = genelist,
+ organism = "mmu")
preparing geneSet collections...
GSEA analysis...
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are ties in the preranked stats (80.9% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
There were 1 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 10000)
> KEGG_ges <- gseKEGG(
+ geneList = genelist,
+ organism = "mmu",
+ nPermSimple = 10000)
preparing geneSet collections...
GSEA analysis...
Did you carefully read the messages that were returned?
This is the key remark:
There are ties in the preranked stats (80.9% of the list).
In other words, about 81% of the values in your input are tied, i.e. they share their ranking metric with at least one other gene! Why? This cannot be correct...
Anyway, this results in behavior reported before:
ctlab/fgsea#151
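To see where the ties come from, it may help to inspect the ranking vector before calling gseKEGG. The following is only a minimal sketch in base R: the `genelist` built here is a hypothetical stand-in with made-up values (not the poster's data), and `tie_fraction` is an illustrative name, not something clusterProfiler or fgsea provides.

```r
# Hypothetical stand-in for the real `genelist`: a named numeric vector,
# sorted in decreasing order, that deliberately contains many repeated values.
set.seed(123)
genelist <- sort(
  setNames(round(rnorm(1000), 1), paste0("gene", seq_len(1000))),
  decreasing = TRUE
)

# Fraction of entries whose value repeats an earlier entry.
# (Assumption: this is only a rough proxy for the percentage in the warning.)
tie_fraction <- mean(duplicated(genelist))
tie_fraction

# Which values occur most often? A single dominant value usually points to
# an upstream problem, e.g. a filled-in default or a heavily rounded metric.
head(sort(table(genelist), decreasing = TRUE))

# Gene IDs can also collide after ID conversion, so check the names as well.
sum(duplicated(names(genelist)))
```

If one value accounts for most of the ties, it is probably worth checking how the ranking metric was computed before rerunning the enrichment.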