ctlab / fgsea

Fast Gene Set Enrichment Analysis

minSize and maxSize

ag1805x opened this issue

What is the effect of minSize and maxSize in fgsea()? The documentation says "All pathways below/above the threshold are excluded". If maxSize is set to 500 and a pathway has more than 500 genes, should I still get an enrichment score for that pathway? I tried to see what difference these arguments make, but the results stayed the same.

library(fgsea)

data(examplePathways)
data(exampleRanks)
set.seed(42)

examplePathways <- examplePathways[lengths(examplePathways) > 500]

fgseaRes_15_100 <- fgsea(pathways = examplePathways,
                         stats    = exampleRanks,
                         minSize  = 15,
                         maxSize  = 100)

fgseaRes_15_500 <- fgsea(pathways = examplePathways,
                         stats    = exampleRanks,
                         minSize  = 15,
                         maxSize  = 500)

fgseaRes_15_1000 <- fgsea(pathways = examplePathways,
                          stats    = exampleRanks,
                          minSize  = 15,
                          maxSize  = 1000)

fgseaRes_15_Inf <- fgsea(pathways = examplePathways,
                         stats    = exampleRanks,
                         minSize  = 15,
                         maxSize  = Inf)

Only with maxSize = 100 do I get an empty data table. With maxSize = 1000, one term with 628 genes is missing.

@ag1805x Hi

The minSize and maxSize arguments set the range of pathway sizes that are included in the analysis. Here, a pathway's size is not the number of genes it lists but the size of the intersection between the pathway's genes and names(exampleRanks), i.e. the names of the stats vector. You can verify that every value in the size column of the fgsea results lies between minSize and maxSize.
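
To make the distinction between raw and effective size concrete, here is a minimal base-R sketch. It does not call fgsea itself. The toy `stats` vector, the pathway names, and the thresholds are all made up for illustration, and the inclusive comparison against minSize and maxSize is an assumption based on the description above, not a copy of the package's internal code.

```r
# Hypothetical toy inputs, for illustration only
stats <- c(geneA = 2.1, geneB = 1.4, geneC = 0.3, geneD = -0.8, geneE = -1.9)
pathways <- list(
  small  = c("geneA", "geneX"),
  medium = c("geneA", "geneB", "geneC", "geneY", "geneZ"),
  large  = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneW")
)

# Raw size: how many genes each pathway lists
rawSize <- lengths(pathways)

# Effective size: how many of those genes also appear in names(stats)
effectiveSize <- vapply(
  pathways,
  function(p) length(intersect(p, names(stats))),
  integer(1)
)

# Keep pathways whose effective size falls within the thresholds
# (assumed inclusive on both ends)
minSize <- 2
maxSize <- 4
kept <- names(effectiveSize)[effectiveSize >= minSize & effectiveSize <= maxSize]

print(data.frame(rawSize, effectiveSize))
print(kept)
```

In this toy case "medium" lists 5 genes but only 3 of them are ranked, so it passes maxSize = 4, while "large" is excluded because 5 of its genes are ranked. Applying the same intersect() step to examplePathways and names(exampleRanks) should show which effective sizes the pathways in the question actually have.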