broadinstitute / ssGSEA2.0

Single sample Gene Set Enrichment analysis (ssGSEA) and PTM Enrichment Analysis (PTM-SEA)


Output only for 11 signatures

julia-aguade opened this issue

Hi,

I'm using PTM-SEA to determine pathway enrichment from phosphoproteomics data. When I run my data, I only get output results (plots and GCT files) for 11 signatures, whereas running the example gives output for 96 signatures. Is there some kind of filtering that determines which signatures produce output, or is there something wrong with my analysis? I am using the default parameters in the gui.R file.

Thank you

Hi,

Sorry for the slow response. There might be several reasons why your output only contains 11 signatures.

  1. You can lower the number of phosphosites required to score a signature (parameter min.overlap). For PTM-SEA we typically require a minimum of 5 sites to be detected in the data.

  2. Not all sites in your data can be mapped to sites in PTMsigDB. UniProt-centric site identifiers (e.g. Q06609;Y315-p) often cause problems with mapping sites, since UniProt accession numbers may be updated and residue numbers may change as well. We recommend using the flanking sequences as site identifiers (e.g. ETRICKIYDSPCLPE-p).

  3. Limited depth of phosphoproteomic data. The likelihood of being able to score a signature in PTMsigDB increases with the number of sites in your input data. If your dataset comprises only a few thousand phosphosites, you are likely sampling only the most abundant sites and missing many lower-abundance sites.
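A minimal sketch of how points 1 and 2 interact, assuming a signature is scored only when at least `min_overlap` of its site identifiers appear verbatim in the dataset. This is an illustration in Python, not the actual ssGSEA2.0 code (which is written in R); `scorable_signatures` and the toy identifiers `s1`...`s5` are hypothetical, and each signature is simplified to a plain list of site identifiers.

```python
from typing import Dict, Iterable, List


def scorable_signatures(signatures: Dict[str, Iterable[str]],
                        dataset_sites: Iterable[str],
                        min_overlap: int = 5) -> Dict[str, List[str]]:
    """Keep signatures with at least `min_overlap` sites found in the data."""
    present = set(dataset_sites)
    kept = {}
    for name, sites in signatures.items():
        # Exact string matching: a site only counts if its identifier in the
        # data is spelled identically to the one in the signature, so sites
        # written in a different format (point 2) never match.
        overlap = sorted(present.intersection(sites))
        if len(overlap) >= min_overlap:
            kept[name] = overlap
    return kept


# Toy identifiers for illustration only (hypothetical, not PTMsigDB entries).
signatures = {"SIG_A": ["s1", "s2", "s3"], "SIG_B": ["s4", "s5"]}
data = ["s1", "s2", "s4"]
print(scorable_signatures(signatures, data, min_overlap=2))  # {'SIG_A': ['s1', 's2']}
print(scorable_signatures(signatures, data, min_overlap=1))  # {'SIG_A': ['s1', 's2'], 'SIG_B': ['s4']}
```

Lowering the threshold lets more signatures through, but each one is then scored from fewer sites. Converting identifiers to the same format as PTMsigDB and increasing depth both raise the overlap itself.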

I hope that helps.

Best,
K

Thanks, Karsten.
I've been trying for some time, but I couldn't find out how to get my data into the right format (flanking sequences of ±7 aa). I use ArtMS, and the sequences I have are not centered on the phosphorylated amino acid and do not have a fixed number of residues on each side. For example:

AAALQALQAQAPT(ph)SPPPPPPPLKAEQEEEGLPLPLANIK
or
AAALQALQAQAPTSPPPPPPPLKAEQEEEGLPLPLANIK_

Which kind of analysis do you perform to obtain the site identifiers as flanking sequences that are compatible with PTM-SEA?
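One possible approach, sketched in Python under stated assumptions: strip the modification markers from the peptide, locate the bare peptide in the full protein sequence (for example, taken from a UniProt FASTA file), and cut a window of 7 residues on each side of every phosphorylated residue, appending `-p` as in the PTMsigDB example ETRICKIYDSPCLPE-p. The `(ph)` marker and the leading/trailing `_` handling are assumed from the ArtMS examples above. Padding with `_` beyond the available sequence is also an assumption, so check it against the identifiers in your PTMsigDB version. `parse_modified_peptide` and `flanking_sequences` are hypothetical helpers, not part of ssGSEA2.0 or ArtMS.

```python
from typing import List, Optional, Tuple


def parse_modified_peptide(peptide: str, marker: str = "(ph)") -> Tuple[str, List[int]]:
    """Strip modification markers; return the bare sequence and 0-based site positions."""
    residues: List[str] = []
    sites: List[int] = []
    i = 0
    while i < len(peptide):
        if peptide.startswith(marker, i):
            if not residues:
                raise ValueError("marker found before any residue")
            # The marker modifies the residue immediately before it.
            sites.append(len(residues) - 1)
            i += len(marker)
        else:
            residues.append(peptide[i])
            i += 1
    return "".join(residues), sites


def flanking_sequences(peptide: str, protein: Optional[str] = None,
                       width: int = 7, marker: str = "(ph)") -> List[str]:
    """Return one centered (2*width + 1)-mer identifier per modified site."""
    # Assumed: leading/trailing '_' are formatting padding, not residues.
    clean, sites = parse_modified_peptide(peptide.strip("_"), marker)
    if protein is None:
        # Fallback: only the peptide itself is available as context.
        context, offset = clean, 0
    else:
        offset = protein.find(clean)  # first occurrence only
        if offset < 0:
            raise ValueError(f"{clean} not found in protein sequence")
        context = protein
    identifiers = []
    for site in sites:
        center = offset + site
        # Pad positions that fall outside the available context with '_'.
        window = "".join(
            context[j] if 0 <= j < len(context) else "_"
            for j in range(center - width, center + width + 1)
        )
        identifiers.append(window + "-p")
    return identifiers


print(flanking_sequences("AAALQALQAQAPT(ph)SPPPPPPPLKAEQEEEGLPLPLANIK"))
# ['ALQAQAPTSPPPPPP-p']
```

The first example peptide already has at least 7 residues on each side of the phosphothreonine, so the peptide alone is enough here. In general, you need the protein sequence to fill in residues beyond the peptide ends. The second example carries no `(ph)` marker, so it yields no identifiers. `str.find` returns only the first match, so peptides from repeated stretches or shared between proteins need extra care.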

Thank you for your help