Output only for 11 signatures
julia-aguade opened this issue · comments
Hi,
I'm using PTM-SEA to determine pathway enrichment from phospho-proteomics data. When I run my data I only get output results (plots and gct files) for 11 signatures, while when running the example I get data for 96 signatures. Is there some kind of filtering that determines from which signatures you get outputs or is there something wrong with my analysis? I am using the default parameters in the gui.R file
thank you
Hi,
Sorry for the slow response. There might be several reasons why your output only contains 11 signatures.
- You can lower the number of phosphosites required to score a signature (parameter min.overlap). For PTM-SEA we typically require a minimum of 5 sites to be detected in the data.
- Not all sites in your data can be mapped to sites in PTMsigDB. UniProt-centric site identifiers (e.g. Q06609;Y315-p) often cause problems with site mapping, since UniProt accession numbers might get updated and residue numbers might change as well. We recommend using the flanking sequences as site identifiers (e.g. ETRICKIYDSPCLPE-p).
- Limited depth of phosphoproteomic data. The likelihood of being able to score a signature in PTMsigDB increases with the number of sites in your input data. If your dataset only comprises a few thousand phosphosites, you are likely sampling only the most abundant sites and missing many lower-abundance sites.
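To make the first and third points concrete, here is a minimal sketch (in Python, not the PTM-SEA R code) of how a minimum-overlap filter decides which signatures get scored. The function name, the signature names, and the toy site identifiers are all hypothetical; only the idea of a min.overlap threshold of 5 comes from the points above.

```python
def scorable_signatures(data_sites, signatures, min_overlap=5):
    """Return {signature name: overlap count} for signatures that share
    at least `min_overlap` site identifiers with the dataset."""
    data_sites = set(data_sites)
    kept = {}
    for name, sites in signatures.items():
        overlap = len(data_sites & set(sites))
        if overlap >= min_overlap:
            kept[name] = overlap
    return kept


# Toy, made-up identifiers; real input would use flanking-sequence IDs.
dataset = {f"SITE{i}-p" for i in range(1, 9)}  # SITE1-p .. SITE8-p
toy_signatures = {
    "SIG_A": [f"SITE{i}-p" for i in range(1, 7)],   # 6 sites in the dataset
    "SIG_B": [f"SITE{i}-p" for i in range(6, 12)],  # 3 sites in the dataset
}

print(scorable_signatures(dataset, toy_signatures))                 # {'SIG_A': 6}
print(scorable_signatures(dataset, toy_signatures, min_overlap=3))  # both signatures
```

Running the same count on your own site identifiers against the signature database shows whether the low number of scored signatures comes from the threshold, from identifiers that fail to map, or from limited depth.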
I hope that helps.
Best,
K
Thanks Karsten.
I've been trying for some time and could not find out how to get my data into the right format (flanking sequences with ±7 aa). I use ArtMS and have flanking sequences that are not centered on the phosphorylated amino acid and that do not have a fixed number of residues on each side. For example:
AAALQALQAQAPT(ph)SPPPPPPPLKAEQEEEGLPLPLANIK
or
AAALQALQAQAPTSPPPPPPPLKAEQEEEGLPLPLANIK_
Which kind of analysis do you perform to obtain the site identifiers as flanking sequences that are compatible with PTM-SEA?
thank you for your help