New format of modifications from MaxQuant causes artmsProtein2SiteConversion to fail
bpolacco opened this issue · comments
I've been seeing a new format for modifications from recent versions of MaxQuant that causes artmsProtein2SiteConversion to fail. Instead of the short K(ph), MaxQuant is now using S(Phospho (STY)). I've been pre-converting these MaxQuant files using the function below. Note that this started as a converter for Spectronaut output, which uses the similar S[Phospho (STY)] format; that is the reason for the [[(] character classes in the regular expressions and for the variable names specFormats and specModSequence.
convertModificationFormat <- function(specModSequence, mods = c("PH", "UB", "CAM", "MOX", "NAC")) {
  result <- specModSequence
  # Long-form labels. [[(] and [])] accept both Spectronaut [...] and MaxQuant (...) delimiters.
  specFormats <- list(PH  = '([STY])[[(]Phospho \\(STY\\)[])]',
                      UB  = '(K)[[(]GlyGly \\(K\\)[])]',
                      CAM = '([C])[[(]Carbamidomethyl \\(C\\)[])]',
                      MOX = '([M])[[(]Oxidation \\(M\\)[])]',
                      NAC = '([A-Z_])[[(]Acetyl \\(Protein N-term\\)[])]')
  # Short artMS labels. \\1 re-inserts the modified residue captured above.
  artmsFormats <- list(PH  = '\\1\\(ph\\)',
                       UB  = '\\1\\(gl\\)',
                       CAM = '\\1\\(cam\\)',
                       MOX = '\\1\\(ox\\)',
                       NAC = '\\1\\(ac\\)')
  # Both lists must describe the same modifications in the same order.
  stopifnot(identical(names(specFormats), names(artmsFormats)))
  for (mod in mods) {
    if (mod %in% names(specFormats)) {
      result <- gsub(specFormats[[mod]], artmsFormats[[mod]], result)
    } else {
      stop("I don't know how to deal with requested mod: ", mod)
    }
  }
  return(result)
}
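For illustration, here is a minimal sketch of the PH substitution on its own, using the same pattern as the function above. The peptide sequences are made up for this example and are not taken from the attached evidence file; the replacement is written as '\\1(ph)', which should be equivalent to the escaped form used above.

```r
# Made-up modified sequences (assumption: not real MaxQuant/Spectronaut output)
seqs <- c("_AAS(Phospho (STY))PEK_",   # newer MaxQuant style
          "_AAS[Phospho (STY)]PEK_",   # Spectronaut style
          "_AAK(GlyGly (K))LM_")       # not a phospho label, left untouched here

# The captured residue (S, T or Y) sits to the LEFT of the label and is kept via \\1
phPattern <- "([STY])[[(]Phospho \\(STY\\)[])]"
converted <- gsub(phPattern, "\\1(ph)", seqs)

print(converted)
# "_AAS(ph)PEK_" "_AAS(ph)PEK_" "_AAK(GlyGly (K))LM_"
```

Because the pattern requires S, T or Y immediately before the label, a label that preceded its residue would not be rewritten into the residue-then-label form by this substitution.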
I'm happy to put something like this in artMS as a pull request, but inserting this into artMS code will require a bit more restructuring of your code than I am comfortable doing without discussion on how you like to structure things. I'm thinking something like a checkpoint in artmsProtein2SiteConversion, and an attempt to convert on failure, followed by another checkpoint...
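The checkpoint idea could be sketched roughly as follows. This is only an assumption about how it might look, not artMS code: hasLongModFormat and checkpointedConversion are hypothetical names, the detection regex is a guess at what counts as a long-form label, and the converter is passed in as an argument so that the sketch does not depend on any artMS internals.

```r
# Hypothetical checkpoint: TRUE where a sequence still carries a long-form label
# such as (Phospho (STY)) or [GlyGly (K)]. The regex is an assumption.
hasLongModFormat <- function(modSequence) {
  grepl("[[(][A-Za-z]+ \\([^)]*\\)[])]", modSequence)
}

# Hypothetical wrapper: check, convert only if needed, then check again.
checkpointedConversion <- function(modSequence, converter) {
  needsFix <- hasLongModFormat(modSequence)            # first checkpoint
  if (!any(needsFix)) {
    return(modSequence)                                # already in short format
  }
  modSequence[needsFix] <- converter(modSequence[needsFix])
  stillLong <- hasLongModFormat(modSequence)           # second checkpoint
  if (any(stillLong)) {
    stop("Unconverted modification labels remain, e.g.: ",
         modSequence[stillLong][1])
  }
  modSequence
}

# Stand-in converter that handles phosphorylation only (for demonstration)
phosphoOnly <- function(x) gsub("([STY])[[(]Phospho \\(STY\\)[])]", "\\1(ph)", x)

# Made-up sequences: one already short, one in the newer long format
fixed <- checkpointedConversion(c("_AAS(ph)PEK_", "_AAS(Phospho (STY))PEK_"),
                                phosphoOnly)
print(fixed)
```

In practice the converter argument would be something like convertModificationFormat, and the result would then be handed on to the site conversion step. Failing loudly at the second checkpoint is a design choice here, so that an unknown label format is reported rather than silently passed through.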
evidence_1000lines.txt
Thanks @bpolacco, indeed this is a pretty important issue. Let me get back to this very soon.
I tested and your code works well. Thank you very much. But I still cannot understand this bizarre change in MaxQuant. I am afraid this is a bug: just a wrong mapping. I bet they will correct it.
Hi @bpolacco, I would like to confirm whether this pre-conversion also takes into account that, in the new [Phospho (STY)] format, the phosphorylated amino acid seems to be on the right side of the label [Phospho (STY)]. I have not used the previous MaxQuant (ph) format, which I understand marks phosphorylation on the amino acid to the left of the label?
I have not seen a case where the modified S, T or Y is to the right of the (Phospho (STY)) label. I may have caused confusion with my example above, where I wrote K(Phospho (STY)), using K instead of S, T or Y as the modified amino acid (edit: just now edited to avoid further confusion). That was a mistake on my part: I was working with ubiquitination as well as phosphorylation when I wrote that and confused them. As written, S(Phospho (STY)) will be translated to S(ph). See the PH modifications in the example evidence file I shared with my comment. If you are seeing examples where [Phospho (STY)]S should translate to S(ph) instead, that is something new, and you should share it here (along with the MaxQuant version that produced that file, if you have it). Thanks!