biodavidjm / artMS

Analytical R Tools for Mass Spectrometry

New format of modifications from MaxQuant causes artmsProtein2SiteConversion to fail

bpolacco opened this issue · comments

I've been seeing a new format for modifications from recent versions of MaxQuant that causes artmsProtein2SiteConversion to fail. Instead of the short S(ph), MaxQuant is now using S(Phospho (STY)). I've been pre-converting these MaxQuant files using the function below. Note that this started as a converter for Spectronaut output, which uses the similar S[Phospho (STY)] format; that's the reason for the [[(] character classes in the regular expressions and for the variable names specFormats and specModSequence.

convertModificationFormat <- function(specModSequence, mods = c("PH", "UB", "CAM", "MOX", "NAC")) {
  result <- specModSequence
  # Long-form patterns. [[(] and [])] accept either Spectronaut's [...] or
  # MaxQuant's (...) around the modification label.
  specFormats <- list(PH  = '([STY])[[(]Phospho \\(STY\\)[])]',
                      UB  = '(K)[[(]GlyGly \\(K\\)[])]',
                      CAM = '([C])[[(]Carbamidomethyl \\(C\\)[])]',
                      MOX = '([M])[[(]Oxidation \\(M\\)[])]',
                      NAC = '([A-Z_])[[(]Acetyl \\(Protein N-term\\)[])]')
  # Short-form replacements; \\1 re-inserts the captured modified residue.
  artmsFormats <- list(PH  = '\\1\\(ph\\)',
                       UB  = '\\1\\(gl\\)',
                       CAM = '\\1\\(cam\\)',
                       MOX = '\\1\\(ox\\)',
                       NAC = '\\1\\(ac\\)')
  # Both lists must describe the same modifications in the same order.
  stopifnot(identical(names(specFormats), names(artmsFormats)))
  for (mod in mods) {
    if (mod %in% names(specFormats)) {
      result <- gsub(specFormats[[mod]], artmsFormats[[mod]], result)
    } else {
      stop("I don't know how to deal with requested mod: ", mod)
    }
  }
  return(result)
}

I'm happy to put something like this in artMS as a pull request, but inserting this into artMS code will require a bit more restructuring of your code than I am comfortable doing without discussion on how you like to structure things. I'm thinking something like a checkpoint in artmsProtein2SiteConversion, and an attempt to convert on failure, followed by another checkpoint...
evidence_1000lines.txt
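
The checkpoint idea above could look roughly like the sketch below. It is only an illustration, not artMS code: `hasLongFormMods` and `checkAndConvert` are hypothetical names, and the detection pattern assumes that every long-form label looks like `Name (Target)` wrapped in parentheses or square brackets, as in the examples in this thread. The converter is passed in as an argument so the sketch does not depend on any particular implementation.

```r
# Hypothetical helper (not part of artMS): TRUE for sequences that still carry
# a long-form label such as "(Phospho (STY))" or "[Phospho (STY)]".
hasLongFormMods <- function(modSequence) {
  grepl("[[(][A-Za-z]+ \\([^)]*\\)[])]", modSequence)
}

# Hypothetical checkpoint: convert only when long-form labels are present,
# then check again and fail loudly if any label was left untranslated.
checkAndConvert <- function(modSequence, converter) {
  if (!any(hasLongFormMods(modSequence))) {
    return(modSequence)  # already in the short format, nothing to do
  }
  converted <- converter(modSequence)
  if (any(hasLongFormMods(converted))) {
    stop("Unrecognized modification format remains after conversion")
  }
  converted
}
```

A caller would pass the modified-sequence column together with a converter such as convertModificationFormat; sequences that are already in the short format pass through untouched.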

Thanks @bpolacco, indeed this is a pretty important issue. Let me get back to this very soon.

I tested and your code works well. Thank you very much. But I still cannot understand this bizarre change in MaxQuant. I am afraid this is a bug: just a wrong mapping. I bet they will correct it.

Hi @bpolacco, I would like to confirm whether this pre-conversion also takes into account that, in the new [Phospho (STY)] format, the phosphorylated amino acid seems to be on the right side of the [Phospho (STY)] label. I have not used the previous MaxQuant (ph) format, which I understand marked phosphorylation on the amino acid to the left of the label?

I have not seen the case where the modified S, T or Y is to the right of the (Phospho (STY)). I may have caused confusion with my example above, K(Phospho (STY)), which used K instead of S, T or Y as the modified amino acid (edit: I have just edited it to avoid further confusion). That was a mistake on my part: I was working with ubiquitination as well as phosphorylation when I wrote that and confused them. As written, S(Phospho (STY)) will be translated to S(ph). See the PH modifications in the example evidence file I shared with my comment. If you are seeing examples where [Phospho (STY)]S should translate to S(ph) instead, that is something new, and you should share them here (along with the version of MaxQuant that produced the file, if you have it). Thanks!
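
The left-versus-right behavior can be checked with the PH pattern alone. The snippet below is a minimal sketch: the peptide strings are made up for illustration (they are not taken from the shared evidence file), and `phPattern` and `toShort` are hypothetical names. Because the pattern requires S, T or Y immediately before the label, a label placed before the residue is left unchanged.

```r
# PH pattern from convertModificationFormat: residue first, then the label
# in either (...) or [...] brackets.
phPattern <- '([STY])[[(]Phospho \\(STY\\)[])]'

# Hypothetical wrapper for this example only.
toShort <- function(x) gsub(phPattern, '\\1(ph)', x)

labelOnRight_MaxQuant    <- toShort("_AAS(Phospho (STY))PEK_")  # residue, then label
labelOnRight_Spectronaut <- toShort("_AAS[Phospho (STY)]PEK_")  # residue, then label
labelOnLeft              <- toShort("_AA(Phospho (STY))SPEK_")  # label, then residue

print(labelOnRight_MaxQuant)     # "_AAS(ph)PEK_"
print(labelOnRight_Spectronaut)  # "_AAS(ph)PEK_"
print(labelOnLeft)               # unchanged: "_AA(Phospho (STY))SPEK_"
```

If files with the label before the residue do exist, the pattern would need a second alternative with the capture group after the label; that case is not handled here.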