do_findChromPeaks_centWave returns wrong maxo and int* values
Pascallio opened this issue · comments
Hi, I've noticed that do_findChromPeaks_centWave
returns an incorrect maxo
value and likely incorrect into
and intb
values. According to the documentation, maxo
represents the maximum value of the peak, but in the following example it returns almost double that value (120881.1 versus 61141.14).
# Load library
library(xcms)
# Select MRM mzML from msdata package
data <- readSRMData(system.file("proteomics", "MRM-standmix-5.mzML.gz", package = "msdata"))
# Select a chromatogram
chrom <- data@.Data[[5]]
# Set the variables needed for findChromPeaks
intensities <- chrom@intensity # Chromatogram intensities
mzs <- rep(1, length(intensities)) # Fixed MZ to eliminate ppm issues
rts <- chrom@rtime * 60 # Transform minutes to seconds
valsPerSpect <- table(rts) # 1 data point per rtime
# Run findChromPeaks with default values, should not matter for this example
xc <- as.data.frame(do_findChromPeaks_centWave(
mz = mzs,
int = intensities,
scantime = rts,
valsPerSpect = valsPerSpect
))
assertthat::are_equal(xc$maxo, max(intensities)) # FALSE
xc$maxo # 120888.1
max(intensities) # 61141.14
I've tried to narrow the issue down. The maxo variable is defined here:
xcms/R/do_findChromPeaks-functions.R
Line 568 in db80f2c
and into and intb are defined here:
xcms/R/do_findChromPeaks-functions.R
Lines 573 to 575 in db80f2c
Since both are depended on the pd
variable (which is a subset of the d
variable), this issue probably affects both the maxo
and int*
values. The d
variable represents the intensities of the found EIC:
xcms/R/do_findChromPeaks-functions.R
Lines 364 to 368 in db80f2c
I've confirmed that the issue occurs in the getEIC
function, but I am unable to narrow the issue further due to my lack of knowledge on C.
I've tried version 3.22 from Bioconductor and version 3.99.5 from GitHub and the issue exists in both versions.
Thanks,
Pascal
I dove a little bit deeper into the code and it's likely an indexing issue. I've extracted the values that go into getEIC
in order to extract the result from the function. Using the variables from the previous code gives us the following plot (black = original chromatogram, red = EIC chromatogram):
scanindex <- 1:length(rts) - 1
mzrange <- c(1, 1)
sr <- c(1, length(rts))
eic <- .Call("getEIC", mzs, intensities, as.integer(scanindex), as.double(mzrange),
as.integer(sr), as.integer(length(scanindex)),
PACKAGE = "xcms"
)
plot(intensities, type = "l", ylim = c(0, 2e5))
lines(eic$intensity, col = "red")
It seems that the getEIC
function adds the previous scan intensity to the current scan intensity. A one-liner using dplyr confirms this:
points(intensities + dplyr::lag(intensities, n = 1), col = "green")
I haven't found the solution yet, but it's either in the loop in getScanEIC
, or in the lowerBound
or upperBound
functions:
Lines 440 to 447 in db80f2c
Hi, thanks for the report, we should turn this into a unit test once we've figured exactly what's going on.
Yours, Steffeb
Thanks Steffen,
Interestingly, this issue does not occur in the S4 method findChromPeaks
. It specifically occurs in the do_*
version of the method.
# Load library
library(xcms)
# Select MRM mzML from msdata package
data <- readSRMData(system.file("proteomics", "MRM-standmix-5.mzML.gz", package = "msdata"))
# Use 0.1 - 1 due to RT being in minutes
data <- findChromPeaks(data[5, ], CentWaveParam(peakwidth = c(0.1, 1)))
chromPeaks(data)
Returns the top peak with the correct maxo of 61141.1367