Definition of anchor peaks for subset-based peakGroups alignment not correct
jorainer opened this issue · comments
The current implementation of the subset-based peakGroups alignment to define the anchor peaks currently does not correctly select the peaks based on the minFraction
parameter: minFraction
is referring to the total number of samples and not the total number of samples of the subset. Example:
library(xcms)
faahko_3_files <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
system.file('cdf/KO/ko16.CDF', package = "faahKO"),
system.file('cdf/KO/ko18.CDF', package = "faahKO"))
faahko_od <- readMSData(faahko_3_files, mode = "onDisk")
xod_x <- findChromPeaks(
faahko_od, param = CentWaveParam(noise = 10000, snthresh = 40,
prefilter = c(3, 10000)))
pdp <- PeakDensityParam(sampleGroups = rep(1, 3))
xod_xg <- groupChromPeaks(xod_x, param = pdp)
Using the first and last sample for subset and setting minFraction = 0.6
we would expect to get anchor peaks only if a peak was identified in both samples, but
p <- PeakGroupsParam(subset = c(1, 3), minFraction = 0.6)
adjustRtimePeakGroups(xod_xg, p)
ko15.CDF ko18.CDF
FT21 2679.783 NA
FT15 2678.218 2684.479
FT16 2679.783 2682.914
FT20 2679.783 2684.479
FT24 2681.348 2684.479
FT08 2783.070 NA
FT04 2786.200 2787.766
FT46 2786.200 2787.766
FT01 2787.765 2787.766
FT45 NA 2916.092
FT06 2923.916 2923.917
FT05 2994.338 2994.339
FT02 NA 3250.992
FT27 3276.030 3285.421
FT31 3316.719 3340.194
FT41 NA 3349.584
FT03 3379.317 3387.143
FT32 3384.012 3410.617
FT33 NA 3410.617
FT11 3413.746 NA
FT10 3415.311 3416.877
FT28 3454.435 NA
FT42 3476.344 3482.605
FT17 3491.994 NA
FT09 3496.688 NA
FT07 3496.688 3501.384
FT35 3512.338 3523.294
FT34 3515.468 3524.859
FT43 3576.501 3584.327
FT30 3590.586 NA
FT18 3610.930 3614.061
FT12 3618.755 3626.581
FT13 3618.755 3626.581
FT14 3657.879 3664.140
FT39 3662.574 3704.828
FT40 3662.574 3706.393
FT29 3682.918 3687.614
FT44 3703.262 3712.653
FT47 NA 3709.523
FT36 NA 3767.426
FT19 3819.069 3830.025
FT26 NA 3870.713
FT25 4047.552 4052.248
FT23 4099.196 NA
Thus, anchor peaks are defined even if there are no peaks detected in one of the two subset sample (NA
reported).
This can be nailed down to the internal .getPeakGroupsRtMatrix
function:
res <- xcms:::.getPeakGroupsRtMatrix(chromPeaks(xod_xg),
featureDefinitions(xod_xg)$peakidx,
sampleIndex = c(1, 3),
missingSample = 0,
extraPeaks = 1L)
res
CP011 CP199
[1,] 2679.783 NA
[2,] 2678.218 2684.479
[3,] 2679.783 2682.914
[4,] 2679.783 2684.479
[5,] 2681.348 2684.479
[6,] 2783.070 NA
[7,] 2786.200 2787.766
[8,] 2786.200 2787.766
[9,] 2787.765 2787.766
[10,] NA 2916.092
[11,] 2923.916 2923.917
[12,] 2994.338 2994.339
[13,] NA 3250.992
[14,] 3276.030 3285.421
[15,] 3316.719 3340.194
[16,] NA 3349.584
[17,] 3379.317 3387.143
[18,] 3384.012 3410.617
[19,] NA 3410.617
[20,] 3413.746 NA
[21,] 3415.311 3416.877
[22,] 3454.435 NA
[23,] 3476.344 3482.605
[24,] 3491.994 NA
[25,] 3496.688 NA
[26,] 3496.688 3501.384
[27,] 3512.338 3523.294
[28,] 3515.468 3524.859
[29,] 3576.501 3584.327
[30,] 3590.586 NA
[31,] 3610.930 3614.061
[32,] 3618.755 3626.581
[33,] 3618.755 3626.581
[34,] 3657.879 3664.140
[35,] 3662.574 3704.828
[36,] 3662.574 3706.393
[37,] 3682.918 3687.614
[38,] 3703.262 3712.653
[39,] NA 3709.523
[40,] NA 3767.426
[41,] 3819.069 3830.025
[42,] NA 3870.713
[43,] 4047.552 4052.248
[44,] 4099.196 NA
So, that function needs to be fixed to return only non-NA values for this setting.