-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add new Ion-mobility peak picking algorithm #647
base: devel
Are you sure you want to change the base?
Changes from 2 commits
5147dc7
92eeec1
857f681
216210f
56cc96c
ce8daa4
f8f4826
120627a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2220,9 +2220,220 @@ do_findPeaks_MSW <- function(mz, int, snthresh = 3, | |
peaklist | ||
} | ||
|
||
############################################################ | ||
## Ion-mobility peak picking | ||
## | ||
#' @title Core API for Centwave-based ion-mobility peak picking | ||
#' @name do_findChromPeaks_IM_centWave | ||
#' | ||
#' @description Performs an extension of CentWave peak-picking on LC-IM-MS MS1 | ||
#' data. First it joins all scans into frames and performs .centWave_orig on | ||
#' the summarized LC-MS-like data. From each peak, it calculates its mobilogram and | ||
#' performs a second peak-picking on the IM dimension, resolving the peaks. | ||
#' | ||
#' @inheritParams do_findChromPeaks_centWave | ||
#' @inheritParams findChromPeaks-centWaveIonMobility | ||
#' | ||
#' @return A matrix, each row representing an identified peak, with columns: | ||
#' \describe{ | ||
#' \item{mz}{m/z value of the peak at the apex position.} | ||
#' \item{mzmin}{Minimum m/z of the peak.} | ||
#' \item{mzmax}{Maximum m/z of the peak.} | ||
#' \item{rt}{Retention time value of the peak at the apex position.} | ||
#' \item{rtmin}{Minimum retention time of the peak.} | ||
#' \item{rtmax}{Maximum retention time of the peak.} | ||
#' \item{im}{Ion mobility value of the peak at the apex position.} | ||
#' \item{immin}{Minimum ion mobility value of the peak.} | ||
#' \item{immax}{Maximum ion mobility value of the peak.} | ||
#' \item{maxo}{Maximum intensity of the peak.} | ||
#' \item{into}{Integrated (original) intensity of the peak.} | ||
#' \item{intb}{Always \code{NA}.} | ||
#' \item{sn}{Always \code{NA}} | ||
#' } | ||
#' | ||
#' @family core peak detection functions | ||
#' | ||
#' @author Roger Gine, Johannes Rainer | ||
#' | ||
#' @importFrom Spectra peaksData rtime combineSpectra mz | ||
do_findChromPeaks_IM_centWave <- function(spec, | ||
ppm = 25, | ||
peakwidth = c(20, 50), | ||
snthresh = 10, | ||
prefilter = c(3, 100), | ||
mzCenterFun = "wMean", | ||
integrate = 1, | ||
mzdiff = -0.001, | ||
fitgauss = FALSE, | ||
noise = 0, | ||
verboseColumns = FALSE, | ||
roiList = list(), | ||
firstBaselineCheck = TRUE, | ||
roiScales = NULL, | ||
sleep = 0, | ||
extendLengthMSW = FALSE, | ||
ppmMerging = 10, | ||
binWidthIM = 0.01 | ||
){ | ||
## Extract frame information | ||
pdata <- peaksData(spec, columns = c("mz", "intensity")) | ||
rt <- rtime(spec) | ||
im <- spec$inv_ion_mobility | ||
|
||
|
||
## Merging frames into scans and Summarize across IM dimension | ||
message("Collapsing data over IM dimension... ", appendLF = F) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just a comment: I would avoid using these |
||
|
||
scans_summarized <- | ||
combineSpectra( | ||
spec, | ||
f = as.factor(spec$frameId), | ||
intensityFun = base::sum, | ||
weighted = TRUE, | ||
ppm = ppmMerging | ||
) | ||
message("OK") | ||
|
||
## Peak-picking on summarized data | ||
mzs <- mz(scans_summarized) | ||
valsPerSpect <- lengths(mzs, FALSE) | ||
mz <- unlist(mzs, use.names = FALSE) | ||
int <- unlist(intensity(scans_summarized), use.names = FALSE) | ||
scantime <- sort(unique(rt)) | ||
peaks <- .centWave_orig(mz = mz, int = int, scantime = scantime, | ||
valsPerSpect = valsPerSpect, ppm = ppm, peakwidth = peakwidth, | ||
snthresh = snthresh, prefilter = prefilter, | ||
mzCenterFun = mzCenterFun, integrate = integrate, | ||
mzdiff = mzdiff, fitgauss = fitgauss, noise = noise, | ||
verboseColumns = verboseColumns, roiList = roiList, | ||
firstBaselineCheck = firstBaselineCheck, | ||
roiScales = roiScales, sleep = sleep, | ||
extendLengthMSW = extendLengthMSW) | ||
|
||
## Resolving peaks across IM dimension | ||
message("Resolving peaks over ion-mobility dimension... ", appendLF = F) | ||
resolved_peaks <- vector("list", nrow(peaks)) | ||
for (i in seq_len(nrow(peaks))) { | ||
current_peak <- peaks[i,] | ||
mobilogram <- .extract_mobilogram(pdata, current_peak, rt, im, binWidthIM) | ||
if (length(mobilogram) == 0) { | ||
warning(i, " mobilogram is empty") | ||
next | ||
} | ||
bounds <- .split_mobilogram(mobilogram) | ||
new_peaks <- data.frame( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think using |
||
mz = current_peak["mz"], | ||
mzmin = current_peak["mzmin"], | ||
mzmax = current_peak["mzmax"], | ||
rt = current_peak["rt"], | ||
rtmin = current_peak["rtmin"], | ||
rtmax = current_peak["rtmax"], | ||
im = vapply(bounds, mean, numeric(1)), | ||
immin = vapply(bounds, min, numeric(1)), | ||
immax = vapply(bounds, max, numeric(1)), | ||
row.names = NULL | ||
) | ||
resolved_peaks[[i]] <- new_peaks | ||
} | ||
resolved_peaks <- do.call(rbind, resolved_peaks) | ||
message("OK") | ||
|
||
## Refine and calculate peak parameters | ||
vals <- vector("list", nrow(resolved_peaks)) | ||
for (i in seq(nrow(resolved_peaks))) { | ||
peak <- unlist(resolved_peaks[i, , drop = TRUE]) | ||
|
||
## Create a EIC for mz, rt and IM ranges | ||
eic <- .extract_EIC_IM(peak, pdata, rt, im) | ||
|
||
if (nrow(eic) == 0 | all(eic[, 2] == 0)) | ||
next | ||
|
||
## Refine RT bounds | ||
rts <- c(peak["rtmin"], peak["rtmax"]) | ||
apx <- which.max(eic[, 2]) | ||
apx_rt <- eic[apx, 1] | ||
range <- xcms:::descendMin(eic[, 2], apx) | ||
|
||
eic <- eic[range[1]:range[2], , drop = FALSE] | ||
|
||
## Calculate peak stats | ||
vals[[i]] <- data.frame( | ||
mz = peak["mz"], | ||
mzmin = peak["mzmin"], | ||
mzmax = peak["mzmax"], | ||
rt = apx_rt, | ||
rtmin = min(eic[, 1]), | ||
rtmax = max(eic[, 1]), | ||
im = peak["im"], | ||
immin = peak["immin"], | ||
immax = peak["immax"], | ||
maxo = max(eic[, 2]), | ||
into = sum(eic[, 2]), | ||
intb = NA, | ||
sn = NA | ||
) | ||
} | ||
resolved_peaks <- do.call(rbind, vals) | ||
resolved_peaks <- | ||
resolved_peaks[resolved_peaks$into > 0, ] #Remove empty peaks | ||
|
||
as.matrix(resolved_peaks) | ||
} | ||
|
||
#' @importFrom MsCoreUtils bin | ||
.extract_mobilogram <- function(pdata, peak, rt, im, binWidthIM = 0.01){ | ||
rtr <- c(peak["rtmin"], peak["rtmax"]) | ||
mzr <- c(peak["mzmin"], peak["mzmax"]) | ||
keep <- dplyr::between(rt, rtr[1], rtr[2]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also tried it, but the performance was much better for
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. wow - indeed. note that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see, I got it the other way around with the I like the fact that the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, I think it would be best to start using the Regarding There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems good! I'll revert back to |
||
if (length(keep) == 0) return() | ||
ims <- im[keep] | ||
ints <- vapply(pdata[keep], xcms:::.aggregate_intensities, | ||
mzr = mzr, INTFUN = sum, na.rm = TRUE, numeric(1)) | ||
if(all(ints == 0)) return() | ||
mob <- MsCoreUtils::bin(x = ints, y = ims, size = binWidthIM, FUN = sum) | ||
mob | ||
} | ||
|
||
|
||
#' @importFrom MassSpecWavelet peakDetectionCWT | ||
.split_mobilogram <- function(mob){ | ||
if(length(mob$x) == 0){return()} | ||
vec <- mob$x | ||
#Add some padding, which will be removed after | ||
padding_size <- 5 | ||
vec <- c(rep(0, padding_size), vec, rep(0, padding_size)) | ||
pks <- MassSpecWavelet::peakDetectionCWT(vec, scales = c(1:7)) | ||
left <- sapply(pks$majorPeakInfo$peakCenterIndex - pks$majorPeakInfo$peakScale, function(x) max(1, x)) | ||
right <- sapply(pks$majorPeakInfo$peakCenterIndex + pks$majorPeakInfo$peakScale, function(x) min(x, length(vec) - padding_size)) | ||
limits <- list() | ||
for (i in seq_along(pks$majorPeakInfo$peakCenterIndex)){ | ||
ranges <- xcms:::descendMinTol(vec, startpos = c(left[i], right[i]), maxDescOutlier = 1) - padding_size | ||
ranges[1] <- min(max(1, ranges[1]), length(mob$mids)) | ||
ranges[2] <- min(ranges[2], length(mob$mids)) | ||
limits[[i]] <- mob$mids[ranges] | ||
} | ||
limits <- limits[vapply(limits, function(x){!any(is.na(x))}, logical(1))] | ||
return(limits) | ||
} | ||
|
||
|
||
|
||
#' @importFrom dplyr between | ||
.extract_EIC_IM <- function(peak, pdata, rt, im){ | ||
rtr <- c(peak["rtmin"], peak["rtmax"]) | ||
mzr <- c(peak["mzmin"], peak["mzmax"]) | ||
imr <- c(peak["immin"], peak["immax"]) | ||
|
||
keep <- dplyr::between(rt, rtr[1], rtr[2]) & dplyr::between(im, imr[1], imr[2]) | ||
rts <- rt[keep] | ||
ints <- vapply(pdata[keep], xcms:::.aggregate_intensities, | ||
mzr = mzr, INTFUN = sum, na.rm = TRUE, numeric(1)) | ||
ints <- vapply(unique(rts), function(x){sum(ints[rts == x])}, numeric(1)) | ||
|
||
cbind(unique(rts), ints) | ||
} | ||
|
||
|
||
############################################################ | ||
## MS1 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other ways/algorithms to perform peak detection on IM data that do not first collapse the data and then expand it again like you do here?
If not I would suggest to split the functionality into 3 functions:
peaksData
by framepeaksData
and the detected chrom peaks matrix from 2) as input and post processes the data.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, yes, there are other ways to perform peak-picking that directly use the "full" data without collapsing (for instance, you could do a 2D-CWT where you change the scales in both RT and IM dimensions and find local maxima; or any peakpicking algorithm such as those used for GCxGC-MS, where you have a similar situation). I haven't looked deeply into such methods, but we should accomodate for them too, just in case
Still, splitting the functionality in
do_findChromPeaks_IM_centWave
seems good, since those functions would be reusable for other algorithms, etc. Specifically, if you agree, I'll do the following:.mse_find_chrom_peaks_sample
with the IM-collapsedSpectra
object and the paramas(param, CentWaveParam")
(since itIMCentWaveParam
inherits from it) -> That function will take care of extracting the peaksData, rt, valsPerSpect, etc., rundo_findChromPeaks_centWave
, and return the peak matrix.Sounds good? 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for my late reply!
splitting functionality is always good, as you say, enables reuse - and makes the code easier to read. so, yes, it sounds good.
so, if I understand:
param
inheritsIMCentWaveParam
collapse thepeaksData
by frame.do_findChromPeaks_centWave
for peak detection (sinceIMCentWaveParam
inheritsCentWaveParam
)param
inheritsIMCentWaveParam
post process the detected peaks (resolve across IM dimension) and return results.if the functions get to large, you could also consider implementing a
.im_mse_find_chrom_peaks_sample
that is called instead of.mse_find_chrom_peaks_sample
ifparam
inherits from anIM
param object... not sure if that would simplify integration of additional/other IM peak detection methods.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I've encapsulated all this into
do_findChromPeaks_IM_centWave
, so it's called after dispatching the function corresponding to theparam
:It's basically the all steps you mentioned (collapse, CentWave and resolve), but called from a "lower" function call level, so everything upstream is more tidy. What I like about this is that all Centwave-specific checks and data extraction (mz, int, valsPerSpect, etc.) are handled by
.mse_find_chrom_peaks_sample
, so we are reusing already-existing code.I'll commit the proposed refactor so you can take a closer look by yourself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent. Let me know when you're OK/ready from your side. I would then like to try the code in action and tinker myself a bit to see if/where we could improve/optimize.
For that I would create yet another branch to play with the code and ask for your feedback on the merge.
Related to that: could you please provide a short code snipped with the example how to perform the analysis (I guess I got already a file from you on which I can test...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! If you agree, perhaps we should use (for testing) the
MsBackendMemory
orMsBackendDataFrame
to create a manageable toy example from the current read-onlyMsBackendTimsTof
(for instance, subsetting the RT)I'm still figuring out how to properly centroid the data (the format has some problems that makes the current
peakPicks
function inSpectra
ineffective, see below), but still, the analysis would go somewhat like this:You can use the TIMS-TOF data file I sent you a while back, just bear in mind it's in a zero-less profile mode (working on fixing that) and the chromatographic peaks are usually very short (<6-10s)