Therefore, our aim at this stage is to prescreen a large number of peaks that can later be grouped into peakbins. Because the machine learning analysis applied afterwards will separate relevant from non-relevant peaks, including some artifacts at this early stage is not critical. The peak detection algorithm is therefore applied to each spectrum individually, yielding a list of candidate peaks per spectrum.
Even though peak detection is by far the hottest issue in the MS preprocessing field, there is agreement on only three conditions that a candidate peak must meet [2]:
We will take the peak detection algorithm proposed in [13] as the starting point. Our algorithm will follow the same top-down scheme, starting with the highest point of the overall signal and iteratively evaluating the lower points. To decide whether a point is considered a peak we set a stricter criterion: there must exist a bounding point on its left (respectively, right) that is reached before the previous (next) peak. Given these bounding points, two conditions must hold. First, the value of the candidate point must be higher than a sensitivity threshold and, second, the candidate point must have an SNR higher than or equal to 3 within the intensity window framed by the left and right bounding points.
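The top-down scheme can be illustrated with a minimal Python sketch. It is not the implementation of [13] or of our algorithm, and several choices are assumptions made only to keep the example concrete: the names `detect_peaks`, `window_snr` and `mad` are hypothetical, candidates are restricted to local maxima, the bounding points are taken as the lowest points between the candidate and its neighbouring accepted peaks (or the spectrum ends), and the SNR is the candidate height divided by the unscaled MAD of the window, following the estimate described in the next paragraph.

```python
import bisect
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def window_snr(intensities, left, peak, right):
    """Height of `peak` divided by the MAD of intensities[left..right]."""
    noise = mad(intensities[left:right + 1])
    if noise == 0:
        # Degenerate window: noise cannot be estimated (assumption: accept
        # a positive candidate by treating its SNR as infinite).
        return float("inf") if intensities[peak] > 0 else 0.0
    return intensities[peak] / noise


def detect_peaks(intensities, sensitivity, min_snr=3.0):
    """Return the sorted indices of accepted peaks in one spectrum."""
    n = len(intensities)
    # Assumption: only local maxima are evaluated as candidates.
    candidates = [i for i in range(1, n - 1)
                  if intensities[i - 1] < intensities[i] >= intensities[i + 1]]
    # Top-down: evaluate the highest candidate first, then the lower ones.
    candidates.sort(key=lambda i: intensities[i], reverse=True)

    peaks = []  # accepted peak indices, kept sorted
    for i in candidates:
        # Condition 1: the candidate must exceed the sensitivity threshold.
        if intensities[i] <= sensitivity:
            break  # every remaining candidate is lower still
        # Nearest accepted peaks on each side (or the spectrum boundaries).
        pos = bisect.bisect_left(peaks, i)
        prev_peak = peaks[pos - 1] if pos > 0 else -1
        next_peak = peaks[pos] if pos < len(peaks) else n
        # Assumption: bounding points are the valley minima on each side.
        left = min(range(prev_peak + 1, i), key=lambda j: intensities[j])
        right = min(range(i + 1, next_peak), key=lambda j: intensities[j])
        # Condition 2: SNR >= 3 within the window framed by left and right.
        if window_snr(intensities, left, i, right) >= min_snr:
            bisect.insort(peaks, i)
    return peaks


spectrum = [1, 2, 1, 2, 20, 2, 1, 2, 1, 3, 12, 3, 1, 2, 1]
print(detect_peaks(spectrum, sensitivity=5))  # [4, 10]
```

Since each spectrum is processed independently, a collection of spectra would simply be handled as `[detect_peaks(s, sensitivity=5) for s in spectra]`, which gives one candidate list per spectrum.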
To estimate the SNR within a signal window, our algorithm takes the ratio between the point's height and the median absolute deviation (MAD) of the window under consideration [20]. The criterion that the SNR must be higher than or equal to 3 is borrowed from the image analysis field and has been widely used in microarray quality metrics [4].
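Written out, one reading of this estimate is the following. The notation is introduced here for illustration and is not taken from the source: $y_p$ is the intensity of the candidate point $p$ and $W$ is the set of points in the window under consideration. The MAD is shown unscaled; whether a consistency constant (such as the factor 1.4826 used for normally distributed noise) is applied is not stated in the text.

```latex
\mathrm{SNR}(p) = \frac{y_p}{\mathrm{MAD}(W)},
\qquad
\mathrm{MAD}(W) = \operatorname*{median}_{i \in W}
  \left| \, y_i - \operatorname*{median}_{j \in W} y_j \, \right|,
\qquad
\text{accept } p \text{ only if } \mathrm{SNR}(p) \ge 3 .
```

As a small worked case, for window intensities $(1, 2, 1, 2, 20, 2, 1)$ with the candidate at height 20, the median is 2, the absolute deviations are $(1, 0, 1, 0, 18, 0, 1)$ with median 1, so $\mathrm{SNR} = 20 / 1 = 20$ and the candidate passes the criterion.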
The main advantage of this peak detection algorithm is that it takes into account the individual characteristics of every spectrum, rather than evaluating an average spectrum that could hide independent features [5]. In addition, the spectra maintain their original shape, obviating the need for a shifting or alignment process. On the downside, the computation time increases linearly with the number of spectra, since every spectrum is examined.