Baseline removal

At the low end of the $m/z$ range, the intensity values are consistently amplified. This side effect is a consequence of chemical noise from the matrix compounds required to fix the biological sample, and the amplification diminishes as the $m/z$ values increase [19].

To minimize this effect, the baseline (the chemical-noise component) must be estimated and subtracted from the observed signal. This can be viewed as a filtering step, in the sense that each spectrum is processed individually and transformed into a (partially) corrected spectrum. To our knowledge, there is no consensus within the scientific community on the best way to tackle this problem. The most popular techniques include smoothing by local linear regression (loess) [2,24], multiple shifted windows with spline approximations [15,14], and a non-linear filter from the field of mathematical morphology, the top-hat operator [18,3], together with its variations [13,11].
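Stated as a formula, and assuming for illustration an additive model (our assumption, not a claim made by the cited methods), an observed spectrum $y$ indexed by $m/z$ value $x$ decomposes into the true peak signal $s$, the chemical-noise baseline $b$ and residual noise $\varepsilon$. Every technique listed above differs only in how it computes the baseline estimate $\hat{b}$:

$$
y(x) = s(x) + b(x) + \varepsilon(x), \qquad \hat{s}(x) = y(x) - \hat{b}(x).
$$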

All of these techniques share the same aim: to flatten the signal by removing the estimated chemical noise. No significant difference between them has been reported in the literature, although no systematic comparison of the techniques has been carried out either. We therefore propose the use of the top-hat morphological operator [22], since it is the least time-consuming and has proven its merits in image analysis, where it is a widely used filter [26,9].

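The top-hat filter is a nonlinear operator whose output is always non-negative. Also known as the white top-hat, it subtracts from the input signal its morphological opening with a predefined structuring element. The opening behaves as a nonlinear low-pass filter that follows the slowly varying baseline, so the top-hat itself acts as a high-pass filter that retains features narrower than the structuring element, such as peaks. For application to MS data, each spectrum is treated as a one-dimensional array of intensity values, and the neighbourhood element (or mask) is likewise a one-dimensional array.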
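A minimal sketch of top-hat baseline removal in Python with NumPy follows. It assumes a flat (constant-height) structuring element of odd width `width`; that choice, the function names and the synthetic spectrum are illustrative assumptions, not part of the method described above. With a flat element, erosion reduces to a running minimum and dilation to a running maximum, so the white top-hat $T(f) = f - (f \ominus b) \oplus b$ can be computed with two sliding-window passes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _erode(signal: np.ndarray, width: int) -> np.ndarray:
    """Grey-scale erosion with a flat element: running minimum over the window."""
    half = width // 2
    # Pad with +inf so positions outside the spectrum never win the minimum.
    padded = np.pad(signal, half, mode="constant", constant_values=np.inf)
    return sliding_window_view(padded, width).min(axis=1)


def _dilate(signal: np.ndarray, width: int) -> np.ndarray:
    """Grey-scale dilation with a flat element: running maximum over the window."""
    half = width // 2
    # Pad with -inf so positions outside the spectrum never win the maximum.
    padded = np.pad(signal, half, mode="constant", constant_values=-np.inf)
    return sliding_window_view(padded, width).max(axis=1)


def tophat_baseline_removal(intensities, width: int):
    """Return (corrected, baseline) using the white top-hat transform.

    Assumption: flat structuring element of odd width, measured in samples.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    f = np.asarray(intensities, dtype=float)
    # Opening = erosion followed by dilation; it serves as the baseline estimate.
    baseline = _dilate(_erode(f, width), width)
    # White top-hat = signal minus its opening; never negative.
    return f - baseline, baseline


# Usage on a synthetic spectrum (illustrative values only, not real data):
mz_index = np.arange(200)
true_baseline = 50.0 * np.exp(-mz_index / 40.0)  # decaying low-m/z baseline
spectrum = true_baseline.copy()
spectrum[[60, 120]] += [30.0, 20.0]  # two narrow peaks on top of the baseline
corrected, estimated = tophat_baseline_removal(spectrum, width=15)
```

The width should exceed the widest peak of interest, otherwise peaks are partly absorbed into the baseline estimate, while staying narrow enough to follow the baseline's curvature. Near the ends of the spectrum the window is truncated, so the first and last few corrected values are less reliable. SciPy's `scipy.ndimage.white_tophat(spectrum, size=width)` computes the same transform and should agree in the interior, but it reflects the signal at the boundaries by default, so edge values may differ.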