Data preprocessing pipeline

The main objective of preprocessing mass spectrometry (MS) data is to clean the data and detect the true signals in the noisy spectra. However, there is no standard preprocessing pipeline for MS data. Although a core set of preprocessing tasks has been identified and accepted as a quasi-standard, pipelines do not all perform the same steps, nor do they necessarily tackle them in the same order. The most widely accepted core stages of the dataflow are: baseline removal or correction, inter-spectra normalization, signal noise reduction or smoothing, peak detection and, finally, peak alignment. Additional tasks may include outlier detection [18,15] and raw signal binning [15,14].
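To make the order of the core stages concrete, the following minimal Python sketch chains them in the sequence listed above. It is only an illustration under stated assumptions: the function names and parameters are hypothetical, and the technique chosen for each stage (rolling-minimum baseline, total ion current normalization, moving-average smoothing, thresholded local maxima, and tolerance-based grouping for alignment) is a simple stand-in, not necessarily the method adopted in our pipeline.

```python
import numpy as np

def remove_baseline(y, window=51):
    # Assumption: the baseline is approximated by a rolling minimum (window is odd).
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    baseline = np.array([padded[i:i + window].min() for i in range(len(y))])
    return y - baseline

def normalize_tic(y):
    # Assumption: inter-spectra normalization by total ion current (sum of intensities).
    total = y.sum()
    return y / total if total > 0 else y

def smooth(y, window=5):
    # Assumption: noise reduction with a plain moving average.
    return np.convolve(y, np.ones(window) / window, mode="same")

def detect_peaks(y, rel_height=0.1):
    # Local maxima whose intensity exceeds a fraction of the spectrum maximum.
    mid = y[1:-1]
    is_peak = (mid > y[:-2]) & (mid >= y[2:]) & (mid > rel_height * y.max())
    return np.flatnonzero(is_peak) + 1

def align_peaks(peak_mz_lists, tol=0.5):
    # Group peak positions from all spectra whose m/z values lie within `tol`
    # of their sorted neighbour, and represent each group by its mean m/z.
    all_mz = np.sort(np.concatenate(peak_mz_lists))
    if all_mz.size == 0:
        return all_mz
    breaks = np.flatnonzero(np.diff(all_mz) > tol) + 1
    return np.array([group.mean() for group in np.split(all_mz, breaks)])

def preprocess(intensity, baseline_window=51, smooth_window=5, rel_height=0.1):
    # Single-spectrum stages, applied in the order given in the text.
    y = remove_baseline(intensity, baseline_window)
    y = normalize_tic(y)
    y = smooth(y, smooth_window)
    return y, detect_peaks(y, rel_height)
```

In this sketch, `preprocess` is applied to each spectrum independently, whereas `align_peaks` operates on the peak m/z lists of all spectra at once, which is why alignment comes last.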

In the following pages, we present and discuss our proposal for a standard preprocessing pipeline. Note that our main aim after preprocessing is to obtain a set of relevant peaks by means of a feature subset selection approach. Each preprocessing task must be designed with this in mind, so that the resulting dataset is as unbiased as possible.