All these variations may induce signal shifts and potentially hide isotopic formations or very close compounds. Moreover, this effect is more likely when dealing with very complex mixtures.
To overcome this artificial shifting, we propose assembling peakbins of different widths. In this way, a set of close peaks on the axis across different spectra is clustered into the same peakbin if their intensity levels are similar. Classical clustering approaches have already been used to tackle this problem [2,24,10,21]. Instead, our preprocessing pipeline uses the Pearson linear correlation coefficient to group the peaks, as its computation time and memory demands are much lower. Peakbins are scanned recursively, and their signal values are quantified as the maximum value found in the bin. The stopping criterion is met when no single peak or peakbin shows a correlation value greater than a given threshold. Figure 1.5 details the assembling algorithm. The output of this final preprocessing stage is thus a list of peakbins, each with a starting and an ending point on the axis, coupled with the maximum signal value within each spectrum.
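As an illustration only, the assembling idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of Figure 1.5: the function names `pearson` and `assemble_peakbins`, the restriction to merging neighbouring bins, and the greedy choice of the most correlated pair at each pass are all hypothetical. Only the use of the Pearson coefficient, the maximum-value quantification, and the threshold-based stopping criterion are taken from the description above.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        # A constant signal has no defined correlation; treat it as uncorrelated.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def assemble_peakbins(positions, spectra, threshold):
    """Merge neighbouring peaks into peakbins of variable width.

    positions: peak positions on the axis, sorted in increasing order.
    spectra:   one list of intensities per spectrum, aligned with positions.
    threshold: bins are merged only while their correlation exceeds this value.
    Returns a list of (start, end, signal) tuples, where signal holds the
    maximum intensity of the bin within each spectrum.
    """
    # Start with one bin per peak; its signal is that peak's intensity profile.
    columns = list(zip(*spectra))
    bins = [(pos, pos, list(col)) for pos, col in zip(positions, columns)]

    while len(bins) > 1:
        # Assumption: only neighbouring bins on the axis are merge candidates.
        corrs = [pearson(bins[i][2], bins[i + 1][2])
                 for i in range(len(bins) - 1)]
        best = max(range(len(corrs)), key=corrs.__getitem__)
        if corrs[best] <= threshold:
            break  # stopping criterion: no pair exceeds the threshold
        left, right = bins[best], bins[best + 1]
        # The merged bin spans both ranges and keeps the per-spectrum maximum.
        merged_signal = [max(a, b) for a, b in zip(left[2], right[2])]
        bins[best:best + 2] = [(left[0], right[1], merged_signal)]

    return bins
```

With four spectra and three peaks, of which the first two lie close together and vary in step across the spectra, a call such as `assemble_peakbins([100.0, 100.1, 105.0], [[1, 2, 4], [2, 4, 1], [3, 6, 3], [4, 9, 2]], threshold=0.9)` merges the first two peaks into a single bin and leaves the third on its own. Recomputing only neighbouring correlations at each pass keeps the memory footprint linear in the number of bins, which is in line with the low-cost motivation given above.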