Baseline Correction

Real measured spectra, especially in imaging applications, usually consist of the actual spectrum plus artifacts that arise from a combination of various effects. For example, a spectrum can be shifted by a constant offset (e.g. when optical fibers are used), it can be superimposed on a wavelength-dependent function (caused, for example, by scattering effects), or a very weak spectral signal can sit on top of a very large background signal, as typically occurs with Raman spectra. In that case the Raman signal can be smaller than the background by a factor of about 10⁶, so that it is barely visible in the raw data.

In any case, the baseline has to be corrected or removed as completely as possible so that a “pure” spectrum remains. Baseline correction often depends on the measurement conditions, so the parameters have to be adjusted for each case. In imaging applications the problem is further complicated by the fact that structure boundaries can cause strong scattering effects when the structure size is in the range of the wavelength of the radiation (Mie scattering). This is a major problem especially in infrared microscopy, for example when particles a few tens of microns in size are analyzed: the resulting wavelength-dependent scattering varies from pixel to pixel.

There are several baseline correction methods, each with its own advantages and disadvantages:

Polynomial fitting
Penalized splines
Algorithm of Lieber
Algorithm of Eilers
Use of the first or second derivative

The first two methods require manually selecting anchor points (bases), i.e. positions in the spectrum where no band occurs. Although this is a simple solution in principle, it has the disadvantage that the baseline shifts when unexpected peaks appear at the anchor points in a spectrum; in this case the baseline correction can severely distort the spectra. In addition, the intensity at each anchor point should be determined as the mean of the neighboring data points in order to minimize the influence of noise.

Comparison of polynomial fitting and penalized splines. The anchor points are the same for both methods. The polynomial fit follows the broad band between 3000 and 3500 cm⁻¹ too closely.
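The anchor-point approach with polynomial fitting can be sketched as follows. This is a minimal illustration assuming Python with NumPy, not the ImageLab implementation: the function name `anchor_polynomial_baseline`, the `half_window` averaging width and the default degree are assumptions chosen for the example. Each anchor is replaced by the mean position and mean intensity of its neighbors, as recommended above, before a polynomial is fitted through the anchors.

```python
import numpy as np


def anchor_polynomial_baseline(x, y, anchor_idx, degree=3, half_window=2):
    """Fit a polynomial baseline through manually chosen anchor points.

    x, y        : spectral axis and intensities (1-D arrays of equal length)
    anchor_idx  : indices of band-free positions selected by the user
    half_window : each anchor is averaged over +/- half_window neighbors
    degree      : polynomial degree (needs more anchors than degree)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ax, ay = [], []
    for i in anchor_idx:
        lo, hi = max(0, i - half_window), min(len(y), i + half_window + 1)
        # Averaging neighboring points reduces the influence of noise.
        ax.append(x[lo:hi].mean())
        ay.append(y[lo:hi].mean())
    # Rescale x to [-1, 1] so that the polynomial fit stays well conditioned.
    centre = (x.max() + x.min()) / 2.0
    half_range = (x.max() - x.min()) / 2.0
    coeffs = np.polyfit((np.array(ax) - centre) / half_range, ay, degree)
    return np.polyval(coeffs, (x - centre) / half_range)


# Usage: corrected = y - anchor_polynomial_baseline(x, y, [0, 40, 160, 200])
```

If a band happens to lie on one of the chosen indices, the averaged anchor is pulled upward and the whole fitted baseline shifts, which is exactly the weakness of this method described above.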

The method of Lieber [Lieber 2003] is much simpler and less error-prone. It iteratively fits an n-th degree polynomial (typically n = 2 to 6) and truncates all intensities lying above the estimated baseline, replacing them with the corresponding estimated value. Repeating this process shifts the estimated baseline further and further down until a stable baseline emerges after about 10 to 20 iterations. The disadvantage of this procedure is that the shape of the baseline is fixed by the degree of the polynomial, which in many cases leads to a fit that is either too weak or too strong.
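A minimal sketch of the iterative polynomial clipping described above, assuming Python with NumPy. The function name `lieber_baseline` and the defaults for the degree and the fixed number of iterations are illustrative choices, not values taken from [Lieber 2003] or from ImageLab. The x axis is rescaled internally because high-degree polynomial fits on raw wavenumbers are numerically ill-conditioned.

```python
import numpy as np


def lieber_baseline(x, y, degree=4, n_iter=20):
    """Iterative modified polynomial fit in the spirit of [Lieber 2003].

    Each pass fits a polynomial of the given degree to the working spectrum
    and clips every point lying above the fit down to the fitted value, so
    the peaks are eroded step by step and the fit sinks onto the baseline.
    Requires n_iter >= 1.
    """
    x = np.asarray(x, dtype=float)
    work = np.asarray(y, dtype=float).copy()
    # Rescale x to [-1, 1] to keep the polynomial fit well conditioned.
    centre = (x.max() + x.min()) / 2.0
    xs = (x - centre) / ((x.max() - x.min()) / 2.0)
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(xs, work, degree), xs)
        # Truncate intensities above the current estimate to the estimate.
        work = np.minimum(work, fit)
    return fit
```

A fixed iteration count keeps the sketch short; stopping once successive fits change by less than a tolerance would be a natural refinement.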

In most cases the method of Eilers [Eilers 2005] yields the best results; it determines the baseline by asymmetric least-squares fitting. A smooth curve is fitted to the spectral data by penalized least-squares regression, with values above the current baseline estimate entering the regression with a much lower weight than values below it. Again the procedure is repeated until the estimated baseline no longer shifts. The resulting baseline is more "natural"; however, two parameters (the smoothness of the baseline and the asymmetry of the weights) have to be set. They allow considerable variability of the fit and thus also leave some uncertainty as to whether the baseline correction is actually optimal.
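The asymmetric least-squares idea can be sketched with a Whittaker smoother, which for a spectrum y finds the baseline z minimizing Σ wᵢ(yᵢ − zᵢ)² + λ Σ (Δ²z)ᵢ², where Δ² is the second difference. In this sketch (Python with NumPy assumed) `lam` (λ, smoothness) and `p` (asymmetry) play the role of the two parameters mentioned above; the function name and default values are illustrative assumptions, not ImageLab settings. Dense matrices are used for brevity; for long spectra a sparse solver (e.g. from `scipy.sparse`) would be the usual choice.

```python
import numpy as np


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least-squares baseline with a second-difference penalty.

    Points above the current baseline get weight p, points below get 1 - p,
    so positive peaks hardly pull the baseline up.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # (n-2) x n matrix whose rows compute z[i] - 2 z[i+1] + z[i+2].
    D = np.diff(np.eye(n), 2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        # Weighted, penalized least squares: (W + lam D'D) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Asymmetric re-weighting based on the current estimate.
        w = np.where(y > z, p, 1.0 - p)
    return z
```

Larger `lam` gives a stiffer baseline, and smaller `p` pushes it further under the bands; both usually need tuning per data set.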

A basic problem that specifically affects the methods of Lieber and Eilers is the occurrence of negative peaks, which arise in IR spectroscopy, for example, from bands of atmospheric CO₂. These negative peaks have to be explicitly excluded from the baseline calculation, or the CO₂ bands have to be removed before the baseline correction (most easily by linear interpolation between the limits of the CO₂ peak).

Effect of negative CO₂ peaks on the baseline correction.
Upper left: the baseline calculated with the Eilers method including the CO₂ peaks; upper right: the baseline obtained when the CO₂ peak is ignored. The corresponding corrected spectra are shown below. Without excluding the CO₂ signal, the corrected spectrum (bottom left) shows unacceptable disturbances around the CO₂ peak.
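Removing the CO₂ band by linear interpolation before the baseline correction can be sketched as below (Python with NumPy assumed). The function name `remove_band_by_interpolation` is hypothetical, and the region limits must be read off the measured spectrum. The x values are sorted internally because `np.interp` requires increasing sample positions, while IR spectra are often stored with decreasing wavenumbers.

```python
import numpy as np


def remove_band_by_interpolation(x, y, lo, hi):
    """Replace y inside [lo, hi] by a straight line across the region.

    The line connects the nearest data points just outside the region,
    i.e. the limits of the band to be removed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    inside = (x >= lo) & (x <= hi)
    xo, yo = x[~inside], y[~inside]
    order = np.argsort(xo)  # np.interp needs increasing x positions
    y[inside] = np.interp(x[inside], xo[order], yo[order])
    return y
```

For example, `remove_band_by_interpolation(wavenumbers, spectrum, 2280.0, 2400.0)` would bridge a CO₂ region with those illustrative limits before the Lieber or Eilers correction is applied.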

Another way to deal with the baseline problem is to bypass it by using the first or second derivative of the spectrum: a constant offset vanishes in the first derivative, and a linear baseline vanishes in the second derivative. In many applications this approach leads to a viable solution; however, differentiation worsens the signal-to-noise ratio, which makes further evaluation more difficult. When using derivatives, the signal should therefore always be smoothed before differentiating. Savitzky and Golay developed a method that calculates a smoothed first or second derivative in one step.
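The following sketch, assuming Python with NumPy and SciPy's `scipy.signal.savgol_filter` (which implements Savitzky-Golay smoothing and differentiation), checks that an offset and a linear slope drop out of the derivatives. The helper name `smoothed_derivative`, the window length of 21 points and the cubic local polynomial are illustrative choices, not recommended settings.

```python
import numpy as np
from scipy.signal import savgol_filter


def smoothed_derivative(y, dx, deriv=1, window_length=21, polyorder=3):
    """Smooth and differentiate in one step with a Savitzky-Golay filter.

    A polynomial of degree `polyorder` is least-squares fitted in a sliding
    window of `window_length` points (odd), and the derivative of order
    `deriv` of that local polynomial is evaluated at the window centre.
    """
    return savgol_filter(y, window_length, polyorder, deriv=deriv, delta=dx)


# A single band, the same band on a constant offset, and on offset + slope.
x = np.linspace(0.0, 100.0, 1001)
dx = x[1] - x[0]
peak = np.exp(-0.5 * ((x - 50.0) / 2.0) ** 2)
with_offset = peak + 5.0
with_slope = peak + 5.0 + 0.03 * x

# The offset disappears in the 1st derivative, offset + slope in the 2nd.
d1_peak = smoothed_derivative(peak, dx, deriv=1)
d1_offset = smoothed_derivative(with_offset, dx, deriv=1)
d2_peak = smoothed_derivative(peak, dx, deriv=2)
d2_slope = smoothed_derivative(with_slope, dx, deriv=2)
```

A wider window smooths more and so counteracts the noise amplification of differentiation, at the cost of broadening narrow bands.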