Metabolite fingerprinting: Detecting biological features by independent component analysis

M. Scholz, S. Gatzek, A. Sterling, O. Fiehn, J. Selbig

Research output: Contribution to journal › Article

183 Citations (Scopus)

Abstract

Motivation: Metabolite fingerprinting is a technology for obtaining information from spectra of the total composition of metabolites. Here, spectra acquisition by microchip-based nanoflow direct-infusion QTOF mass spectrometry, a simple and high-throughput technique, is tested for its informative power. As a simple test case we use Arabidopsis thaliana crosses. The question is how metabolite fingerprinting reflects the biological background. In many applications, classical principal component analysis (PCA) is used to detect relevant information. Here a modern alternative is introduced: independent component analysis (ICA). Due to its independence condition, ICA is more suitable for our questions than PCA. However, ICA was not developed for a small number of high-dimensional samples, so a strategy is needed to overcome this limitation.

Results: To apply ICA successfully, it is essential first to reduce the high dimension of the dataset by using PCA. The number of principal components significantly affects the quality of ICA; we therefore propose a criterion for estimating the optimal dimension automatically. The kurtosis measure is used to order the extracted components according to our interest. Applied to our A. thaliana data, ICA detects three relevant factors, two biological and one technical, and clearly outperforms PCA.
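The workflow described in the abstract (PCA reduction, then ICA, then ordering by kurtosis) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`pca_whiten`, `fast_ica`, `rank_by_kurtosis`, `ica_after_pca`) are hypothetical, the ICA step is a generic symmetric FastICA with a tanh nonlinearity chosen for illustration, the number of components `k` is set by hand rather than by the paper's automatic criterion, and ranking from the most negative kurtosis upward is an assumption, since the abstract does not state the direction.

```python
import numpy as np


def pca_whiten(X, k):
    """Center the samples (rows of X) and return whitened scores on the first k PCs."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    # Columns of U are orthonormal, so scaling by sqrt(n) gives unit variance.
    return U[:, :k] * np.sqrt(X.shape[0])


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W


def fast_ica(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened data Z of shape (n, k)."""
    n, k = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = Z @ W.T                       # current component estimates
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, one row per component.
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is parallel to its previous value.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T


def excess_kurtosis(S):
    """Excess kurtosis of each column (0 for a Gaussian)."""
    Sc = S - S.mean(axis=0)
    m2 = (Sc ** 2).mean(axis=0)
    m4 = (Sc ** 4).mean(axis=0)
    return m4 / m2 ** 2 - 3.0


def rank_by_kurtosis(S):
    """Order components by ascending kurtosis (assumed direction of interest)."""
    kurt = excess_kurtosis(S)
    order = np.argsort(kurt)
    return S[:, order], kurt[order]


def ica_after_pca(X, k):
    """PCA reduction to k dimensions, then ICA, then kurtosis ordering."""
    return rank_by_kurtosis(fast_ica(pca_whiten(X, k)))
```

Calling `ica_after_pca(X, k)` on a samples-by-features matrix `X` returns the component scores per sample together with their kurtosis values. Reducing with PCA first keeps the ICA step in a `k`-dimensional space even when there are far more features than samples, which is the situation the abstract describes.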

Original language: English (US)
Pages (from-to): 2447-2454
Number of pages: 8
Journal: Bioinformatics
Volume: 20
Issue number: 15
DOI: https://doi.org/10.1093/bioinformatics/bth270
State: Published - Oct 12 2004
Externally published: Yes

Fingerprint

Fingerprinting
Independent component analysis
Metabolites
Principal Component Analysis
Arabidopsis
Arabidopsis Thaliana
Biological Factors
Mass Spectrometry
Kurtosis
Principal Components
Technology
High Throughput
Higher Dimensions
High-dimensional
Throughput
Alternatives
Chemical analysis

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Metabolite fingerprinting : Detecting biological features by independent component analysis. / Scholz, M.; Gatzek, S.; Sterling, A.; Fiehn, O.; Selbig, J.

In: Bioinformatics, Vol. 20, No. 15, 12.10.2004, p. 2447-2454.

Scholz, M, Gatzek, S, Sterling, A, Fiehn, O & Selbig, J 2004, 'Metabolite fingerprinting: Detecting biological features by independent component analysis', Bioinformatics, vol. 20, no. 15, pp. 2447-2454. https://doi.org/10.1093/bioinformatics/bth270
Scholz, M. ; Gatzek, S. ; Sterling, A. ; Fiehn, O. ; Selbig, J. / Metabolite fingerprinting : Detecting biological features by independent component analysis. In: Bioinformatics. 2004 ; Vol. 20, No. 15. pp. 2447-2454.
@article{5b461ac5b4e8473994e74e203ba328e1,
title = "Metabolite fingerprinting: Detecting biological features by independent component analysis",
author = "M. Scholz and S. Gatzek and A. Sterling and O. Fiehn and J. Selbig",
year = "2004",
month = "10",
day = "12",
doi = "10.1093/bioinformatics/bth270",
language = "English (US)",
volume = "20",
pages = "2447--2454",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "15"
}

TY - JOUR
T1 - Metabolite fingerprinting
T2 - Detecting biological features by independent component analysis
AU - Scholz, M.
AU - Gatzek, S.
AU - Sterling, A.
AU - Fiehn, O.
AU - Selbig, J.
PY - 2004/10/12
Y1 - 2004/10/12
UR - http://www.scopus.com/inward/record.url?scp=4644240926&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4644240926&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bth270
DO - 10.1093/bioinformatics/bth270
M3 - Article
C2 - 15087312
AN - SCOPUS:4644240926
VL - 20
SP - 2447
EP - 2454
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 15
ER -