Application of metabolomics to plant genotype discrimination using statistics and machine learning

Janet Taylor, Ross D. King, Thomas Altmann, Oliver Fiehn

Research output: Contribution to journal (Article)

163 Citations (Scopus)

Abstract

Motivation: Metabolomics is a post-genomic technology which seeks to provide a comprehensive profile of all the metabolites present in a biological sample. This complements the mRNA profiles provided by microarrays, and the protein profiles provided by proteomics. To test the power of metabolome analysis we selected the problem of discriminating between related genotypes of Arabidopsis. Specifically, the problem tackled was to discriminate between two background genotypes (Col0 and C24) and, more significantly, the offspring produced by the cross-breeding of these two lines, the progeny (whose genotypes would differ only in their maternally inherited mitochondria and chloroplasts).

Overview: A gas chromatography-mass spectrometry (GCMS) profiling protocol was used to identify 433 metabolites in the samples. The metabolomic profiles were compared using descriptive statistics, which indicated that key primary metabolites vary more than other metabolites. We then applied neural networks to discriminate between the genotypes. This showed clearly that the two background lines can be discriminated from each other and from their progeny, and indicated that the two progeny lines can also be discriminated. We applied Euclidean hierarchical clustering and Principal Component Analysis (PCA) to help understand the basis of genotype discrimination. PCA indicated that malic acid and citrate are the two most important metabolites for discriminating between the background lines, and glucose and fructose are the two most important metabolites for discriminating between the crosses. These results are consistent with genotype differences in mitochondria and chloroplasts.
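The idea of using PCA to rank metabolites by their contribution to genotype separation can be illustrated with a minimal sketch. It is not the authors' pipeline or data: the intensity values, the two hypothetical lines "A" and "B", and all function names below are assumptions made up for illustration, and the first principal component is found with a plain power iteration rather than any particular statistics package.

```python
import math

# Hypothetical, made-up intensities (rows = samples, columns = metabolites).
# These numbers are NOT from the paper; they only illustrate the technique.
METABOLITES = ["malic acid", "citrate", "glucose", "fructose"]
SAMPLES = [
    [10.0, 8.0, 5.0, 5.1],  # hypothetical line A
    [10.5, 8.2, 5.2, 4.9],  # hypothetical line A
    [9.8, 7.9, 4.9, 5.0],   # hypothetical line A
    [6.0, 5.0, 5.1, 5.0],   # hypothetical line B
    [6.2, 5.1, 4.8, 5.2],   # hypothetical line B
    [5.9, 4.8, 5.0, 4.9],   # hypothetical line B
]


def center(rows):
    """Subtract each column mean so every metabolite has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (metabolite x metabolite)."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def rank_by_loading(names, loadings):
    """Order metabolites by absolute loading on the component, largest first."""
    pairs = sorted(zip(names, loadings), key=lambda pair: -abs(pair[1]))
    return [name for name, _ in pairs]


loadings = first_component(covariance(center(SAMPLES)))
ranking = rank_by_loading(METABOLITES, loadings)
print(ranking[:2])
```

In this toy data set the two lines differ mainly in the first two columns, so those metabolites carry the largest loadings on the first component. Absolute values are used because the sign of a principal component is arbitrary.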

Original language: English (US)
Journal: Bioinformatics
Volume: 18
Issue number: SUPPL. 2
State: Published - 2002
Externally published: Yes

Fingerprint

Metabolomics
Metabolites
Genotype
Discrimination
Learning systems
Machine Learning
Statistics
Line
Chloroplasts
Principal Component Analysis
Arabidopsis
Protein Array Analysis
Fructose
Metabolome
Proteomics
Mass Spectrometry
Microarrays
Profiling

Keywords

  • Arabidopsis
  • Clustering
  • Metabolome

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Taylor, J., King, R. D., Altmann, T., & Fiehn, O. (2002). Application of metabolomics to plant genotype discrimination using statistics and machine learning. Bioinformatics, 18(SUPPL. 2).

@article{0a976ca3e8a1476f816e52106b68b91a,
title = "Application of metabolomics to plant genotype discrimination using statistics and machine learning",
abstract = "Motivation: Metabolomics is a post-genomic technology which seeks to provide a comprehensive profile of all the metabolites present in a biological sample. This complements the mRNA profiles provided by microarrays, and the protein profiles provided by proteomics. To test the power of metabolome analysis we selected the problem of discriminating between related genotypes of Arabidopsis. Specifically, the problem tackled was to discriminate between two background genotypes (Col0 and C24) and, more significantly, the offspring produced by the cross-breeding of these two lines, the progeny (whose genotypes would differ only in their maternally inherited mitochondria and chloroplasts). Overview: A gas chromatography-mass spectrometry (GCMS) profiling protocol was used to identify 433 metabolites in the samples. The metabolomic profiles were compared using descriptive statistics, which indicated that key primary metabolites vary more than other metabolites. We then applied neural networks to discriminate between the genotypes. This showed clearly that the two background lines can be discriminated from each other and from their progeny, and indicated that the two progeny lines can also be discriminated. We applied Euclidean hierarchical clustering and Principal Component Analysis (PCA) to help understand the basis of genotype discrimination. PCA indicated that malic acid and citrate are the two most important metabolites for discriminating between the background lines, and glucose and fructose are the two most important metabolites for discriminating between the crosses. These results are consistent with genotype differences in mitochondria and chloroplasts.",
keywords = "Arabidopsis, Clustering, Metabolome",
author = "Janet Taylor and King, {Ross D.} and Thomas Altmann and Oliver Fiehn",
year = "2002",
language = "English (US)",
volume = "18",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "SUPPL. 2",

}

TY - JOUR
T1 - Application of metabolomics to plant genotype discrimination using statistics and machine learning
AU - Taylor, Janet
AU - King, Ross D.
AU - Altmann, Thomas
AU - Fiehn, Oliver
PY - 2002
Y1 - 2002
N2 - Motivation: Metabolomics is a post-genomic technology which seeks to provide a comprehensive profile of all the metabolites present in a biological sample. This complements the mRNA profiles provided by microarrays, and the protein profiles provided by proteomics. To test the power of metabolome analysis we selected the problem of discriminating between related genotypes of Arabidopsis. Specifically, the problem tackled was to discriminate between two background genotypes (Col0 and C24) and, more significantly, the offspring produced by the cross-breeding of these two lines, the progeny (whose genotypes would differ only in their maternally inherited mitochondria and chloroplasts). Overview: A gas chromatography-mass spectrometry (GCMS) profiling protocol was used to identify 433 metabolites in the samples. The metabolomic profiles were compared using descriptive statistics, which indicated that key primary metabolites vary more than other metabolites. We then applied neural networks to discriminate between the genotypes. This showed clearly that the two background lines can be discriminated from each other and from their progeny, and indicated that the two progeny lines can also be discriminated. We applied Euclidean hierarchical clustering and Principal Component Analysis (PCA) to help understand the basis of genotype discrimination. PCA indicated that malic acid and citrate are the two most important metabolites for discriminating between the background lines, and glucose and fructose are the two most important metabolites for discriminating between the crosses. These results are consistent with genotype differences in mitochondria and chloroplasts.
KW - Arabidopsis
KW - Clustering
KW - Metabolome
UR - http://www.scopus.com/inward/record.url?scp=0037731046&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037731046&partnerID=8YFLogxK
M3 - Article
C2 - 12386008
AN - SCOPUS:0037731046
VL - 18
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - SUPPL. 2
ER -