MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

James G. Jeffryes, Ricardo L. Colastani, Mona Elbadawi-Sidhu, Tobias Kind, Thomas D. Niehaus, Linda J. Broadbelt, Andrew D. Hanson, Oliver Fiehn, Keith E J Tyo, Christopher S. Henry

Research output: Contribution to journal › Article

83 Citations (Scopus)

Abstract

Background: In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases.

Description: Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted.

Conclusions: MINE databases are freely accessible for non-commercial use via user-friendly web tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
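The accurate-mass annotation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the MINE API or the authors' implementation: the names `Candidate`, `ppm_error`, and `match_peak`, the four-compound toy database, the [M+H]+ adduct, and the 5 ppm tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Mass of a proton in Da; added to a neutral mass to model an [M+H]+ ion.
PROTON_MASS = 1.007276


@dataclass(frozen=True)
class Candidate:
    """Hypothetical database entry: a name and a neutral monoisotopic mass (Da)."""
    name: str
    monoisotopic_mass: float


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peak(mz: float, candidates, tol_ppm: float = 5.0):
    """Return (candidate, ppm error) pairs whose [M+H]+ m/z lies within tol_ppm of mz."""
    hits = []
    for cand in candidates:
        theoretical_mz = cand.monoisotopic_mass + PROTON_MASS
        err = ppm_error(mz, theoretical_mz)
        if abs(err) <= tol_ppm:
            hits.append((cand, err))
    # Smallest absolute error first; ties keep database order (stable sort).
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Toy database (illustrative only; a real search would query KEGG, PubChem, or a MINE).
TOY_DB = [
    Candidate("D-glucose", 180.063388),   # C6H12O6
    Candidate("D-fructose", 180.063388),  # C6H12O6, isomer of glucose
    Candidate("citric acid", 192.027003), # C6H8O7
    Candidate("caffeine", 194.080376),    # C8H10N4O2
]

if __name__ == "__main__":
    for cand, err in match_peak(181.0707, TOY_DB):
        print(f"{cand.name}: {err:+.2f} ppm")
```

In this toy run both hexose isomers match the same peak, which shows why the number of candidates returned per spectrum matters: accurate mass alone cannot separate compounds that share a formula, so a smaller and more biologically plausible candidate list leaves less work for later filtering.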

Original language: English (US)
Article number: 44
Journal: Journal of Cheminformatics
Volume: 7
Issue number: 1
DOI: https://doi.org/10.1186/s13321-015-0087-1
State: Published - Aug 28 2015

Keywords

  • Enzyme promiscuity
  • Liquid chromatography-mass spectrometry
  • Metabolite identification
  • Untargeted metabolomics

ASJC Scopus subject areas

  • Physical and Theoretical Chemistry
  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Jeffryes, J. G., Colastani, R. L., Elbadawi-Sidhu, M., Kind, T., Niehaus, T. D., Broadbelt, L. J., ... Henry, C. S. (2015). MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics. Journal of Cheminformatics, 7(1), [44]. https://doi.org/10.1186/s13321-015-0087-1

@article{8e96c34e85c9401fac4bd41d3e844633,
title = "MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics",
keywords = "Enzyme promiscuity, Liquid chromatography-mass spectrometry, Metabolite identification, Untargeted metabolomics",
author = "Jeffryes, {James G.} and Colastani, {Ricardo L.} and Mona Elbadawi-Sidhu and Tobias Kind and Niehaus, {Thomas D.} and Broadbelt, {Linda J.} and Hanson, {Andrew D.} and Oliver Fiehn and Tyo, {Keith E J} and Henry, {Christopher S.}",
year = "2015",
month = "8",
day = "28",
doi = "10.1186/s13321-015-0087-1",
language = "English (US)",
volume = "7",
journal = "Journal of Cheminformatics",
issn = "1758-2946",
publisher = "Chemistry Central",
number = "1",

}

TY - JOUR

T1 - MINEs

T2 - Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

AU - Jeffryes, James G.

AU - Colastani, Ricardo L.

AU - Elbadawi-Sidhu, Mona

AU - Kind, Tobias

AU - Niehaus, Thomas D.

AU - Broadbelt, Linda J.

AU - Hanson, Andrew D.

AU - Fiehn, Oliver

AU - Tyo, Keith E J

AU - Henry, Christopher S.

PY - 2015/8/28

Y1 - 2015/8/28

KW - Enzyme promiscuity

KW - Liquid chromatography-mass spectrometry

KW - Metabolite identification

KW - Untargeted metabolomics

UR - http://www.scopus.com/inward/record.url?scp=84940036737&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940036737&partnerID=8YFLogxK

U2 - 10.1186/s13321-015-0087-1

DO - 10.1186/s13321-015-0087-1

M3 - Article

AN - SCOPUS:84940036737

VL - 7

JO - Journal of Cheminformatics

JF - Journal of Cheminformatics

SN - 1758-2946

IS - 1

M1 - 44

ER -