Using MS-FINDER for identifying 19 natural products in the CASMI 2016 contest

Arpana Vaniya, Stephanie N. Samra, Mine Palazoglu, Hiroshi Tsugawa, Oliver Fiehn

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

In its fourth year, the CASMI 2016 contest was organized to evaluate current chemical structure identification strategies for 19 natural products from high-resolution LC-MS and LC-MS/MS challenge datasets, using automated methods with or without other tools. These natural products originate from plants, fungi, marine sponges, algae, or micro-algae. Every compound annotation workflow must start with the determination of elemental compositions. Of these 19 challenges, one was excluded by the organizers after submission. For the remaining 18 challenges, three software programs were used. MS-FINDER version 1.62 correctly identified 89% of the molecular formulas using an internal database comprising 13 metabolomics repositories with 45,181 formulas. SIRIUS correctly identified 61% of the compositions using PubChem formulas, and Seven Golden Rules correctly identified 83% using the Dictionary of Natural Products as a targeted database. Next, we performed structural dereplication, for which we used the consensus formula from the three software programs. We submitted two solution sets for these challenges. In the first solution set, avaniya001, we used only the internal MS-FINDER functions for predicting and ranking structures, correctly identifying 53% of the structures as the top hit, 72% within the top-3 structures, and 78% within the top-10 hits. For our second set, avaniya002, we used MS-FINDER predictions as well as MS/MS queries against the commercial NIST 14, METLIN, and the public MassBank of North America libraries. Here we correctly identified 78% of the structures as the top hit and 83% within the top-3 hits. Three challenge spectra were not identified within the top-10 hits in either of our submissions.
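The top-hit, top-3, and top-10 percentages quoted above are top-k identification rates. The following is a minimal sketch of how such a rate could be computed from per-challenge ranks. The function name `top_k_rate` and the example ranks are hypothetical and are not the contest results or the authors' scoring code.

```python
from typing import Optional, Sequence


def top_k_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of challenges whose correct structure is ranked within the top k.

    `ranks` holds the 1-based rank of the correct structure for each challenge,
    or None when the correct structure is absent from the candidate list.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for five challenges (illustration only).
example_ranks = [1, 1, 3, 7, None]

for k in (1, 3, 10):
    print(f"top-{k}: {top_k_rate(example_ranks, k):.0%}")
```

Because a structure ranked first also counts toward the top-3 and top-10 rates, the rates can only stay equal or increase as k grows, which matches the ordering of the percentages reported for each submission.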

Original language: English (US)
Journal: Phytochemistry Letters
DOI: 10.1016/j.phytol.2016.12.008
State: Accepted/In press - Sep 29 2016

Keywords

  • CASMI
  • Compound identification
  • Mass spectrometry
  • MS-FINDER
  • Natural products
  • Tandem mass spectrometry

ASJC Scopus subject areas

  • Biotechnology
  • Biochemistry
  • Agronomy and Crop Science
  • Plant Science

Cite this

Using MS-FINDER for identifying 19 natural products in the CASMI 2016 contest. / Vaniya, Arpana; Samra, Stephanie N.; Palazoglu, Mine; Tsugawa, Hiroshi; Fiehn, Oliver.

In: Phytochemistry Letters, 29.09.2016.

@article{50a7643636d2459898d312a244181705,
title = "Using MS-FINDER for identifying 19 natural products in the CASMI 2016 contest",
keywords = "CASMI, Compound identification, Mass spectrometry, MS-FINDER, Natural products, Tandem mass spectrometry",
author = "Arpana Vaniya and Samra, {Stephanie N.} and Mine Palazoglu and Hiroshi Tsugawa and Oliver Fiehn",
year = "2016",
month = "9",
day = "29",
doi = "10.1016/j.phytol.2016.12.008",
language = "English (US)",
journal = "Phytochemistry Letters",
issn = "1874-3900",
publisher = "Elsevier BV",

}

TY - JOUR

T1 - Using MS-FINDER for identifying 19 natural products in the CASMI 2016 contest

AU - Vaniya, Arpana

AU - Samra, Stephanie N.

AU - Palazoglu, Mine

AU - Tsugawa, Hiroshi

AU - Fiehn, Oliver

PY - 2016/9/29

Y1 - 2016/9/29

KW - CASMI

KW - Compound identification

KW - Mass spectrometry

KW - MS-FINDER

KW - Natural products

KW - Tandem mass spectrometry

UR - http://www.scopus.com/inward/record.url?scp=85008196675&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85008196675&partnerID=8YFLogxK

U2 - 10.1016/j.phytol.2016.12.008

DO - 10.1016/j.phytol.2016.12.008

M3 - Article

AN - SCOPUS:85008196675

JO - Phytochemistry Letters

JF - Phytochemistry Letters

SN - 1874-3900

ER -