Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies

Sandra L. Taylor, Gary S Leiserowitz, Kyoungmi Kim

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Data generated by mass spectrometry commonly contain many missing values, which arise when a compound is absent from a sample or is present at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes that all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound (a point mass). We compare the power and estimation performance of a mixture model with those of an AFT model. In simulations, the AFT model had greater power to detect between-group differences in means and in point-mass proportions. However, the AFT model yielded biased estimates, and the bias increased as the proportion of observations in the point mass increased. In contrast, the mixture model yielded unbiased estimates except when all missing observations arose from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrate this approach by applying it to glycomics data from serum samples of women with ovarian cancer and matched controls.
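The two ways of treating a missing value can be contrasted with a minimal sketch, assuming normally distributed log intensities and a known detection limit d. With a normal error term (a log-normal AFT), the AFT model for left-censored data reduces to a censored-normal likelihood on the log scale, where each missing value contributes Phi((d - mu)/sigma). The point-mass mixture instead lets a missing value come either from absence (probability p) or from censoring, contributing p + (1 - p) Phi((d - mu)/sigma), while each detected value contributes (1 - p) times the normal density. All function names, the simple pattern-search optimizer, the closed-form profiling of p, and the deterministic quantile-based data are hypothetical illustrative choices, not the authors' code or simulation design.

```python
import math
from statistics import NormalDist

# Hypothetical sketch, not the authors' implementation. Assumptions: log-scale
# intensities are normal, the detection limit d is known and shared by all
# samples, and None marks an undetected (missing) value.

def norm_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def norm_cdf(z):
    # erfc keeps the lower tail accurate where erf would lose precision.
    return 0.5 * math.erfc(-z / math.sqrt(2))

def safe_log(x):
    # Guard against log(0) when a trial parameter pushes a probability to 0.
    return math.log(max(x, 1e-300))

def aft_loglik(data, d, mu, sigma):
    """Censored-normal (AFT-style) model: every missing value is below d."""
    c = norm_cdf((d - mu) / sigma)
    return sum(safe_log(c) if y is None else norm_logpdf(y, mu, sigma)
               for y in data)

def mixture_loglik(data, d, mu, sigma, p):
    """Point-mass mixture: a missing value is absent (prob. p) or censored."""
    c = norm_cdf((d - mu) / sigma)
    total = 0.0
    for y in data:
        if y is None:
            total += safe_log(p + (1 - p) * c)
        else:
            total += safe_log(1 - p) + norm_logpdf(y, mu, sigma)
    return total

def profile_p(data, d, mu, sigma):
    """Closed-form MLE of p for fixed (mu, sigma), clipped at p = 0."""
    detected = sum(y is not None for y in data) / len(data)
    c = norm_cdf((d - mu) / sigma)
    present = min(1.0, detected / max(1 - c, 1e-300))
    return 1 - present

def maximize(f, x0, step=1.0, tol=1e-4):
    """Simple coordinate pattern search (stand-in for a real optimizer)."""
    x, best = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                value = f(trial)
                if value > best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2
    return x

def fit(data, d):
    """Fit both models on (mu, log sigma); p is profiled out of the mixture."""
    observed = [y for y in data if y is not None]
    start = [sum(observed) / len(observed), 0.0]

    def aft(t):
        return aft_loglik(data, d, t[0], math.exp(t[1]))

    def mixture(t):
        mu, sigma = t[0], math.exp(t[1])
        return mixture_loglik(data, d, mu, sigma, profile_p(data, d, mu, sigma))

    mu_a, log_sa = maximize(aft, start)
    mu_m, log_sm = maximize(mixture, start)
    p_m = profile_p(data, d, mu_m, math.exp(log_sm))
    return {"aft": (mu_a, math.exp(log_sa)),
            "mixture": (mu_m, math.exp(log_sm), p_m)}

def quantile_sample(n, p_absent, mu, sigma, d):
    """Deterministic synthetic data: absent compounds plus normal quantiles."""
    n_absent = round(n * p_absent)
    n_present = n - n_absent
    dist = NormalDist(mu, sigma)
    present = [dist.inv_cdf((i + 0.5) / n_present) for i in range(n_present)]
    return [None] * n_absent + [y if y >= d else None for y in present]

data = quantile_sample(1000, p_absent=0.3, mu=10.0, sigma=1.0, d=9.0)
print(fit(data, d=9.0))
```

On this synthetic sample (30% of compounds absent, detection limit one standard deviation below the mean), the mixture fit should land close to the generating values, while the AFT fit pulls the mean downward because it treats absent compounds as low concentrations, which is the direction of bias the abstract describes. Only single-group estimation is shown; the between-group hypothesis tests are not reproduced here.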

Original language: English (US)
Pages (from-to): 703-722
Number of pages: 20
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 12
Issue number: 6
DOIs: 10.1515/sagmb-2013-0021
State: Published - Dec 2013

Keywords

  • Accelerated failure time model
  • Glycomics
  • Mass spectrometry
  • Metabolomics
  • Missing values
  • Point-mass mixture

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Statistics and Probability
  • Computational Mathematics

Cite this

@article{89c26dba5d984bbf92d9edb8aeece38b,
title = "Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies",
keywords = "Accelerated failure time model, Glycomics, Mass spectrometry, Metabolomics, Missing values, Point-mass mixture",
author = "Taylor, {Sandra L.} and Leiserowitz, {Gary S} and Kim, Kyoungmi",
year = "2013",
month = "12",
doi = "10.1515/sagmb-2013-0021",
language = "English (US)",
volume = "12",
pages = "703--722",
journal = "Statistical Applications in Genetics and Molecular Biology",
issn = "1544-6115",
publisher = "Berkeley Electronic Press",
number = "6",

}

TY - JOUR

T1 - Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies

AU - Taylor, Sandra L.

AU - Leiserowitz, Gary S

AU - Kim, Kyoungmi

PY - 2013/12

Y1 - 2013/12

KW - Accelerated failure time model

KW - Glycomics

KW - Mass spectrometry

KW - Metabolomics

KW - Missing values

KW - Point-mass mixture

UR - http://www.scopus.com/inward/record.url?scp=84888141577&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84888141577&partnerID=8YFLogxK

U2 - 10.1515/sagmb-2013-0021

DO - 10.1515/sagmb-2013-0021

M3 - Article

C2 - 24246290

AN - SCOPUS:84888141577

VL - 12

SP - 703

EP - 722

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

SN - 1544-6115

IS - 6

ER -