Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens

Sandra L. Taylor, L. Renee Ruhaak, Robert H Weiss, Karen Kelly, Kyoungmi Kim

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Motivation: High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. Results: We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens, but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen.
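The general idea of a two-part statistic combined across biospecimens and tested by permutation can be sketched as follows. This is a minimal illustration under stated assumptions, not the statistics proposed in the paper: the abstract does not define them, so the sketch assumes a classical two-part form (a squared z-score for the proportion of observed values plus a squared Welch t-statistic on the observed values), sums it over biospecimens, and permutes subject-level group labels. The function names `two_part_stat` and `multi_biospecimen_test` and all parameters are hypothetical.

```python
import numpy as np


def two_part_stat(values, groups):
    """Assumed two-part statistic for one compound in one biospecimen.

    values: 1-D float array; np.nan marks a missing value.
    groups: 1-D array of 0/1 group labels (both groups non-empty).
    Returns B^2 + T^2, where B compares the proportions of observed
    values and T is a Welch t-statistic on the observed values only.
    """
    observed = ~np.isnan(values)
    g0, g1 = groups == 0, groups == 1
    n0, n1 = g0.sum(), g1.sum()

    # Part 1: two-sample z-test on the proportion of observed values.
    p0, p1 = observed[g0].mean(), observed[g1].mean()
    p = observed.mean()
    var_b = p * (1 - p) * (1 / n0 + 1 / n1)
    b2 = (p0 - p1) ** 2 / var_b if var_b > 0 else 0.0

    # Part 2: Welch t-test restricted to the observed values.
    x0, x1 = values[g0 & observed], values[g1 & observed]
    t2 = 0.0
    if len(x0) > 1 and len(x1) > 1:
        var_t = x0.var(ddof=1) / len(x0) + x1.var(ddof=1) / len(x1)
        if var_t > 0:
            t2 = (x0.mean() - x1.mean()) ** 2 / var_t
    return b2 + t2


def multi_biospecimen_test(data, groups, n_perm=999, seed=0):
    """Permutation test for one compound measured in several biospecimens.

    data: (n_subjects, n_biospecimens) array; np.nan marks missing values.
    groups: 1-D array of 0/1 labels, one per subject.
    Returns (observed statistic, permutation p-value).
    """
    rng = np.random.default_rng(seed)

    def stat(labels):
        # Assumption: combine biospecimens by summing their statistics.
        return sum(two_part_stat(data[:, j], labels)
                   for j in range(data.shape[1]))

    observed = stat(groups)
    # Labels are permuted per subject, so every biospecimen of a subject
    # moves together and between-biospecimen correlation is preserved.
    perm = np.array([stat(rng.permutation(groups)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, p_value
```

Because missing values enter through the proportion component rather than being filled in, no imputation step is needed in this sketch, and the permutation null avoids assuming a distribution for the summed, correlated statistics.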

Original language: English (US)
Pages (from-to): 17-25
Number of pages: 9
Journal: Bioinformatics
Volume: 33
Issue number: 1
DOI: 10.1093/bioinformatics/btw578
State: Published - 2017

Fingerprint

Mass Spectrometry
Mass spectrometry
Statistics
Univariate
Missing Values
Statistical methods
Multivariate Analysis
Correlation Analysis
Imputation
Null Distribution
Statistical Significance
Testing
Statistical method
High Throughput
Permutation
Simulation Study

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens. / Taylor, Sandra L.; Ruhaak, L. Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi.

In: Bioinformatics, Vol. 33, No. 1, 2017, p. 17-25.

@article{ef58c3da0de54be48ff5cec956f50a7f,
title = "Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens",
abstract = "Motivation: High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. Results: We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens, but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen.",
author = "Taylor, {Sandra L.} and Ruhaak, {L. Renee} and Weiss, {Robert H} and Karen Kelly and Kyoungmi Kim",
year = "2017",
doi = "10.1093/bioinformatics/btw578",
language = "English (US)",
volume = "33",
pages = "17--25",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "1",

}

TY - JOUR

T1 - Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens

AU - Taylor, Sandra L.

AU - Ruhaak, L. Renee

AU - Weiss, Robert H

AU - Kelly, Karen

AU - Kim, Kyoungmi

PY - 2017

Y1 - 2017

N2 - Motivation: High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. Results: We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens, but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen.

UR - http://www.scopus.com/inward/record.url?scp=85014807345&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014807345&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw578

DO - 10.1093/bioinformatics/btw578

M3 - Article

VL - 33

SP - 17

EP - 25

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 1

ER -