An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

Bin Peng, Dianwen Zhu, Bradley Ander, Xiaoshuai Zhang, Fuzhong Xue, Frank R Sharp, Xiaowei Yang

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge arises from the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods based on univariate analyses have difficulty incorporating correlational, structural, or functional relationships among the molecular measures. For microarray gene expression data, we first summarize solutions for 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data for predicting stroke status are used to validate the performance of iBVS in a probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiency are also discussed.
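
As a rough illustration of what a "posterior selection probability" means in practice, the sketch below averages sampled 0/1 inclusion indicators per gene over retained MCMC draws. It is a generic summary step for Bayesian variable selection, not the iBVS sampler or the PLS g-prior described in the abstract. The function names, the toy draws, the burn-in length, and the 0.5 cutoff are all hypothetical choices made for this example; the paper's own guidelines on Bayesian significance levels are not reproduced here.

```python
from typing import List, Sequence


def posterior_selection_probabilities(
    indicator_draws: Sequence[Sequence[int]], burn_in: int = 0
) -> List[float]:
    """Estimate per-gene selection probabilities from sampled indicators.

    Each row of `indicator_draws` is one MCMC draw; entry j is 1 if gene j
    was included in the model at that iteration and 0 otherwise.
    """
    kept = indicator_draws[burn_in:]  # drop the (assumed) burn-in draws
    if not kept:
        raise ValueError("no draws left after discarding burn-in")
    n_genes = len(kept[0])
    totals = [0] * n_genes
    for draw in kept:
        if len(draw) != n_genes:
            raise ValueError("all draws must cover the same genes")
        for j, included in enumerate(draw):
            totals[j] += included
    # The fraction of retained draws that include gene j estimates its
    # posterior selection probability.
    return [total / len(kept) for total in totals]


def select_genes(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of genes whose estimated probability meets the cutoff.

    The 0.5 default is an illustrative assumption, not the paper's rule.
    """
    return [j for j, p in enumerate(probabilities) if p >= threshold]


# Hypothetical toy draws for three genes (not data from the study).
draws = [
    [1, 0, 0],  # discarded as burn-in
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
probs = posterior_selection_probabilities(draws, burn_in=1)
selected = select_genes(probs, threshold=0.5)
print(probs, selected)
```

In this toy case the first gene is included in every retained draw, so it is the only one that passes the cutoff; in a real analysis the draws would come from a sampler over the model space, and the cutoff would be chosen deliberately.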

Original language: English (US)
Article number: e67672
Journal: PLoS One
Volume: 8
Issue number: 7
DOI: 10.1371/journal.pone.0067672
State: Published - Jul 3 2013

Fingerprint

Genes
genetic markers
genes
Microarrays
gene interaction
selection methods
Precision Medicine
stroke
chemical structure
Systems Biology
least squares
biomarkers
Regulator Genes
medicine
Molecular Structure
Least-Squares Analysis
statistics
Biomarkers
Computational efficiency
phenotype

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways. / Peng, Bin; Zhu, Dianwen; Ander, Bradley; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei.

In: PLoS One, Vol. 8, No. 7, e67672, 03.07.2013.

Peng, Bin ; Zhu, Dianwen ; Ander, Bradley ; Zhang, Xiaoshuai ; Xue, Fuzhong ; Sharp, Frank R ; Yang, Xiaowei. / An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways. In: PLoS One. 2013 ; Vol. 8, No. 7.
@article{69f3294410ee4b779d78441ee98bc2b1,
title = "An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways",
author = "Bin Peng and Dianwen Zhu and Bradley Ander and Xiaoshuai Zhang and Fuzhong Xue and Sharp, {Frank R} and Xiaowei Yang",
year = "2013",
month = "7",
day = "3",
doi = "10.1371/journal.pone.0067672",
language = "English (US)",
volume = "8",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "7",

}

TY - JOUR

T1 - An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

AU - Peng, Bin

AU - Zhu, Dianwen

AU - Ander, Bradley

AU - Zhang, Xiaoshuai

AU - Xue, Fuzhong

AU - Sharp, Frank R

AU - Yang, Xiaowei

PY - 2013/7/3

Y1 - 2013/7/3

UR - http://www.scopus.com/inward/record.url?scp=84879771268&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879771268&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0067672

DO - 10.1371/journal.pone.0067672

M3 - Article

C2 - 23844055

AN - SCOPUS:84879771268

VL - 8

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 7

M1 - e67672

ER -