Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation

Ross L. Prentice, Lihong Qi

Research output: Contribution to journal, Article

27 Citations (Scopus)

Abstract

The state of readiness for high-dimensional single nucleotide polymorphism (SNP) epidemiologic association studies is described, as background for a discussion of statistical aspects of case-control study design and analysis. Specifically, the important role that multistage designs can play in the elimination of false-positive associations and in the control of study costs will be noted. Also, the trade-offs associated with using pooled DNA at early design stages for additional important cost reductions will be discussed in some detail. An odds ratio approach to relating SNP alleles to disease risk using pooled DNA will be proposed, in conjunction with a simple empirical variance estimator, based on comparisons among log-odds ratio estimators from distinct pairs of case and control pools. Simulation studies will be presented to evaluate the moderate sample size properties of such multistage designs and estimation procedures. The design of an ongoing three-stage study in the Women's Health Initiative to relate 250 000 SNPs to the risk of coronary heart disease, stroke, and breast cancer will provide illustration, and will be used to motivate the choice of simulation configurations.
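The pooled-DNA estimator described in the abstract can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' published procedure. It assumes K case pools paired with K control pools, forms a log-odds ratio for each pair from the estimated allele frequencies, averages them, and uses the sample variance among the pair-specific values to estimate the variance of that average. The function name `pooled_log_odds_ratio` and the example frequencies are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_log_odds_ratio(case_freqs, control_freqs):
    """Average log-odds ratio over paired pools, with an empirical variance.

    case_freqs[k] and control_freqs[k] are the estimated allele frequencies
    (strictly between 0 and 1) in the k-th case pool and the k-th control pool.
    The pairing and the variance formula are assumptions made for illustration.
    """
    if len(case_freqs) != len(control_freqs):
        raise ValueError("need the same number of case and control pools")
    if len(case_freqs) < 2:
        raise ValueError("need at least two pool pairs to estimate a variance")

    # One log-odds ratio per (case pool, control pool) pair.
    log_ors = [
        math.log(p / (1.0 - p)) - math.log(q / (1.0 - q))
        for p, q in zip(case_freqs, control_freqs)
    ]

    estimate = mean(log_ors)
    # Sample variance among pairs, divided by the number of pairs,
    # estimates the variance of the averaged log-odds ratio.
    var_of_estimate = variance(log_ors) / len(log_ors)
    return estimate, var_of_estimate


if __name__ == "__main__":
    # Hypothetical allele frequencies from three case and three control pools.
    est, var = pooled_log_odds_ratio([0.30, 0.33, 0.28], [0.25, 0.26, 0.24])
    print(f"log-odds ratio: {est:.3f}, standard error: {math.sqrt(var):.3f}")
```

Because the variance is taken from the spread among pool pairs rather than from a binomial model, it also absorbs pool-to-pool measurement error, which is one reason an empirical estimator is attractive for pooled data.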

Original language: English (US)
Pages (from-to): 339-354
Number of pages: 16
Journal: Biostatistics
Volume: 7
Issue number: 3
DOI: 10.1093/biostatistics/kxj020
State: Published - Jul 2006
Externally published: Yes

Fingerprint

Single Nucleotide Polymorphism
High-dimensional
Odds Ratio
Cost Control
DNA
Empirical Estimator
Women's Health
Coronary Heart Disease
Ratio Estimator
Sample Size
Coronary Disease
Case-control Study
Case-Control Studies
Variance Estimator
Epidemiologic Studies
Costs
Stroke
Alleles
Breast Cancer

Keywords

  • Case-control
  • Cohort
  • Genetic association
  • High-dimensional data
  • Multistage design
  • Odds ratio
  • Pooled DNA
  • Single nucleotide polymorphism

ASJC Scopus subject areas

  • Medicine(all)
  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation. / Prentice, Ross L.; Qi, Lihong.

In: Biostatistics, Vol. 7, No. 3, 07.2006, p. 339-354.

@article{c82e695f079f4a29a143a70639be4165,
title = "Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation",
abstract = "The state of readiness for high-dimensional single nucleotide polymorphism (SNP) epidemiologic association studies is described, as background for a discussion of statistical aspects of case-control study design and analysis. Specifically, the important role that multistage designs can play in the elimination of false-positive associations and in the control of study costs will be noted. Also, the trade-offs associated with using pooled DNA at early design stages for additional important cost reductions will be discussed in some detail. An odds ratio approach to relating SNP alleles to disease risk using pooled DNA will be proposed, in conjunction with a simple empirical variance estimator, based on comparisons among log-odds ratio estimators from distinct pairs of case and control pools. Simulation studies will be presented to evaluate the moderate sample size properties of such multistage designs and estimation procedures. The design of an ongoing three-stage study in the Women's Health Initiative to relate 250 000 SNPs to the risk of coronary heart disease, stroke, and breast cancer will provide illustration, and will be used to motivate the choice of simulation configurations.",
keywords = "Case-control, Cohort, Genetic association, High-dimensional data, Multistage design, Odds ratio, Pooled DNA, Single nucleotide polymorphism",
author = "Prentice, Ross L. and Qi, Lihong",
year = "2006",
month = "7",
doi = "10.1093/biostatistics/kxj020",
language = "English (US)",
volume = "7",
pages = "339--354",
journal = "Biostatistics",
issn = "1465-4644",
publisher = "Oxford University Press",
number = "3"
}

TY - JOUR

T1 - Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation

AU - Prentice, Ross L.

AU - Qi, Lihong

PY - 2006/7

Y1 - 2006/7

N2 - The state of readiness for high-dimensional single nucleotide polymorphism (SNP) epidemiologic association studies is described, as background for a discussion of statistical aspects of case-control study design and analysis. Specifically, the important role that multistage designs can play in the elimination of false-positive associations and in the control of study costs will be noted. Also, the trade-offs associated with using pooled DNA at early design stages for additional important cost reductions will be discussed in some detail. An odds ratio approach to relating SNP alleles to disease risk using pooled DNA will be proposed, in conjunction with a simple empirical variance estimator, based on comparisons among log-odds ratio estimators from distinct pairs of case and control pools. Simulation studies will be presented to evaluate the moderate sample size properties of such multistage designs and estimation procedures. The design of an ongoing three-stage study in the Women's Health Initiative to relate 250 000 SNPs to the risk of coronary heart disease, stroke, and breast cancer will provide illustration, and will be used to motivate the choice of simulation configurations.

KW - Case-control

KW - Cohort

KW - Genetic association

KW - High-dimensional data

KW - Multistage design

KW - Odds ratio

KW - Pooled DNA

KW - Single nucleotide polymorphism

UR - http://www.scopus.com/inward/record.url?scp=33745453811&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745453811&partnerID=8YFLogxK

U2 - 10.1093/biostatistics/kxj020

DO - 10.1093/biostatistics/kxj020

M3 - Article

C2 - 16443924

AN - SCOPUS:33745453811

VL - 7

SP - 339

EP - 354

JO - Biostatistics

JF - Biostatistics

SN - 1465-4644

IS - 3

ER -