Bayesian semiparametric ROC curve estimation and disease diagnosis

Adam J. Branscum, Wesley O. Johnson, Timothy E. Hanson, Ian Gardner

Research output: Contribution to journal › Article

43 Citations (Scopus)

Abstract

We develop a novel semiparametric modeling framework involving mixtures of Polya trees for screening data, with the dual purpose of diagnosing infection or disease status and of assessing the accuracy of continuous diagnostic measures. Our first goal is to (i) obtain predictive probabilities of 'disease' based on continuous diagnostic test outcomes in conjunction with other information, including relevant covariates and results from one or more independent binary diagnostic tests. An example would be the modeling of a serum enzyme-linked immunosorbent assay (ELISA) procedure for detecting antibodies to an infectious agent when used in conjunction with culture for antigen detection. Our second goal is to (ii) characterize the diagnostic performance of continuous tests by estimating receiver-operating characteristic (ROC) curves and the area under the curve (AUC), primarily when such extra information is available. When true disease status is unknown, parametric and nonparametric analyses require sufficient separation between the distributions of outcome values for the diseased and nondiseased populations. However, overlap between these distributions becomes less problematic when an informative 'prior' based on real (preferably data-based) scientific input, additional diagnostic information, or both, are available. The additional information can be used to distinguish 'diseased' from 'nondiseased' individuals. We present an example using simulated data that illustrates this point. We also present an example involving data from an animal-health survey for Johne's disease, where the performance of a serum ELISA is evaluated using additional information obtained from fecal culture. Issues related to identifiability and partial identifiability are also discussed.
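The two quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' mixture-of-Polya-trees model: it assumes, purely for illustration, that disease status is known when computing the empirical ROC curve and AUC, that test scores are normally distributed within each population, and that the continuous test and the binary test are conditionally independent given disease status. All function names, parameter names, and numeric values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (an assumed, illustrative score model)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_disease(score, culture_positive, prevalence,
                 mean_d, sd_d, mean_n, sd_n, culture_se, culture_sp):
    """P(diseased | continuous score, binary culture result) by Bayes' rule.

    Assumes the two tests are conditionally independent given disease status.
    """
    like_d = normal_pdf(score, mean_d, sd_d)
    like_n = normal_pdf(score, mean_n, sd_n)
    if culture_positive:
        like_d *= culture_se          # true-positive rate of the binary test
        like_n *= 1.0 - culture_sp    # false-positive rate of the binary test
    else:
        like_d *= 1.0 - culture_se
        like_n *= culture_sp
    numerator = prevalence * like_d
    return numerator / (numerator + (1.0 - prevalence) * like_n)


def empirical_roc(diseased, nondiseased):
    """(FPR, TPR) points, calling a test positive when score >= threshold."""
    thresholds = sorted(set(diseased) | set(nondiseased), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        tpr = sum(d >= c for d in diseased) / len(diseased)
        fpr = sum(n >= c for n in nondiseased) / len(nondiseased)
        points.append((fpr, tpr))
    return points


def empirical_auc(diseased, nondiseased):
    """Mann-Whitney estimate of P(diseased score > nondiseased score).

    Ties between a diseased and a nondiseased score count as one half.
    """
    wins = 0.0
    for d in diseased:
        for n in nondiseased:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))


if __name__ == "__main__":
    # Hypothetical scores; not data from the paper.
    diseased = [0.9, 1.4, 2.1, 2.8]
    nondiseased = [-0.5, 0.2, 0.7, 1.1]
    print(empirical_roc(diseased, nondiseased))
    print(empirical_auc(diseased, nondiseased))
    print(prob_disease(1.0, True, 0.2, 2.0, 1.0, 0.0, 1.0, 0.5, 0.99))
```

In this sketch a positive result on a highly specific binary test raises the predictive probability of disease even for a score that falls where the two score distributions overlap, which is the sense in which additional information can help separate 'diseased' from 'nondiseased' individuals.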

Original language: English (US)
Pages (from-to): 2474-2496
Number of pages: 23
Journal: Statistics in Medicine
Volume: 27
Issue number: 13
DOI: 10.1002/sim.3250
State: Published - Jun 15 2008

Fingerprint

Curve Estimation
Receiver Operating Characteristic Curve
ROC Curve
Enzyme-linked Immunosorbent Assay
Diagnostic Tests
Routine Diagnostic Tests
Identifiability
Diagnostics
Paratuberculosis
Polya Trees
Health Surveys
Serum
Area Under Curve
Modeling
Antibody
Screening
Infection
Covariates
Overlap

Keywords

  • Evaluation of screening tests
  • Mixtures of Polya trees
  • No gold-standard test data
  • Nonparametric approach
  • Sensitivity
  • Specificity

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Bayesian semiparametric ROC curve estimation and disease diagnosis. / Branscum, Adam J.; Johnson, Wesley O.; Hanson, Timothy E.; Gardner, Ian.

In: Statistics in Medicine, Vol. 27, No. 13, 15.06.2008, p. 2474-2496.

Branscum, Adam J. ; Johnson, Wesley O. ; Hanson, Timothy E. ; Gardner, Ian. / Bayesian semiparametric ROC curve estimation and disease diagnosis. In: Statistics in Medicine. 2008 ; Vol. 27, No. 13. pp. 2474-2496.
@article{7183d81f4b804c71830a9fff9e9c651f,
title = "Bayesian semiparametric ROC curve estimation and disease diagnosis",
keywords = "Evaluation of screening tests, Mixtures of Polya trees, No gold-standard test data, Nonparametric approach, Sensitivity, Specificity",
author = "Branscum, {Adam J.} and Johnson, {Wesley O.} and Hanson, {Timothy E.} and Gardner, Ian",
year = "2008",
month = "6",
day = "15",
doi = "10.1002/sim.3250",
language = "English (US)",
volume = "27",
pages = "2474--2496",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "13",
}

TY - JOUR

T1 - Bayesian semiparametric ROC curve estimation and disease diagnosis

AU - Branscum, Adam J.

AU - Johnson, Wesley O.

AU - Hanson, Timothy E.

AU - Gardner, Ian

PY - 2008/6/15

Y1 - 2008/6/15

KW - Evaluation of screening tests

KW - Mixtures of Polya trees

KW - No gold-standard test data

KW - Nonparametric approach

KW - Sensitivity

KW - Specificity

UR - http://www.scopus.com/inward/record.url?scp=44949222621&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44949222621&partnerID=8YFLogxK

U2 - 10.1002/sim.3250

DO - 10.1002/sim.3250

M3 - Article

C2 - 18300333

AN - SCOPUS:44949222621

VL - 27

SP - 2474

EP - 2496

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 13

ER -