Stable classification with applications to microarray data

Chin-Shang Li, Cheng Cheng

Research output: Contribution to journal > Article > peer-review

1 Scopus citation

Abstract

A stable classification method called minimum-error-distance threshold (MEDT) with variable selection is developed for the two-class prediction (classification) problem. First, a set of "significant" variables (genes) associated with the two classes is selected using the Wilcoxon rank-sum test. Then a data-driven cutoff point for a distance-based classification algorithm is determined by minimizing a combination of the false-positive and false-negative rates estimated by leave-one-out cross-validation. This cutoff point is used to classify a given test set based on the selected variables. The proposed methodology is applied to the leukemia data set analyzed in Golub et al. (Science 286 (1999) 531). To compare the proposed methodology with the existing discrimination methods, diagonal-linear-discriminant analysis and nearest-neighbor classifiers, 1000 cross-validations are performed. The data set is randomly split into a training set consisting of 32 patients with acute lymphoblastic leukemia (ALL) and 16 with acute myeloid leukemia (AML) and a test set consisting of 15 patients with ALL and 9 with AML. Performance summaries are calculated. A simulation study is conducted to demonstrate the superior stability of MEDT compared with that of the aforementioned existing methods. The stability measure used is the ratio of the mean to the standard deviation of the number of correct predictions.
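The three ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the distance, the exact combination of error rates, or the form of the standard deviation, so everything below is an assumption made for illustration. The function names `rank_sum_z`, `choose_cutoff`, and `stability`, the `weight` parameter, the use of a normal approximation without tie correction, and the use of the sample standard deviation are all hypothetical choices. The per-sample scores passed to `choose_cutoff` are assumed to be distance-based scores obtained under leave-one-out cross-validation.

```python
from math import sqrt
from statistics import mean, stdev


def rank_sum_z(x, y):
    """Wilcoxon rank-sum z statistic for group x versus group y.

    Assumption: normal approximation, midranks for ties, and no tie
    correction applied to the variance.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma


def choose_cutoff(scores, labels, weight=0.5):
    """Return (cutoff, error) minimizing weight*FPR + (1 - weight)*FNR.

    Assumption: a sample is predicted positive (label 1) when its score
    exceeds the cutoff; candidates are midpoints between distinct scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_cut, best_err = None, float("inf")
    for cut in candidates:
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s > cut)
        fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s <= cut)
        err = weight * fp / n_neg + (1 - weight) * fn / n_pos
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut, best_err


def stability(correct_counts):
    """Mean-to-standard-deviation ratio of the number of correct predictions.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(correct_counts) / stdev(correct_counts)
```

In use, one might rank genes by the absolute value of `rank_sum_z` on the training set, compute a distance-based score per training sample from the selected genes, pass those scores to `choose_cutoff`, and finally summarize the number of correct test-set predictions over repeated random splits with `stability`. A larger ratio indicates predictions that are both accurate on average and less variable across splits.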

Original language: English (US)
Pages (from-to): 599-609
Number of pages: 11
Journal: Computational Statistics and Data Analysis
Volume: 47
Issue number: 3
State: Published - Oct 1 2004
Externally published: Yes

Keywords

  • Diagonal-linear-discriminant analysis
  • Microarray
  • Minimum-error-distance threshold
  • Nearest-neighbor classifiers
  • Stable classification
  • Variable selection

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Statistics, Probability and Uncertainty
  • Electrical and Electronic Engineering
  • Computational Mathematics
  • Numerical Analysis
  • Statistics and Probability
