A jackknife and voting classifier approach to feature selection and classification

Sandra L. Taylor, Kyoungmi Kim

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule to predict group membership of unknown samples based on a small set of features and that could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically-applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer 'omics datasets with different characteristics. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.
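The two-step idea in the abstract (jackknife feature selection, then a voting classifier) can be illustrated with a minimal Python sketch. The abstract does not state the ranking statistic, the resampling scheme, or how votes are weighted, so everything specific here is an assumption made for illustration: leave-one-out resampling, ranking by absolute Welch t-statistic, the hypothetical parameters `top_k` and `min_freq`, and unweighted nearest-class-mean votes. It should not be read as the authors' implementation.

```python
import numpy as np

def jackknife_select(X, y, top_k=5, min_freq=1.0):
    """Keep features that rank in the top_k by |Welch t| in at least
    min_freq of the leave-one-out resamples (all choices are assumptions)."""
    n, p = X.shape
    counts = np.zeros(p)
    for i in range(n):
        keep = np.arange(n) != i              # drop sample i
        Xi, yi = X[keep], y[keep]
        a, b = Xi[yi == 0], Xi[yi == 1]
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                     + b.var(axis=0, ddof=1) / len(b))
        t = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
        counts[np.argsort(t)[-top_k:]] += 1   # top_k features this round
    return np.flatnonzero(counts / n >= min_freq)

def vote_predict(X_train, y_train, X_new, features):
    """Unweighted voting: each selected feature votes for the class whose
    training mean is closer; the majority wins (ties go to class 0)."""
    m0 = X_train[y_train == 0][:, features].mean(axis=0)
    m1 = X_train[y_train == 1][:, features].mean(axis=0)
    Z = X_new[:, features]
    votes_for_1 = np.abs(Z - m1) < np.abs(Z - m0)
    return (votes_for_1.mean(axis=1) > 0.5).astype(int)

# Synthetic data (not from the paper): 200 features, the first 3 informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :3] += 4.0
y_test = np.repeat([0, 1], 10)
X_test = rng.normal(size=(20, 200))
X_test[y_test == 1, :3] += 4.0

selected = jackknife_select(X, y, top_k=3, min_freq=1.0)
pred = vote_predict(X, y, X_test, selected)
print("selected features:", selected)
print("test accuracy:", (pred == y_test).mean())
```

Requiring a feature to reach the top ranks in every resample (`min_freq=1.0`) is one simple way to favor stable feature sets; a lower threshold would trade stability for a larger set.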

Original language: English (US)
Pages (from-to): 133-147
Number of pages: 15
Journal: Cancer Informatics
Volume: 10
DOI: 10.4137/CIN.S7111
State: Published - 2011

Fingerprint

Politics
Research Personnel
Benchmarking
Routine Diagnostic Tests
Biomarkers

Keywords

  • Classification
  • Feature selection
  • Gene expression
  • Jackknife
  • Voting classifier

ASJC Scopus subject areas

  • Cancer Research
  • Oncology

Cite this

A jackknife and voting classifier approach to feature selection and classification. / Taylor, Sandra L.; Kim, Kyoungmi.

In: Cancer Informatics, Vol. 10, 2011, p. 133-147.


@article{a9f0cd9d757d4ae7ad4ee0eb7261e86d,
title = "A jackknife and voting classifier approach to feature selection and classification",
abstract = "With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule to predict group membership of unknown samples based on a small set of features and that could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically-applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer 'omics datasets with different characteristics. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.",
keywords = "Classification, Feature selection, Gene expression, Jackknife, Voting classifier",
author = "Taylor, {Sandra L.} and Kyoungmi Kim",
year = "2011",
doi = "10.4137/CIN.S7111",
language = "English (US)",
volume = "10",
pages = "133--147",
journal = "Cancer Informatics",
issn = "1176-9351",
publisher = "Libertas Academica Ltd.",

}

TY - JOUR

T1 - A jackknife and voting classifier approach to feature selection and classification

AU - Taylor, Sandra L.

AU - Kim, Kyoungmi

PY - 2011

Y1 - 2011

N2 - With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule to predict group membership of unknown samples based on a small set of features and that could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically-applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer 'omics datasets with different characteristics. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.

AB - With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule to predict group membership of unknown samples based on a small set of features and that could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically-applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer 'omics datasets with different characteristics. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.

KW - Classification

KW - Feature selection

KW - Gene expression

KW - Jackknife

KW - Voting classifier

UR - http://www.scopus.com/inward/record.url?scp=79957793120&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79957793120&partnerID=8YFLogxK

U2 - 10.4137/CIN.S7111

DO - 10.4137/CIN.S7111

M3 - Article

C2 - 21584263

AN - SCOPUS:79957793120

VL - 10

SP - 133

EP - 147

JO - Cancer Informatics

JF - Cancer Informatics

SN - 1176-9351

ER -