Dimension reduction for classification with gene expression microarray data

Jian J. Dai, Linh Lieu, David M Rocke

Research output: Contribution to journal › Article

130 Citations (Scopus)

Abstract

An important application of gene expression microarray data is the classification of biological samples or the prediction of clinical and other outcomes. Dimension reduction is a necessary part of multivariate statistical analysis in such applications. This paper compares three dimension-reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR), and principal component analysis (PCA), and evaluates the relative performance of classification procedures that incorporate them. A five-step assessment procedure is designed for this purpose, and the predictive accuracy and computational efficiency of the methods are examined using two gene expression data sets for tumor classification.
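The general idea of reducing dimension before classifying can be illustrated with a minimal sketch. It is not the paper's five-step procedure and does not use its data sets: the synthetic data, the nearest-centroid classifier, the choice of two components, and the helper names `pca_rotation`, `pls_rotation`, and `nearest_centroid_accuracy` are all assumptions made for illustration. SIR is omitted for brevity. The sketch assumes NumPy is available.

```python
import numpy as np

def pca_rotation(X, k):
    """Top-k principal directions (p x k) of the column-centered matrix X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def pls_rotation(X, y, k):
    """Rotation matrix (p x k) for k PLS1 components, computed by NIPALS.

    Multiplying centered data by the result gives the PLS scores.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    W, P = [], []
    for _ in range(k):
        w = Xc.T @ yc                  # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xc @ w                     # scores on this component
        p = Xc.T @ t / (t @ t)         # loadings
        Xc = Xc - np.outer(t, p)       # deflate X before the next component
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.inv(P.T @ W)  # maps undeflated centered X to scores

def nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test):
    """Assign each test point to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dist.argmin(axis=1)]
    return float((pred == y_test).mean())

# Synthetic stand-in for expression data: 60 samples, 200 "genes",
# of which the first 10 are shifted in class 1 (an assumption, not real data).
rng = np.random.default_rng(0)
n, p, k = 60, 200, 2
y = (np.arange(n) % 2).astype(float)
X = rng.normal(size=(n, p))
X[:, :10] += 2.0 * y[:, None]

X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]
mu = X_train.mean(axis=0)              # center test data with training means

results = {}
for name, R in [("PCA", pca_rotation(X_train, k)),
                ("PLS", pls_rotation(X_train, y_train, k))]:
    results[name] = nearest_centroid_accuracy(
        (X_train - mu) @ R, y_train, (X_test - mu) @ R, y_test)
print(results)
```

Both rotations are estimated from the training samples only and then applied to the held-out samples, so the test accuracy is not contaminated by the reduction step. PCA ignores the class labels when choosing directions, whereas PLS uses them, which is one reason the two can behave differently on data with many more variables than samples.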

Original language: English (US)
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 5
Issue number: 1
State: Published - Feb 24 2006

Fingerprint

Dimension Reduction
Microarrays
Gene Expression Data
Microarray Data
Gene Expression
Multivariate Statistical Analysis
Sliced Inverse Regression
Partial Least Squares
Computational Efficiency
Principal Component Analysis
Three-dimension
Tumors
Tumor
Statistical methods
Least-Squares Analysis
Necessary
Multivariate Analysis

Keywords

  • Feature extraction
  • Gene expression
  • Partial least squares
  • Sliced inverse regression
  • Tumor classification

ASJC Scopus subject areas

  • Genetics

Cite this

Dimension reduction for classification with gene expression microarray data. / Dai, Jian J.; Lieu, Linh; Rocke, David M.

In: Statistical Applications in Genetics and Molecular Biology, Vol. 5, No. 1, 24.02.2006.

@article{b2ae3ac82057438ab9239882d566f3bd,
title = "Dimension reduction for classification with gene expression microarray data",
abstract = "An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures incorporating those methods. A five-step assessment procedure is designed for the purpose. Predictive accuracy and computational efficiency of the methods are examined. Two gene expression data sets for tumor classification are used in the study.",
keywords = "Feature extraction, Gene expression, Partial least squares, Sliced inverse regression, Tumor classification",
author = "Dai, {Jian J.} and Linh Lieu and Rocke, {David M}",
year = "2006",
month = "2",
day = "24",
language = "English (US)",
volume = "5",
journal = "Statistical Applications in Genetics and Molecular Biology",
issn = "1544-6115",
publisher = "Berkeley Electronic Press",
number = "1",

}

TY  - JOUR
T1  - Dimension reduction for classification with gene expression microarray data
AU  - Dai, Jian J.
AU  - Lieu, Linh
AU  - Rocke, David M
PY  - 2006/2/24
Y1  - 2006/2/24
N2  - An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures incorporating those methods. A five-step assessment procedure is designed for the purpose. Predictive accuracy and computational efficiency of the methods are examined. Two gene expression data sets for tumor classification are used in the study.
AB  - An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures incorporating those methods. A five-step assessment procedure is designed for the purpose. Predictive accuracy and computational efficiency of the methods are examined. Two gene expression data sets for tumor classification are used in the study.
KW  - Feature extraction
KW  - Gene expression
KW  - Partial least squares
KW  - Sliced inverse regression
KW  - Tumor classification
UR  - http://www.scopus.com/inward/record.url?scp=33646383088&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33646383088&partnerID=8YFLogxK
M3  - Article
C2  - 16646870
AN  - SCOPUS:33646383088
VL  - 5
JO  - Statistical Applications in Genetics and Molecular Biology
JF  - Statistical Applications in Genetics and Molecular Biology
SN  - 1544-6115
IS  - 1
ER  - 