On partial least squares dimension reduction for microarray-based classification

A simulation study

Danh V. Nguyen, David M. Rocke

Research output: Contribution to journal › Article

52 Citations (Scopus)

Abstract

In microarray tumor tissue classification studies, the expressions of thousands of genes (variables) are simultaneously measured across a few tissue samples. Standard statistical methodologies in classification do not work well when the dimension, p, is greater than the sample size, N. One approach to classification problems, when p≫N, is to first apply a dimension reduction method and then perform the classification in the reduced space. In this paper, we study dimension reduction for classification in high dimension based on partial least squares (PLS) and principal components analysis (PCA). In addition, we propose and explore two hybrid-PLS methods for dimension reduction. PLS components are linear combinations of the original predictors, but the weights are nonlinear functions of both the predictors and response variable. This makes it difficult to study the PLS classification methodologies analytically, so, in this paper, we turn to a numerical study using simulation.
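The two-stage approach described in the abstract (reduce the dimension first, then classify in the reduced space) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the simulated data design, the NumPy implementation of PLS (a standard single-response algorithm with deflation), and the nearest-centroid rule, which stands in for whichever classifier the authors used. The helper names `fit_pls`, `fit_pca`, `simulate`, and `nearest_centroid_accuracy` are hypothetical.

```python
import numpy as np

def fit_pls(X, y, k):
    """Fit k single-response PLS components; return (column means, projection R)."""
    mu = X.mean(axis=0)
    Xd = X - mu                      # centered predictors, deflated in the loop
    yc = y - y.mean()                # centered response
    W, P = [], []
    for _ in range(k):
        w = Xd.T @ yc                # weights depend on predictors AND response
        w /= np.linalg.norm(w)
        t = Xd @ w                   # component scores
        load = Xd.T @ t / (t @ t)    # loadings of the current component
        Xd = Xd - np.outer(t, load)  # remove the part of X explained by t
        W.append(w)
        P.append(load)
    W, P = np.array(W).T, np.array(P).T
    # R maps centered (undeflated) predictors directly to component scores.
    R = W @ np.linalg.inv(P.T @ W)
    return mu, R

def fit_pca(X, k):
    """Fit k principal components; return (column means, projection V)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T              # PCA ignores the response entirely

def nearest_centroid_accuracy(Ztr, ytr, Zte, yte):
    """Classify test scores by the closer class centroid (illustrative classifier)."""
    c0 = Ztr[ytr == 0].mean(axis=0)
    c1 = Ztr[ytr == 1].mean(axis=0)
    d0 = np.linalg.norm(Zte - c0, axis=1)
    d1 = np.linalg.norm(Zte - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == yte).mean())

def simulate(n, p, n_informative, shift, rng):
    """Two balanced classes (n even); only the first n_informative variables differ."""
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, p))
    X[y == 1, :n_informative] += shift
    return X, y

# Hypothetical p >> N setting: 40 samples, 500 variables, 10 of them informative.
rng = np.random.default_rng(0)
Xtr, ytr = simulate(40, 500, 10, 1.0, rng)
Xte, yte = simulate(40, 500, 10, 1.0, rng)
for name, (mu, R) in {"PLS": fit_pls(Xtr, ytr, 3), "PCA": fit_pca(Xtr, 3)}.items():
    acc = nearest_centroid_accuracy((Xtr - mu) @ R, ytr, (Xte - mu) @ R, yte)
    print(f"{name}: test accuracy {acc:.2f}")
```

The only difference between the two reductions in this sketch is that the PLS weights are built from `Xd.T @ yc`, so they use the class labels, while the PCA directions come from the predictors alone. After the first component the PLS weights depend on the deflated predictors, which is one way to see why they are nonlinear functions of the data.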

Original language: English (US)
Pages (from-to): 407-425
Number of pages: 19
Journal: Computational Statistics and Data Analysis
Volume: 46
Issue number: 3
DOI: 10.1016/j.csda.2003.08.001
State: Published - Jun 15 2004

Fingerprint

Partial Least Squares
Dimension Reduction
Microarrays
Microarray
Simulation Study
Predictors
Tissue
Methodology
Reduction Method
Least Square Method
Nonlinear Function
Classification Problems
Principal Component Analysis
Higher Dimensions
Linear Combination
Numerical Study
Tumor
Sample Size

Keywords

  • DNA microarray
  • Logistic discrimination
  • Partial least squares
  • Principal components analysis

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Statistics, Probability and Uncertainty
  • Electrical and Electronic Engineering
  • Computational Mathematics
  • Numerical Analysis
  • Statistics and Probability

Cite this

On partial least squares dimension reduction for microarray-based classification: A simulation study. / Nguyen, Danh V.; Rocke, David M.

In: Computational Statistics and Data Analysis, Vol. 46, No. 3, 15.06.2004, p. 407-425.


@article{192a1dc8557b47a2bcd3dbc306282a82,
title = "On partial least squares dimension reduction for microarray-based classification: A simulation study",
abstract = "In microarray tumor tissue classification studies, the expressions of thousands of genes (variables) are simultaneously measured across a few tissue samples. Standard statistical methodologies in classification do not work well when the dimension, p, is greater than the sample size, N. One approach to classification problems, when p≫N, is to first apply a dimension reduction method and then perform the classification in the reduced space. In this paper, we study dimension reduction for classification in high dimension based on partial least squares (PLS) and principal components analysis (PCA). In addition, we propose and explore two hybrid-PLS methods for dimension reduction. PLS components are linear combinations of the original predictors, but the weights are nonlinear functions of both the predictors and response variable. This makes it difficult to study the PLS classification methodologies analytically, so, in this paper, we turn to a numerical study using simulation.",
keywords = "DNA microarray, Logistic discrimination, Partial least squares, Principal components analysis",
author = "Nguyen, {Danh V.} and Rocke, {David M}",
year = "2004",
month = "6",
day = "15",
doi = "10.1016/j.csda.2003.08.001",
language = "English (US)",
volume = "46",
pages = "407--425",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "3",
}

TY  - JOUR
T1  - On partial least squares dimension reduction for microarray-based classification
T2  - A simulation study
AU  - Nguyen, Danh V.
AU  - Rocke, David M
PY  - 2004/6/15
Y1  - 2004/6/15
N2  - In microarray tumor tissue classification studies, the expressions of thousands of genes (variables) are simultaneously measured across a few tissue samples. Standard statistical methodologies in classification do not work well when the dimension, p, is greater than the sample size, N. One approach to classification problems, when p≫N, is to first apply a dimension reduction method and then perform the classification in the reduced space. In this paper, we study dimension reduction for classification in high dimension based on partial least squares (PLS) and principal components analysis (PCA). In addition, we propose and explore two hybrid-PLS methods for dimension reduction. PLS components are linear combinations of the original predictors, but the weights are nonlinear functions of both the predictors and response variable. This makes it difficult to study the PLS classification methodologies analytically, so, in this paper, we turn to a numerical study using simulation.
AB  - In microarray tumor tissue classification studies, the expressions of thousands of genes (variables) are simultaneously measured across a few tissue samples. Standard statistical methodologies in classification do not work well when the dimension, p, is greater than the sample size, N. One approach to classification problems, when p≫N, is to first apply a dimension reduction method and then perform the classification in the reduced space. In this paper, we study dimension reduction for classification in high dimension based on partial least squares (PLS) and principal components analysis (PCA). In addition, we propose and explore two hybrid-PLS methods for dimension reduction. PLS components are linear combinations of the original predictors, but the weights are nonlinear functions of both the predictors and response variable. This makes it difficult to study the PLS classification methodologies analytically, so, in this paper, we turn to a numerical study using simulation.
KW  - DNA microarray
KW  - Logistic discrimination
KW  - Partial least squares
KW  - Principal components analysis
UR  - http://www.scopus.com/inward/record.url?scp=2642537531&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=2642537531&partnerID=8YFLogxK
U2  - 10.1016/j.csda.2003.08.001
DO  - 10.1016/j.csda.2003.08.001
M3  - Article
VL  - 46
SP  - 407
EP  - 425
JO  - Computational Statistics and Data Analysis
JF  - Computational Statistics and Data Analysis
SN  - 0167-9473
IS  - 3
ER  -