Robust estimation of multivariate location and shape

David M Rocke, David L. Woodruff

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

In this paper, we describe an overall strategy for robust estimation of multivariate location and shape, and for the consequent identification of outliers and leverage points. Parts of this strategy have been described in a series of previous papers (Rocke, Ann. Statist., in press; Rocke and Woodruff, Statist. Neerlandica 47 (1993), 27-42; J. Amer. Statist. Assoc., in press; Woodruff and Rocke, J. Comput. Graphical Statist. 2 (1993), 69-95; J. Amer. Statist. Assoc. 89 (1994), 888-896), but the overall structure is presented here for the first time. After describing the first-level architecture of a class of algorithms for this problem, we review the available information about possible tactics for each major step in the process. The major steps that we have found to be necessary are as follows: (1) partition the data into groups whose size is perhaps five times the dimension; (2) for each group, search for the best available solution to a combinatorial estimator such as the Minimum Covariance Determinant (MCD), which yields the preliminary estimates; (3) for each preliminary estimate, iterate to the solution of a smooth estimator chosen for robustness and outlier resistance; and (4) choose among the final iterates using a robust criterion such as minimum volume. This algorithm architecture can enable reliable, fast, robust estimation of heavily contaminated multivariate data in high dimension (more than 20 variables), even with large quantities of data. A computer program implementing the algorithm is available from the authors.
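The four steps listed in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' program: the MCD search in step (2) is replaced by random (p+1)-point starts followed by concentration steps (refitting on the h closest points); the smooth estimator in step (3) is a simplified M-type iteration with Tukey biweight weights on median-scaled distances rather than an S-estimator; and step (4) scores each final iterate by the volume of the ellipsoid that just covers half the points. All function names and tuning constants (`group_factor=5`, `n_starts`, `c=2.5`) are hypothetical.

```python
import numpy as np


def mahalanobis_sq(X, mu, cov):
    """Squared Mahalanobis distance of each row of X from (mu, cov)."""
    diff = X - mu
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def mcd_on_group(G, n_starts, rng, n_csteps=20):
    """Step (2): approximate the MCD of one group.

    Stand-in search (an assumption): random (p+1)-point starts, each followed
    by a fixed number of concentration steps (refit on the h closest points).
    Returns (log det, mean, cov) of the best h-subset found, or None.
    """
    m, p = G.shape
    h = (m + p + 1) // 2
    best = None
    for _ in range(n_starts):
        idx = rng.choice(m, size=p + 1, replace=False)
        mu, cov = G[idx].mean(axis=0), np.cov(G[idx], rowvar=False)
        if np.linalg.slogdet(cov)[0] <= 0:
            continue  # degenerate start; try another
        for _ in range(n_csteps):
            keep = np.argsort(mahalanobis_sq(G, mu, cov))[:h]
            mu, cov = G[keep].mean(axis=0), np.cov(G[keep], rowvar=False)
        sign, logdet = np.linalg.slogdet(cov)
        if sign > 0 and (best is None or logdet < best[0]):
            best = (logdet, mu, cov)
    return best


def biweight_refine(X, mu, cov, c=2.5, max_iter=100, tol=1e-8):
    """Step (3): iterate a smooth, redescending M-type estimator on all data.

    Hypothetical choice: Tukey biweight weights on Mahalanobis distances
    divided by their median, so points far from the bulk get weight 0.
    """
    for _ in range(max_iter):
        d = np.sqrt(mahalanobis_sq(X, mu, cov))
        u = d / np.median(d)
        w = np.where(u < c, (1.0 - (u / c) ** 2) ** 2, 0.0)
        new_mu = w @ X / w.sum()
        diff = X - new_mu
        new_cov = (w[:, None] * diff).T @ diff / w.sum()
        done = np.allclose(new_mu, mu, atol=tol) and np.allclose(new_cov, cov, atol=tol)
        mu, cov = new_mu, new_cov
        if done:
            break
    return mu, cov


def half_volume_log(X, mu, cov):
    """Step (4) criterion: log volume (up to a constant) of the ellipsoid
    with centre mu and shape cov that just covers half of the points."""
    p = X.shape[1]
    med = np.median(mahalanobis_sq(X, mu, cov))
    return 0.5 * np.linalg.slogdet(cov)[1] + 0.5 * p * np.log(med)


def robust_location_shape(X, group_factor=5, n_starts=10, seed=0):
    """Steps (1)-(4): partition, preliminary MCD fits, refine, choose."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Step (1): random groups of at least group_factor * p points when n allows.
    n_groups = max(1, n // (group_factor * p))
    best = None
    for g in np.array_split(rng.permutation(n), n_groups):
        prelim = mcd_on_group(X[g], n_starts, rng)
        if prelim is None:
            continue
        mu, cov = biweight_refine(X, prelim[1], prelim[2])
        crit = half_volume_log(X, mu, cov)
        if best is None or crit < best[0]:
            best = (crit, mu, cov)
    return best[1], best[2]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.standard_normal((320, 5))
    shifted = rng.standard_normal((80, 5)) + 10.0  # 20% contamination
    X = np.vstack([clean, shifted])
    mu, cov = robust_location_shape(X)
    print("robust location:", np.round(mu, 2))
    print("ordinary mean:  ", np.round(X.mean(axis=0), 2))
```

In the example, 80 of 400 five-dimensional points are shifted by 10 units in every coordinate; the robust location should stay near the origin while the ordinary mean is pulled toward the shifted cluster. The half-coverage volume criterion is scale-free: multiplying `cov` by k multiplies det(cov) by k^p and divides the median squared distance by k, so iterates that differ only in overall scale receive the same score.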

Original language: English (US)
Pages (from-to): 245-255
Number of pages: 11
Journal: Journal of Statistical Planning and Inference
Volume: 57
Issue number: 2
State: Published - Feb 1 1997

Keywords

  • M-estimator
  • Minimum covariance determinant (MCD)
  • Minimum volume ellipsoid (MVE)
  • S-estimator

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Applied Mathematics
  • Statistics and Probability

Cite this

Robust estimation of multivariate location and shape. / Rocke, David M; Woodruff, David L.

In: Journal of Statistical Planning and Inference, Vol. 57, No. 2, 01.02.1997, p. 245-255.

@article{ff2331bc9b1548b6afbd18b6d281bf8e,
title = "Robust estimation of multivariate location and shape",
keywords = "M-estimator, Minimum covariance determinant (MCD), Minimum volume ellipsoid (MVE), S-estimator",
author = "Rocke, {David M} and Woodruff, {David L.}",
year = "1997",
month = "2",
day = "1",
language = "English (US)",
volume = "57",
pages = "245--255",
journal = "Journal of Statistical Planning and Inference",
issn = "0378-3758",
publisher = "Elsevier",
number = "2",

}
