### Abstract

In this paper, we describe an overall strategy for robust estimation of multivariate location and shape, and the consequent identification of outliers and leverage points. Parts of this strategy have been described in a series of previous papers (Rocke, Ann. Statist., in press; Rocke and Woodruff, Statist. Neerlandica 47 (1993), 27-42; J. Amer. Statist. Assoc., in press; Woodruff and Rocke, J. Comput. Graphical Statist. 2 (1993), 69-95; J. Amer. Statist. Assoc. 89 (1994), 888-896), but the overall structure is presented here for the first time. After describing the first-level architecture of a class of algorithms for this problem, we review available information about possible tactics for each major step in the process. The major steps that we have found to be necessary are as follows: (1) partition the data into groups of size perhaps five times the dimension; (2) for each group, search for the best available solution to a combinatorial estimator such as the Minimum Covariance Determinant (MCD), which yields the preliminary estimates; (3) for each preliminary estimate, iterate to the solution of a smooth estimator chosen for robustness and outlier resistance; and (4) choose among the final iterates based on a robust criterion, such as minimum volume. Use of this algorithm architecture can enable reliable, fast, robust estimation of heavily contaminated multivariate data in high dimension (more than 20 variables), even with large data sets. A computer program implementing the algorithm is available from the authors.
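The four-step architecture summarized above can be illustrated with a small, standard-library-only Python sketch. It is not the authors' program, and it rests on several assumptions: every function name (`robust_location_shape`, `mcd_search`, `biweight_iterate`, `log_volume`, and the helpers) is hypothetical; points are tuples of floats; step 2 approximates each group's MCD with random (p+1)-point starts refined by concentration steps, a swapped-in search heuristic rather than necessarily the one the authors use; step 3 replaces the paper's smooth estimator with a simplified iteratively reweighted mean and covariance using Tukey biweight weights on median-standardized distances (the cutoff `c2 = 4.0` is illustrative); and step 4 scores each final iterate by the log-volume of its ellipsoid scaled to cover half the data. Groups have `5 * p` points, following the abstract's guideline of groups about five times the dimension.

```python
import math
import random

# Sketch only: function names and tuning constants are hypothetical; points are tuples.

def mean_cov(rows, w=None):
    """Weighted mean and covariance (divided by the total weight) of p-vectors."""
    w = w or [1.0] * len(rows)
    tot, p = sum(w), len(rows[0])
    t = [sum(wi * r[j] for wi, r in zip(w, rows)) / tot for j in range(p)]
    s = [[sum(wi * (r[a] - t[a]) * (r[b] - t[b]) for wi, r in zip(w, rows)) / tot
          for b in range(p)] for a in range(p)]
    return t, s

def inv_logdet(m):
    """Gauss-Jordan inverse and log-determinant of a small SPD matrix."""
    p = len(m)
    a = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(m)]
    logdet = 0.0
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-12:
            raise ValueError("singular covariance")
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]
        logdet += math.log(abs(pv))  # |det| is the product of |pivots|; det > 0 for SPD
        a[c] = [v / pv for v in a[c]]
        for r in range(p):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[p:] for row in a], logdet

def maha2(x, t, inv):
    d = [xi - ti for xi, ti in zip(x, t)]
    return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))

def median(v):
    s, k = sorted(v), len(v) // 2
    return s[k] if len(s) % 2 else 0.5 * (s[k - 1] + s[k])

def mcd_search(group, p, rng, n_starts=5, max_iter=20):
    """Step 2 (swapped-in heuristic): random (p+1)-point starts + concentration steps."""
    h, best = (len(group) + p + 1) // 2, None
    for _ in range(n_starts):
        sub = rng.sample(group, p + 1)
        try:
            for _ in range(max_iter):
                t, s = mean_cov(sub)
                inv = inv_logdet(s)[0]
                # concentration step: keep the h group points closest to the current fit
                new = sorted(group, key=lambda x: maha2(x, t, inv))[:h]
                if set(new) == set(sub):
                    break
                sub = new
            t, s = mean_cov(sub)
            logdet = inv_logdet(s)[1]
        except ValueError:
            continue  # singular start: skip it
        if best is None or logdet < best[0]:
            best = (logdet, t, s)
    return best

def biweight_iterate(data, t, s, c2=4.0, max_iter=50, tol=1e-8):
    """Step 3 stand-in: iterative biweight reweighting on median-standardized distances."""
    for _ in range(max_iter):
        inv = inv_logdet(s)[0]
        d2 = [maha2(x, t, inv) for x in data]
        m = median(d2)
        # (1 - u/c2)^2 with u = d2/m is Tukey's biweight at distance sqrt(u), c = sqrt(c2)
        w = [(1 - d / m / c2) ** 2 if d / m < c2 else 0.0 for d in d2]
        t_new, s = mean_cov(data, w)
        shift = max(abs(a - b) for a, b in zip(t, t_new))
        t = t_new
        if shift < tol:
            break
    return t, s

def log_volume(data, t, s):
    """Step 4 criterion: log-volume (up to a constant) of the half-data ellipsoid."""
    inv, logdet = inv_logdet(s)
    return 0.5 * logdet + 0.5 * len(t) * math.log(median([maha2(x, t, inv) for x in data]))

def robust_location_shape(data, group_factor=5, seed=0):
    """Steps 1-4: partition, per-group preliminary MCD, smooth iteration, selection."""
    rng, p = random.Random(seed), len(data[0])
    idx = list(range(len(data)))
    rng.shuffle(idx)
    g, finals = group_factor * p, []
    for k in range(0, len(data) - g + 1, g):  # Step 1: disjoint random groups of g points
        prelim = mcd_search([data[i] for i in idx[k:k + g]], p, rng)  # Step 2
        try:
            if prelim is not None:
                t, s = biweight_iterate(data, prelim[1], prelim[2])  # Step 3
                finals.append((log_volume(data, t, s), t, s))
        except ValueError:
            pass  # a candidate whose covariance became singular is discarded
    _, t, s = min(finals, key=lambda c: c[0])  # Step 4: smallest volume wins
    return t, s
```

On two-dimensional data in which a tight cluster holding 20% of the points sits far from the bulk, this sketch should return a center near the bulk, while the ordinary mean is dragged toward the cluster; starts that happen to land on the outliers lose at step 4 because their half-data ellipsoid must stretch to reach the bulk. The hand-written matrix routines keep the example dependency-free but are only practical for small dimension; a realistic implementation for the high-dimensional, large-sample setting described above would rely on a numerical linear algebra library.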

| Original language | English (US) |
|---|---|
| Pages (from-to) | 245-255 |
| Number of pages | 11 |
| Journal | Journal of Statistical Planning and Inference |
| Volume | 57 |
| Issue number | 2 |
| State | Published, Feb 1, 1997 |

### Keywords

- M-estimator
- Minimum covariance determinant (MCD)
- Minimum volume ellipsoid (MVE)
- S-estimator

### ASJC Scopus subject areas

- Statistics, Probability and Uncertainty
- Applied Mathematics
- Statistics and Probability

### Cite this

**Robust estimation of multivariate location and shape.** Rocke, David M.; Woodruff, David L. *Journal of Statistical Planning and Inference*, vol. 57, no. 2, pp. 245-255, 1997.

TY - JOUR

T1 - Robust estimation of multivariate location and shape

AU - Rocke, David M

AU - Woodruff, David L.

PY - 1997/2/1

Y1 - 1997/2/1

KW - M-estimator

KW - Minimum covariance determinant (MCD)

KW - Minimum volume ellipsoid (MVE)

KW - S-estimator

UR - http://www.scopus.com/inward/record.url?scp=0031067289&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031067289&partnerID=8YFLogxK

M3 - Article

VL - 57

SP - 245

EP - 255

JO - Journal of Statistical Planning and Inference

JF - Journal of Statistical Planning and Inference

SN - 0378-3758

IS - 2

ER -