A comparison of procedures for robust multivariate outlier detection
Thesis   Open access


Andrew Grose
Honours, Murdoch University
2019
Grose2019.pdf (whole thesis, open access)

Abstract

This thesis compares the efficacy of three prominent multivariate outlier detection algorithms and, in addition, compares the efficiency of the methods (applied with outliers removed) with that of common robust estimation methods. Efficiency is assessed by comparing Monte Carlo estimates of relative efficiency for estimators of the multivariate mean. The number of samples generated for these simulations is chosen to give reliable estimates while avoiding excessive computing times. Efficacy in multivariate outlier detection is judged by power in finding outliers in simulated data from contaminated distributions, by size for uncontaminated distributions, and by power in real-world data sets with known or planted outliers. Elapsed times for the various routines are also investigated briefly, on an ad hoc basis. A particular motivation for this study is a specific adaptive method known as the adaptive trimmed likelihood algorithm (ATLA). ATLA is the multivariate version of a method that developed out of adaptive univariate location estimation, first explored in Clarke (1994) and later supported by asymptotic theory in Bednarski and Clarke (2002). The asymptotic theory for the trimmed likelihood estimator was given in Bednarski and Clarke (1993). A numerical routine using what is termed forward search, together with various comparisons made using ATLA, is described in Schubert (2005).
The routine was later slightly modified by Robert Hammarstrand for completeness and is available at https://www.wiley.com/en-au/Robustness+Theory+and+Application-p-9781118669303 in the supplementary materials of Clarke (2018). Also available at that website is an algorithm called Onesample, written by Brenton R Clarke and Betty Mouchel, which implements the initial algorithm used to evaluate the estimator described in Clarke (1994) in the case of univariate estimation. For multivariate estimation, ATLA serves as an outlier detection method based on the minimum covariance determinant (MCD) (Rousseeuw, 1983), used in an adaptive way. The simulations and output given in this thesis were produced with software in R and MATLAB, calling on new and pre-established functions available in downloadable packages. The code is presented in the appendix, together with supporting information that details the task of each function and the functions and/or packages required.
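
To make the idea of a Monte Carlo estimate of relative efficiency concrete, the following is a minimal sketch in Python rather than the R and MATLAB code of the thesis. Everything in it is an assumption chosen for illustration: the bivariate contaminated normal model, the contamination scale, the use of the coordinate-wise median as a stand-in robust estimator, and the function names. It does not implement ATLA, MCD, or any routine from the thesis.

```python
import random
import statistics


def draw_sample(n, eps, scale, rng):
    """Draw n bivariate points centred at the origin.

    Each point comes from a standard normal with probability 1 - eps,
    and from a normal with standard deviation `scale` otherwise
    (an assumed contamination model, for illustration only).
    """
    sample = []
    for _ in range(n):
        sd = scale if rng.random() < eps else 1.0
        sample.append((rng.gauss(0.0, sd), rng.gauss(0.0, sd)))
    return sample


def sample_mean(sample):
    """Classical estimate: the mean of each coordinate."""
    return tuple(statistics.fmean(col) for col in zip(*sample))


def coordinate_median(sample):
    """Stand-in robust estimate: the median of each coordinate."""
    return tuple(statistics.median(col) for col in zip(*sample))


def squared_error(estimate):
    """Squared Euclidean distance from the true location (the origin)."""
    return sum(x * x for x in estimate)


def relative_efficiency(n=50, reps=2000, eps=0.0, scale=10.0, seed=1):
    """Monte Carlo ratio MSE(mean) / MSE(median) over `reps` samples.

    Values above 1 mean the median had the smaller mean squared error.
    """
    rng = random.Random(seed)
    mse_mean = 0.0
    mse_median = 0.0
    for _ in range(reps):
        sample = draw_sample(n, eps, scale, rng)
        mse_mean += squared_error(sample_mean(sample))
        mse_median += squared_error(coordinate_median(sample))
    return mse_mean / mse_median


if __name__ == "__main__":
    print("uncontaminated:", relative_efficiency(eps=0.0))
    print("10% contaminated:", relative_efficiency(eps=0.1))
```

Under these assumptions the ratio should fall below 1 for uncontaminated data, where the sample mean is the more efficient estimator, and rise above 1 once contamination is introduced. Increasing `reps` reduces the Monte Carlo error at the cost of computing time, which is the trade-off the abstract refers to.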
