Output list
Journal article
A smoothed three-part redescending M-estimator
First online publication 2025
Stats, 8, 2, 33
A smoothed M-estimator is derived from Hampel’s three-part redescending estimator for location and scale. The estimator is shown to be weakly continuous and Fréchet differentiable in the neighbourhood of the normal distribution. Asymptotic assessment is conducted at asymmetric contaminating distributions, where smoothing is shown to improve the variance and the change-of-variance sensitivity. Other robustness measures compared are largely unchanged, so the smoothed functions represent an improvement, with little downside, for asymmetric contamination near the rejection point.
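For reference, the classical (unsmoothed) Hampel three-part redescending ψ-function from which the smoothed estimator is derived has the standard piecewise form, with tuning constants 0 < a ≤ b < c and rejection point c beyond which observations receive zero weight; the smoothed version introduced in the paper replaces its corners with smooth transitions and is not reproduced here:

```latex
\psi_{a,b,c}(x) =
\begin{cases}
x, & |x| \le a,\\[2pt]
a\,\operatorname{sign}(x), & a < |x| \le b,\\[2pt]
a\,\dfrac{c-|x|}{c-b}\,\operatorname{sign}(x), & b < |x| \le c,\\[2pt]
0, & |x| > c.
\end{cases}
```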
Journal article
Published 2024
Australian & New Zealand Journal of Statistics, 65, 4, 309 - 326
Two-way layouts are common in grain industry research, where it is often the case that there are one or more covariates. It is widely recognised that, when estimating fixed effect parameters, one should also examine the data for possible extra error variance structure. An exact test for heteroscedasticity when there is a covariate is illustrated for a data set from frost trials in Western Australia. While the general algebra for the test is known from past literature, there are computational aspects to implementing the test for the two-way layout when there are covariates. In this scenario the test is shown to have greater power than the industry standard and, because of its exact size, is preferable to the restricted maximum likelihood ratio test (REMLRT) based on the approximate asymptotic distribution. Formulation of the exact test considered here involves creation of appropriate contrasts in the experimental design. This is illustrated using specific choices of observations corresponding to an index set in the linear model for the two-way layout, and an algorithm is supplied to complement the test. Comparisons of size and power then follow. The test has natural extensions when the data are unbalanced and when more than one covariate is present, and the results can be extended to Balanced Incomplete Block Designs.
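As a rough illustration of the general idea of an exact test for heteroscedasticity in a linear model with a covariate (not the paper's specific contrast-based construction), one can fit the same fixed-effects model separately within the two strata suspected of differing error variance and compare residual mean squares with an exact F test; the formula, column names and data file below are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

def exact_variance_ratio_test(df, formula, group_col):
    """Exact F test of equal error variance between two strata.

    The same linear model (treatment factors plus covariate) is fitted
    within each stratum; under normal errors the two residual sums of
    squares are independent scaled chi-squares, so their mean-square
    ratio is exactly F distributed under the null of a common variance.
    """
    g1, g2 = sorted(df[group_col].unique())[:2]
    fits = {g: smf.ols(formula, data=df[df[group_col] == g]).fit() for g in (g1, g2)}
    F = (fits[g1].ssr / fits[g1].df_resid) / (fits[g2].ssr / fits[g2].df_resid)
    p_two_sided = 2 * min(stats.f.cdf(F, fits[g1].df_resid, fits[g2].df_resid),
                          stats.f.sf(F, fits[g1].df_resid, fits[g2].df_resid))
    return F, p_two_sided

# Hypothetical two-way frost-trial layout with a covariate:
# frost = pd.read_csv("frost_trial.csv")
# F, p = exact_variance_ratio_test(frost, "y ~ C(variety) + C(block) + covariate", "variance_group")
```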
Journal article
Published 2023
Statistical Papers, 64, 395 - 420
This paper makes comparisons of automated procedures for robust multivariate outlier detection through discussion and simulation. In particular, automated procedures that use the forward search along with Mahalanobis distances to identify and classify multivariate outliers subject to predefined criteria are examined. Procedures utilizing a parametric model criterion based on a χ2-distribution are among these, whereas the multivariate Adaptive Trimmed Likelihood Algorithm (ATLA) identifies outliers based on an objective function that is derived from the asymptotics of the location estimator assuming a multivariate normal distribution. Several criteria, including size (false positive rate), sensitivity, and relative efficiency, are canvassed. To illustrate relative efficiency in a multivariate setting in a new way, measures of variability of the multivariate location parameter when the underlying distribution is chosen from a multivariate generalization of the Tukey–Huber ϵ-contamination model are used. Mean slippage models are also entertained. The simulation results are illuminating and demonstrate that no single procedure outperforms the others in all situations, although one may ascertain circumstances in which a particular method is best. Finally, the paper explores graphical monitoring for the existence of clusters and the potential of classification through the occurrence of multiple minima in the objective function using ATLA.
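As a minimal sketch of the parametric χ²-cutoff criterion mentioned above, robust squared Mahalanobis distances can be compared against a chi-square quantile; the snippet uses scikit-learn's Minimum Covariance Determinant estimator rather than the forward search or ATLA discussed in the paper, and the 0.975 cutoff level is an assumption.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def flag_multivariate_outliers(X, level=0.975):
    """Flag rows whose robust squared Mahalanobis distance exceeds a chi-square cutoff.

    MinCovDet supplies a robust location and scatter estimate; for clean
    p-variate normal data the squared distances are approximately chi2(p).
    """
    X = np.asarray(X, dtype=float)
    d2 = MinCovDet(random_state=0).fit(X).mahalanobis(X)
    cutoff = chi2.ppf(level, df=X.shape[1])
    return d2 > cutoff, d2

# Simulated data with five shifted observations as contamination:
# rng = np.random.default_rng(0)
# X = np.vstack([rng.normal(size=(95, 3)), rng.normal(6.0, 1.0, size=(5, 3))])
# is_outlier, d2 = flag_multivariate_outliers(X)
```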
Journal article
Clinically and temporally specific diagnostic thresholds for plasma ACTH in the horse
Published 2021
Equine Veterinary Journal, 53, 2, 250 - 260
Objectives: To derive temporally specific diagnostic thresholds for equine plasma ACTH concentration to be used alongside clinical judgement in each individual week of the year and appropriate for the degree of clinical suspicion in any given case; furthermore, to apply these thresholds to compare the prevalence of high and low ACTH in two subgroups of animals with high and low clinical suspicion of PPID. Study design: A retrospective population study examining a large laboratory database of equine plasma ACTH concentrations using an indirect approach to calculate diagnostic thresholds. Methods: Logs of plasma ACTH concentrations from 75 892 individual horses were examined using robust L2 estimation of mixtures of two normal distributions in categories of each week and month of the year. Thresholds dividing the two populations of high-ACTH and low-ACTH horses were then established at different levels of sensitivity and specificity and compared with clinical subgroups of horses divided, based on reported clinical signs, as having high (n = 4036) or low (n = 3022) clinical suspicion of PPID. Results: For most of the year there were small interweek differences in diagnostic thresholds. However, from mid-June to early December diagnostic thresholds showed greater interweek variability, reaching a maximum in late September and early October. Grouping of high- and low-ACTH compared favourably with grouping based on clinical signs. Main limitations: Given the multiple sources of diagnostic samples, pre-analytical data could not be fully verified. Conclusions: Diagnostic thresholds for equine plasma ACTH vary through the year. It is especially important to consider the temporally specific threshold between June and December. Different clinical thresholds can be used depending on the case circumstances and whether a false-positive or false-negative diagnosis is deemed least desirable.
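As a rough sketch of the underlying idea, fitting a two-component normal mixture to log ACTH within a calendar category and reading a diagnostic threshold off the lower component, the snippet below uses scikit-learn's EM-based GaussianMixture rather than the robust L2 estimation employed in the paper, and the specificity target and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def mixture_threshold(log_acth, specificity=0.95):
    """Fit a two-component normal mixture to log ACTH values and return a
    threshold below which roughly `specificity` of the low-ACTH component lies."""
    x = np.asarray(log_acth, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2, n_init=10, random_state=0).fit(x)
    means = gm.means_.ravel()
    sds = np.sqrt(gm.covariances_.ravel())
    low = int(np.argmin(means))                 # the low-ACTH component
    return norm.ppf(specificity, loc=means[low], scale=sds[low])

# Hypothetical use, one calendar week of laboratory results at a time:
# threshold_pg_ml = np.exp(mixture_threshold(np.log(acth_week_38)))
```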
Journal article
A note on the Helmert transformation
Published 2020
Communications in Statistics - Theory and Methods, advance online publication
In this note, we consider a generalization of the well-known Helmert transformation. The main idea is that the proposed generalization allows us to obtain some new transformed normal variables...
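For background, the classical Helmert transformation referred to here is the orthogonal transformation whose first row is the normalised mean contrast and whose remaining rows are mutually orthogonal contrasts, so that for independent normal variables the sample mean and the remaining transformed variables are independent; a minimal sketch constructing and checking the matrix is given below (the generalisation proposed in the note is not reproduced).

```python
import numpy as np

def helmert_matrix(n):
    """Return the n x n orthogonal Helmert matrix.

    Row 0 is the normalised mean contrast; row k (k >= 1) contrasts the
    (k+1)-th observation with the mean of the first k observations.
    """
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)
    for k in range(1, n):
        H[k, :k] = 1.0 / np.sqrt(k * (k + 1))
        H[k, k] = -k / np.sqrt(k * (k + 1))
    return H

H = helmert_matrix(4)
assert np.allclose(H @ H.T, np.eye(4))   # orthogonality check
```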
Journal article
Performance of sample preparation for determination of gold in samples of geological origin
Published 2019
Geostandards and Geoanalytical Research, 43, 3, 435 - 452
We investigate the performance of sample preparation of gold ores using vibratory (bowl, ring and puck type) mills in common use in mineral analytical laboratories. The main criteria for effective grinding are using reduced grinding charge masses ≤ ca. 50% of nominal bowl capacity and using a grinding aid to prevent caking. We show that gold particles of millimetre scale can be comminuted to ≤ 100 µm by grinding in silica flour, bauxite, synthetic carborundum, or mixtures of silica and these materials using times of up to 5 min and that 95% < 50 µm is achievable with extended grinding. This suggests that modified grinding techniques can be used to make sample masses ≤ 5 g viable for routine determination of gold in geological samples. We also demonstrate homogenisation of a gold‐bearing copper sulfide mineral flotation concentrate alone and in mixtures with silica by extended grinding at reduced charge masses. To support this work, we develop a convenient new benchmark of gold ore sample preparation performance ‘G’, an apparent maximum gold particle size interpolated from replicate analytical variance in order to overcome the limitations of laborious sieve fraction analysis of gold particle size. We show useful agreement between G and sieve fraction analysis of gold particle size in samples and test the viability of G experimentally and by analysis of literature data.
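The formula for G is not given in the abstract, but as a rough illustration of how an apparent gold particle size can be back-calculated from replicate assay variance under a simple Poisson ('equant grain') model of rare, identical gold particles, one might proceed as below; the spherical-particle assumption, function name and numbers are illustrative and are not the paper's definition of G.

```python
import numpy as np

RHO_GOLD = 19.3  # density of gold, g/cm^3

def apparent_particle_diameter_um(assays_ppm, aliquot_mass_g):
    """Back-calculate an apparent gold particle diameter (micrometres) from
    the scatter of replicate assays, assuming a Poisson number of identical
    spherical gold particles per aliquot (relative variance of a Poisson
    count n is 1/n, so n is roughly 1/RSD^2)."""
    assays = np.asarray(assays_ppm, dtype=float)
    rsd = assays.std(ddof=1) / assays.mean()
    n_particles = 1.0 / rsd**2                         # effective particles per aliquot
    au_mass_g = assays.mean() * 1e-6 * aliquot_mass_g  # gold mass in one aliquot
    particle_mass_g = au_mass_g / n_particles
    d_cm = (6.0 * particle_mass_g / (np.pi * RHO_GOLD)) ** (1.0 / 3.0)
    return d_cm * 1e4                                  # centimetres -> micrometres

# Illustrative replicate fire assays (ppm Au) on 30 g aliquots:
# print(apparent_particle_diameter_um([1.1, 0.8, 1.4, 0.9, 1.2], 30.0))
```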
Journal article
Published 2018
Biostatistics and Biometrics, 5, 1
The trial assessed the accuracy of anaesthesia clinicians in estimating an anaesthetised patient’s systolic blood pressure (SBP) by feeling the radial pulse. To attribute their accuracy to luck, skill or circumstance, the volunteer medical participants were sequentially randomized to one of four groups: the first given no help (control), the second allowed to feel the pulse, the third given pre- and peri-operative clinical information, and the fourth given both. We set out to collect 60 estimates for each group (240 estimates in total). The accuracy of the estimates was assessed for clinical and statistical significance. Specific objectives were to determine whether palpation statistically improved estimation of SBP and whether it could be clinically useful. Irrespective of their level of training or self-confidence, the doctors in the study performed better than controls, both statistically and within pre-determined clinically relevant ranges, when they were allowed to palpate the radial pulse. The degree of accuracy was further enhanced by providing pre- and peri-operative information, to the extent that the participating clinicians were able to estimate the systolic blood pressure to within 30 mmHg 96.7% of the time.
Book
Robustness Theory and Application
Published 2018
A preeminent expert in the field explores new and exciting methodologies in the ever-growing field of robust statistics. Used to develop data analytical methods that are resistant to outlying observations in the data, while capable of detecting outliers, robust statistics is extremely useful for solving an array of common problems, such as estimating location, scale, and regression parameters. Written by an internationally recognized expert in the field of robust statistics, this book addresses a range of well-established techniques while exploring, in depth, new and exciting methodologies. Local robustness and global robustness are discussed, and problems of non-identifiability and adaptive estimation are considered. Rather than attempt an exhaustive investigation of robustness, the author provides readers with a timely review of many of the most important problems in statistical inference involving robust estimation, along with a brief look at confidence intervals for location. Throughout, the author meticulously links research in maximum likelihood estimation with the more general M-estimation methodology. Specific applications and R and some MATLAB subroutines with accompanying data sets (available both in the text and online) are employed wherever appropriate. Providing invaluable insights and guidance, Robustness Theory and Application:
- Offers a balanced presentation of theory and applications within each topic-specific discussion
- Features solved examples throughout which help clarify complex and/or difficult concepts
- Meticulously links research in maximum likelihood type estimation with the more general M-estimation methodology
- Delves into new methodologies which have been developed over the past decade without stinting on coverage of “tried-and-true” methodologies
- Includes R and some MATLAB subroutines with accompanying data sets, which help illustrate the power of the methods described
Robustness Theory and Application is an important resource for all statisticians interested in the topic of robust statistics. This book encompasses both past and present research, making it a valuable supplemental text for graduate-level courses in robustness.
Journal article
Published 2017
Statistical Papers, 58, 4, 1247 - 1266
The method of maximum likelihood, implemented via the EM algorithm, has been the accepted method for fitting finite mixtures of normal distributions ever since it was shown to be superior to the method of moments. Recent books testify to this. There has, however, been criticism of maximum likelihood for this problem, principally that when the variances of the component distributions are unequal the likelihood is in fact unbounded and there can be multiple local maxima. Another major criticism is that the maximum likelihood estimator is not robust. Several alternative minimum distance estimators have since been proposed as a way of dealing with the first problem. This paper deals with one of these estimators, which is not only superior due to its robustness but can in fact have an advantage in numerical studies even at the model distribution. Importantly, robust alternatives to the EM algorithm, ostensibly fitting t distributions when in fact the data are mixtures of normals, are also not competitive at the normal mixture model when compared to the chosen minimum distance estimator. It is argued, for instance, that natural processes should lead to mixtures whose component distributions are normal as a result of the Central Limit Theorem. On the other hand, data can be contaminated by extraneous sources, as is typically assumed in robustness studies. This calls for a robust estimator.
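The abstract does not name the particular minimum distance estimator, but as an illustrative sketch of the general approach the snippet below fits a two-component normal mixture by minimising the integrated squared error (L2) distance criterion, which remains bounded even when the component variances are unequal; this is a generic L2-type fit under an assumed parameterisation and starting values, not necessarily the estimator studied in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_mixture_l2(x):
    """Fit a two-component normal mixture by minimising the L2 criterion
    int f_theta^2 dx - (2/n) * sum_i f_theta(x_i), which, unlike the
    likelihood, stays bounded when the component variances differ."""
    x = np.asarray(x, dtype=float)

    def crit(theta):
        w = 1.0 / (1.0 + np.exp(-theta[0]))          # mixing weight in (0, 1)
        m1, m2 = theta[1], theta[2]
        s1, s2 = np.exp(theta[3]), np.exp(theta[4])  # positive scales
        # closed form for the integral of the squared mixture density
        int_f2 = (w**2 * norm.pdf(0, 0, s1 * np.sqrt(2))
                  + (1 - w)**2 * norm.pdf(0, 0, s2 * np.sqrt(2))
                  + 2 * w * (1 - w) * norm.pdf(m1 - m2, 0, np.hypot(s1, s2)))
        f_x = w * norm.pdf(x, m1, s1) + (1 - w) * norm.pdf(x, m2, s2)
        return int_f2 - 2.0 * f_x.mean()

    start = np.array([0.0, np.percentile(x, 25), np.percentile(x, 75),
                      np.log(x.std()), np.log(x.std())])
    res = minimize(crit, start, method="Nelder-Mead")
    w = 1.0 / (1.0 + np.exp(-res.x[0]))
    return {"w": w, "mu": (res.x[1], res.x[2]), "sigma": tuple(np.exp(res.x[3:5]))}
```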
Journal article
Published 2017
Australian & New Zealand Journal of Statistics, 59, 4, 413 - 431
Long-term historical daily temperatures are used in electricity forecasting to simulate the probability distribution of future demand, but they can be affected by changes in recording site and climate. This paper presents a method of adjusting for the effect of these changes on daily maximum and minimum temperatures. The adjustment technique accommodates the autocorrelated and bivariate nature of the temperature data, which has not previously been taken into account. The data are from Perth, Western Australia, the main electricity demand centre for the South-West of Western Australia. The statistical modelling involves a multivariate extension of the univariate time series ‘interleaving method’, which allows fully efficient simultaneous estimation of the parameters of replicated Vector Autoregressive Moving Average processes. Temperatures at the most recent weather recording location in Perth are shown to be significantly lower than at previous sites. There is also evidence of long-term warming due to climate change, especially for minimum temperatures.
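As a rough sketch of the kind of adjustment described, a bivariate model of daily maximum and minimum temperatures with a step change for a recording-site move, the snippet below fits a vector autoregression with a site-change dummy as an exogenous regressor using statsmodels; the interleaving estimation over replicated series used in the paper is not reproduced, and the file and column names are hypothetical.

```python
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

# Hypothetical daily data with columns 'tmax', 'tmin' and 'new_site'
# (0 before the weather station moved, 1 afterwards).
temps = pd.read_csv("perth_daily_temps.csv", parse_dates=["date"], index_col="date")

endog = temps[["tmax", "tmin"]]
exog = temps[["new_site"]]

# Bivariate VAR(1) with the site-change dummy entering the mean of both series;
# a VARMA(p, q) order could instead be chosen by information criteria.
model = VARMAX(endog, exog=exog, order=(1, 0), trend="c")
result = model.fit(disp=False)

# The coefficients on 'new_site' estimate the shift in mean maximum and minimum
# temperature attributable to the change of recording location.
print(result.summary())
```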