Machine learning has opened the door for the automated sorting (classification) of images, holograms and acoustic backscatters of individual plankton, invertebrates, fish and marine mammals. However, this field is complicated by decades of paradoxically promising reports of classifier performance that do not correlate with real-world uptake of this technology in aquatic sciences. Simple metrics of classifier performance are essential for optimizing, evaluating and comparing machine learning classifiers, but a wide variety of metrics and calculation variants have been proposed. Several characteristics of species count data influence metric behavior: severe imbalance and variance, zero-inflation, high class numbers and contamination with non-target classes. This study explores the hidden complexity of classifier performance metrics for species count data using synthetic datasets and simulated classifier outputs. It demonstrates how these data characteristics can severely distort metric values, with seven of eight variants of the most common metric, Accuracy, returning near-perfect scores (up to 98%) even when no instances are correctly classified. Clear recommendations are made for classifier evaluation pitfalls and metric variants to avoid, ultimately finding one variant of the F1-Score (mF1) to be the most suitable single metric, with several important calculation caveats specific to species count data. Due to ambiguous terminology and inconsistent definitions, it is often impossible to identify which variant of a performance metric has been applied in classifier studies. It is vital that authors are intentional and transparent about their metric use to support the vast potential for machine learning to revolutionize the research and monitoring of aquatic environments.
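The distortion described above can be reproduced with a toy calculation. The sketch below is an illustration only, not the study's synthetic datasets or its eight Accuracy variants: it assumes 100 balanced classes, a hypothetical classifier that mislabels every instance, a one-vs-rest, class-averaged definition of Accuracy, and a reading of mF1 as the macro-averaged F1-Score. All function names are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, classes):
    """Return {class: (tp, fp, fn, tn)}, treating each class as a binary problem."""
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = len(pairs) - tp - fp - fn
        counts[c] = (tp, fp, fn, tn)
    return counts


def overall_accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_averaged_accuracy(counts):
    """Mean over classes of (TP + TN) / N; true negatives dominate when classes are many."""
    per_class = [(tp + tn) / (tp + fp + fn + tn) for tp, fp, fn, tn in counts.values()]
    return sum(per_class) / len(per_class)


def macro_f1(counts):
    """Mean over classes of F1 = 2TP / (2TP + FP + FN); true negatives are ignored."""
    per_class = []
    for tp, fp, fn, _tn in counts.values():
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted instances scores 0.
        # Other conventions exist and change the result for zero-inflated data.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy data: 100 classes, 10 instances each, every instance assigned to the wrong class.
classes = list(range(100))
y_true = [c for c in classes for _ in range(10)]
y_pred = [(t + 1) % len(classes) for t in y_true]

counts = one_vs_rest_counts(y_true, y_pred, classes)
print(f"overall accuracy:        {overall_accuracy(y_true, y_pred):.2f}")  # 0.00
print(f"class-averaged accuracy: {class_averaged_accuracy(counts):.2f}")   # 0.98
print(f"macro F1:                {macro_f1(counts):.2f}")                  # 0.00
```

With K classes and every instance misclassified, each error adds one false positive and one false negative across the one-vs-rest problems, so the class-averaged Accuracy equals 1 - 2/K, which approaches 1 as the number of classes grows. The F1-Score excludes true negatives and therefore stays at zero.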
Details
Title
What is "accuracy"? Rethinking machine learning classifier performance metrics for highly imbalanced, high variance, zero-inflated species count data
Authors/Creators
Bianca Owen
James Tweedley - Murdoch University, Centre for Sustainable Aquatic Ecosystems
Navid Moheimani - Murdoch University, Centre for Water, Energy and Waste
Christopher Hallett
Jeff Cosgrove
Leopold Silberstein
Publication Details
Limnology and oceanography, fluids and environments, Early Access
Publisher
Wiley Periodicals LLC on behalf of Association for the Sciences of Limnology and Oceanography.