What is "accuracy"? Rethinking machine learning classifier performance metrics for highly imbalanced, high variance, zero-inflated species count data
Journal article   Open access

What is "accuracy"? Rethinking machine learning classifier performance metrics for highly imbalanced, high variance, zero-inflated species count data

Bianca Owen, James Tweedley, Navid Moheimani, Christopher Hallett, Jeff Cosgrove and Leopold Silberstein
Limnology and Oceanography: Fluids and Environments, Early Access
2025
CC BY 4.0 Open Access

Abstract

Machine learning has opened the door for the automated sorting (classification) of images, holograms and acoustic backscatters of individual plankton, invertebrates, fish and marine mammals. However, this field is complicated by decades of paradoxically promising reports of classifier performance that do not correlate with real-world uptake of this technology in aquatic sciences. Simple metrics of classifier performance are essential for optimizing, evaluating and comparing machine learning classifiers, but a wide variety of metrics and calculation variants have been proposed. Several characteristics of species count data influence metric behavior: severe imbalance and variance, zero-inflation, high class numbers and contamination with non-target classes. This study explores the hidden complexity of classifier performance metrics for species count data using synthetic datasets and simulated classifier outputs. It demonstrates how these data characteristics can severely distort metric values, with seven of eight variants of the most common metric, Accuracy, returning near-perfect scores (up to 98%) even when no instances are correctly classified. Clear recommendations are made for classifier evaluation pitfalls and metric variants to avoid, ultimately finding one variant of the F1-Score (mF1) to be the most suitable single metric, with several important calculation caveats specific to species count data. Due to ambiguous terminology and inconsistent definitions, it is often impossible to identify which variant of a performance metric has been applied in classifier studies. It is vital that authors are intentional and transparent about their metric use to support the vast potential for machine learning to revolutionize the research and monitoring of aquatic environments.
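The inflation described above can be illustrated with a minimal sketch (hypothetical counts, not taken from the study): a zero-inflated dataset dominated by a majority class, scored by a degenerate classifier that never predicts the rare target class. Overall accuracy looks near-perfect, while macro-averaged F1 (averaging per-class F1 scores, in the spirit of the mF1 variant the abstract recommends) exposes the failure.

```python
# Hypothetical example: 98 "empty" samples and 2 rare "target" samples,
# mimicking severe imbalance / zero-inflation in species count data.
y_true = ["empty"] * 98 + ["target"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["empty"] * 100

# Overall accuracy: fraction of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_per_class(cls):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), defined as 0 when empty."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Macro F1: unweighted mean of per-class F1, so the missed rare class
# drags the score down regardless of how few instances it has.
classes = ["empty", "target"]
macro_f1 = sum(f1_per_class(c) for c in classes) / len(classes)

print(f"accuracy = {accuracy:.2f}")  # 0.98 despite zero correct "target" hits
print(f"macro F1 = {macro_f1:.2f}")  # 0.49, exposing the failure
```

This is only a two-class caricature; the paper's point is that with many classes, contamination, and zero-inflation, several accuracy variants behave this way while only carefully specified metric variants remain informative.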

Details

UN Sustainable Development Goals (SDGs)

This output has contributed to the advancement of the following goals:

#13 Climate Action
#14 Life Below Water

Source: InCites


InCites Highlights

Selected metrics from the InCites Benchmarking & Analytics tool related to this output:

Citation topics
3 Agriculture, Environment & Ecology
3.2 Marine Biology
3.2.1032 Marine Zooplankton
Web of Science research areas
Limnology
Oceanography
ESI research areas
Environment/Ecology