What is "accuracy"? Rethinking machine learning classifier performance metrics for highly imbalanced, high variance, zero-inflated species count data
Journal article   Open access

What is "accuracy"? Rethinking machine learning classifier performance metrics for highly imbalanced, high variance, zero-inflated species count data

Bianca Owen, James Tweedley, Navid Moheimani, Christopher Hallett, Jeff Cosgrove and Leopold Silberstein
Limnology and Oceanography: Fluids and Environments, Early Access
2025
CC BY 4.0 Open Access

Abstract

Machine learning has opened the door for the automated sorting (classification) of images, holograms and acoustic backscatters of individual plankton, invertebrates, fish and marine mammals. However, this field is complicated by decades of paradoxically promising reports of classifier performance that do not correlate with real-world uptake of this technology in aquatic sciences. Simple metrics of classifier performance are essential for optimizing, evaluating and comparing machine learning classifiers, but a wide variety of metrics and calculation variants have been proposed. Several characteristics of species count data influence metric behavior: severe imbalance and variance, zero-inflation, high class numbers and contamination with non-target classes. This study explores the hidden complexity of classifier performance metrics for species count data using synthetic datasets and simulated classifier outputs. It demonstrates how these data characteristics can severely distort metric values, with seven of eight variants of the most common metric, Accuracy, returning near-perfect scores (up to 98%) even when no instances are correctly classified. Clear recommendations are made for classifier evaluation pitfalls and metric variants to avoid, ultimately finding one variant of the F1-Score (mF1) to be the most suitable single metric, with several important calculation caveats specific to species count data. Due to ambiguous terminology and inconsistent definitions, it is often impossible to identify which variant of a performance metric has been applied in classifier studies. It is vital that authors are intentional and transparent about their metric use to support the vast potential for machine learning to revolutionize the research and monitoring of aquatic environments.
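The inflation described above can be illustrated with a minimal sketch (hypothetical counts, not taken from the study): a zero-inflated dataset dominated by a majority class, scored by a degenerate classifier that never predicts the rare target class. Overall accuracy looks near-perfect, while macro-averaged F1 (averaging per-class F1 scores, in the spirit of the mF1 variant the abstract recommends) exposes the failure.

```python
# Hypothetical example: 98 "empty" samples and 2 rare "target" samples,
# mimicking severe imbalance / zero-inflation in species count data.
y_true = ["empty"] * 98 + ["target"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["empty"] * 100

# Overall accuracy: fraction of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_per_class(cls):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), defined as 0 when empty."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Macro F1: unweighted mean of per-class F1, so the missed rare class
# drags the score down regardless of how few instances it has.
classes = ["empty", "target"]
macro_f1 = sum(f1_per_class(c) for c in classes) / len(classes)

print(f"accuracy = {accuracy:.2f}")  # 0.98 despite zero correct "target" hits
print(f"macro F1 = {macro_f1:.2f}")  # 0.49, exposing the failure
```

This is only a two-class caricature; the paper's point is that with many classes, contamination, and zero-inflation, several accuracy variants behave this way while only carefully specified metric variants remain informative.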

Details

UN Sustainable Development Goals (SDGs)

This output has contributed to the advancement of the following goals:

#13 Climate Action
#14 Life Below Water

Source: InCites


InCites Highlights

Selected metrics from the InCites Benchmarking & Analytics tool related to this output:

Citation topics
3 Agriculture, Environment & Ecology
3.2 Marine Biology
3.2.1032 Marine Zooplankton
Web of Science research areas
Limnology
Oceanography
ESI research areas
Environment/Ecology