Pilot study: external validation of commercial veterinary radiology artificial intelligence services shows deficiencies in interpretation of general practice-sourced canine abdominal radiographs

Doris Ma; Josephine E Faulkner; Nerissa Stander; Anthea Raisis; Stephen Joslyn

doi:10.2460/javma.25.10.0691

Journal article

Open access

Peer reviewed

Journal of the American Veterinary Medical Association

2026

PMID: 41861469

Abstract

Keywords: external validation; diagnostic performance; radiology; artificial intelligence; transparency

To evaluate the diagnostic performance of commercial veterinary radiology artificial intelligence (AI) platforms on general practice canine abdominal radiographs with confirmed diagnoses. For this pilot study, canine abdominal radiographs with definitive diagnoses were collected and submitted to 6 AI platforms between September and December 2024. Diagnoses were confirmed by surgery, necropsy, CT, ultrasound, cytology, or response to treatment, as appropriate. Fifty-three cases were selected and submitted to the AI platforms; after platform rejections, 307 evaluations were available for analysis. When differentiating cases with pathology (51 of 53) from cases without pathology (2 of 53), platform performance was variable and mostly low to moderate: mean accuracy ranged from 70% to 90%, balanced accuracy from 60% to 65%, and Matthews correlation coefficient (MCC) from -0.08 to 0.43. Across all platforms, classification of radiographic findings (labels) showed low sensitivity (28% to 78%), F1 score (28% to 51%), and positive predictive value (25% to 54%) because of frequent missed diagnoses. MCC was higher (0.16 to 0.45), as it was less affected by label misclassification. Small intestinal obstruction, a critical finding, was often missed (sensitivity, 23% to 69%). Diagnostic performance varied among the 6 AI platforms tested and was low to moderate overall in this small sample. Even the best-performing algorithm had notable limitations, and none appeared suitable for clinical use in its current form. Larger-scale independent external validation and improved performance are needed before AI platforms can be safely integrated into clinical practice.
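Because 51 of the 53 cases had pathology, overall accuracy can look high even when a platform almost never recognizes a normal study, which is why balanced accuracy and MCC are informative alongside it. The sketch below is illustrative only: it assumes the standard 2x2 confusion-matrix definitions of these metrics (the abstract does not state the study's exact computation or how per-label results were averaged), and the confusion counts for the two platforms (`always_abnormal`, `mixed`) are hypothetical, not study data. The same sensitivity, PPV, and F1 formulas would apply per finding (label) if each label were scored as its own present/absent table.

```python
from dataclasses import dataclass
import math


@dataclass
class Confusion:
    """2x2 counts, where 'positive' means the case has pathology."""
    tp: int  # pathology present, platform flagged it
    fp: int  # no pathology, platform flagged it anyway
    tn: int  # no pathology, platform reported normal
    fn: int  # pathology present, platform reported normal (missed)


def safe_div(num: float, den: float) -> float:
    # Return 0.0 rather than raising when a class or prediction is absent.
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict[str, float]:
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # recall / true-positive rate
    specificity = safe_div(c.tn, c.tn + c.fp)  # true-negative rate
    ppv = safe_div(c.tp, c.tp + c.fp)          # positive predictive value (precision)
    mcc_den = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "ppv": ppv,
        # 2TP / (2TP + FP + FN) equals the harmonic mean of PPV and sensitivity.
        "f1": safe_div(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        # MCC is set to 0 when any marginal total is zero (a common convention).
        "mcc": safe_div(c.tp * c.tn - c.fp * c.fn, mcc_den),
    }


# Hypothetical platforms on a 51-abnormal / 2-normal case mix (not study data).
always_abnormal = Confusion(tp=51, fp=2, tn=0, fn=0)  # calls every case abnormal
mixed = Confusion(tp=45, fp=1, tn=1, fn=6)

for name, c in [("always_abnormal", always_abnormal), ("mixed", mixed)]:
    print(name, {k: round(v, 3) for k, v in metrics(c).items()})
```

With every case called abnormal, accuracy is about 96% while balanced accuracy is 0.5 and MCC is 0, showing how a skewed case mix can inflate accuracy without any ability to identify normal studies.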

Details

Title: Pilot study: external validation of commercial veterinary radiology artificial intelligence services shows deficiencies in interpretation of general practice-sourced canine abdominal radiographs
Authors/Creators: Doris Ma - Murdoch University
Josephine E Faulkner - Murdoch University
Nerissa Stander - Murdoch University
Anthea Raisis - Murdoch University
Stephen Joslyn - Murdoch University
Publication Details: Journal of the American Veterinary Medical Association
Publisher: AVMA Publications
Identifiers: 991005875041807891
Murdoch Affiliation: School of Veterinary Medicine
Language: English
Resource Type: Journal article
