Statistical Machine-Learning Methods for Genomic Prediction Using the SKM Library

Osval A. Montesinos López; Brandon Alejandro Mosqueda González; Abelardo Montesinos López; José Crossa

doi:10.3390/genes14051003

Journal article

Open access

Peer reviewed

Genes, Vol.14(5), 1003

2023

DOI: https://doi.org/10.3390/genes14051003

PMID: 37239363

CC BY V4.0, Open Access

Keywords: genomic selection; R package; SKM; statistical machine learning

Abstract

Genomic selection (GS) is revolutionizing plant breeding. However, because it is a predictive methodology, a basic understanding of statistical machine-learning methods is necessary for its successful implementation. This methodology uses a reference population that contains both the phenotypic and genotypic information of genotypes to train a statistical machine-learning method. After optimization, this method is used to make predictions for candidate lines for which only genotypic information is available. However, due to a lack of time and appropriate training, it is difficult for breeders and scientists in related fields to learn all the fundamentals of prediction algorithms. With smart or highly automated software, these professionals can appropriately apply any state-of-the-art statistical machine-learning method to their collected data without an exhaustive understanding of statistical machine-learning methods and programming. For this reason, we introduce state-of-the-art statistical machine-learning methods using the Sparse Kernel Methods (SKM) R library, with complete guidelines on how to implement the seven statistical machine-learning methods available in this library for genomic prediction (random forest, Bayesian models, support vector machine, gradient boosted machine, generalized linear models, partial least squares, and feed-forward artificial neural networks). This guide includes details of the functions required to implement each of the methods, as well as others for easily implementing different tuning strategies, cross-validation strategies, metrics to evaluate the prediction performance, and summary functions that compute these metrics. A toy dataset illustrates how to implement the statistical machine-learning methods and facilitates their use by professionals who do not possess a strong background in machine learning and programming.
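The workflow summarized in the abstract (train on a reference population, tune, cross-validate, then predict candidate lines) can be sketched generically as follows. This is a minimal illustration in plain Python and does not use the SKM library or its API. Ridge regression stands in for the seven methods listed, the marker and phenotype data are simulated, and every function name (`ridge_fit`, `cv_score`, `kfold_indices`, and so on) is a hypothetical helper introduced only for this sketch.

```python
import random
from statistics import mean

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Fit ridge regression on centered data; return (intercept, effects)."""
    p = len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(r[j] * r[k] for r in Xc) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(p)]
    beta = solve(A, b)
    intercept = y_mean - sum(m * bj for m, bj in zip(col_means, beta))
    return intercept, beta

def ridge_predict(model, X):
    intercept, beta = model
    return [intercept + sum(x * b for x, b in zip(row, beta)) for row in X]

def pearson(a, b):
    """Pearson correlation, used here as the prediction-accuracy metric."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def kfold_indices(n, k, rng):
    """Randomly split indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_score(X, y, lam, folds):
    """Mean test-fold Pearson correlation for one penalty value."""
    scores = []
    for test in folds:
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        pred = ridge_predict(model, [X[i] for i in test])
        scores.append(pearson(pred, [y[i] for i in test]))
    return mean(scores)

# Simulated toy data (an assumption, not the paper's dataset):
# markers coded 0/1/2, phenotype = marker effects + noise.
rng = random.Random(1)
n_ref, n_cand, p = 80, 10, 15
effects = [rng.gauss(0, 1) for _ in range(p)]

def simulate(n):
    X = [[float(rng.choice((0, 1, 2))) for _ in range(p)] for _ in range(n)]
    y = [sum(x * e for x, e in zip(row, effects)) + rng.gauss(0, 1) for row in X]
    return X, y

X_ref, y_ref = simulate(n_ref)   # reference population: genotypes and phenotypes
X_cand, _ = simulate(n_cand)     # candidate lines: genotypes only are used

# Tune the penalty by 5-fold cross-validation on the reference population.
folds = kfold_indices(n_ref, 5, rng)
grid = [0.1, 1.0, 10.0, 100.0]
cv_results = {lam: cv_score(X_ref, y_ref, lam, folds) for lam in grid}
best_lam = max(cv_results, key=cv_results.get)

# Refit on the full reference population and predict the candidates.
final_model = ridge_fit(X_ref, y_ref, best_lam)
candidate_predictions = ridge_predict(final_model, X_cand)
print(best_lam, round(cv_results[best_lam], 3))
```

Because the same folds are used both to choose the penalty and to report accuracy, the reported correlation for the selected penalty is somewhat optimistic; a nested cross-validation would separate tuning from evaluation.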

Details

Title: Statistical Machine-Learning Methods for Genomic Prediction Using the SKM Library
Authors/Creators: Osval A. Montesinos López
Brandon Alejandro Mosqueda González - Instituto Politécnico Nacional
Abelardo Montesinos López
José Crossa
Publication Details: Genes, Vol.14(5), 1003
Publisher: MDPI
Grant note: CIMMYT CRP W0293; MTO 069018 / International Wheat Yield Partnership (IWYP) Hub Project 9 MTO 069033 / USAID projects Foundation for Research Levy on Agricultural Products (FFL) DFs-19-0000000013 / Heat and Drought Wheat Improvement Consortium (HeDWIC); Foundation for Food and Agriculture Research INV-003439 / Bill & Melinda Gates Foundation 301835; 320090 / Agricultural Agreement Research Fund (JA); Research Council of Norway
Identifiers: 991005581069607891
Murdoch Affiliation: Centre for Crop and Food Innovation
Language: English
Resource Type: Journal article

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Collaboration types: Domestic collaboration; International collaboration
Citation topics: 3 Agriculture, Environment & Ecology; 3.51 Dairy & Animal Sciences; 3.51.115 Livestock Reproduction
Web Of Science research areas: Genetics & Heredity
ESI research areas: Molecular Biology & Genetics