Abstract
Genomic selection (GS) has revolutionized breeding programmes by enabling the prediction of phenotypes from genetic data. However, GS often explains only a portion of the phenotypic variation. This review explores the potential of integrating data types beyond genomics to improve phenotypic prediction ability. We group data integration strategies into five categories: eliminate, facilitate, aggregate, incorporate, and modulate. Eliminating methods remove the effect of non-genomic data, such as environmental data, on the phenotype. Facilitating methods leverage non-genomic data to improve the accuracy of GS models. Aggregating methods combine different data types for analysis, potentially revealing variation components not captured by individual data sources. Incorporating methods explicitly model interactions between data types. Modulating methods transform data into formats suitable for advanced models, such as deep learning with convolutional neural networks (CNNs). The review discusses the advantages and limitations of each strategy, providing a comprehensive overview of the current state of the field. We conclude by emphasizing the prospects of multi-data phenotypic prediction for developing a holistic prediction approach that supports a more comprehensive understanding of complex biological systems and significantly enhances prediction accuracy.