Abstract
Artificial Intelligence (AI) has emerged as a transformative tool in precision agriculture, facilitating data-driven decision-making and crop improvement. In agricultural crops, data from multiple modalities, such as phenotypic traits, genomic markers, and environmental conditions, offer complementary insights into crop development and yield potential. However, single-modality approaches may fail to capture the complex interplay between genotype, environment, and other factors affecting crop traits. To address this challenge, this study investigates the integration of multimodal data to improve genotype-to-phenotype prediction. Focusing on barley (Hordeum vulgare L.), a globally and nationally important cereal crop, we propose a new barley Multimodal Deep Learning (barley-MMDL) model to predict flowering time and grain yield from heterogeneous multimodal datasets. The model combines Convolutional Neural Networks (CNNs) to process high-dimensional genomic markers with Long Short-Term Memory (LSTM) networks to capture temporal patterns in environmental data. These modality-specific latent features are then fused, enabling joint optimization of feature extraction and prediction in an end-to-end manner. The proposed barley-MMDL model achieved the lowest RMSE values, 8.84 for flowering time and 778.50 for grain yield, outperforming baseline unimodal and multimodal models. These results demonstrate the improved predictive capability of barley-MMDL and underscore the potential of multimodal data integration to advance prediction in precision agriculture and contribute to sustainable agricultural practices.
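The fusion architecture described above, a CNN branch for genomic markers and an LSTM branch for environmental time series whose latent features are concatenated before a shared regression head, can be sketched as follows. This is a minimal illustrative sketch in PyTorch: the layer sizes, marker count, season length, and environmental feature count are assumptions for demonstration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class BarleyMMDL(nn.Module):
    """Illustrative sketch of a CNN + LSTM multimodal fusion model.

    A 1D CNN encodes genomic markers, an LSTM encodes the environmental
    time series, and the two latent vectors are fused (concatenated) and
    passed to a regression head, trained end-to-end. All hyperparameters
    here are hypothetical placeholders.
    """
    def __init__(self, n_markers=1000, n_env_features=8, hidden=64):
        super().__init__()
        # CNN branch: treat the marker vector as a single-channel 1D signal
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(),
            nn.Linear(16 * 8, hidden),
        )
        # LSTM branch: per-day environmental covariates over the season
        self.lstm = nn.LSTM(n_env_features, hidden, batch_first=True)
        # Fusion head: joint regression on the concatenated latents
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, markers, env):
        g = self.cnn(markers.unsqueeze(1))           # (B, hidden)
        _, (h, _) = self.lstm(env)                   # h: (1, B, hidden)
        fused = torch.cat([g, h.squeeze(0)], dim=1)  # (B, 2*hidden)
        return self.head(fused).squeeze(1)           # (B,) predicted trait

model = BarleyMMDL()
markers = torch.randn(4, 1000)  # 4 genotypes, 1000 genomic markers
env = torch.randn(4, 120, 8)    # 120 days x 8 environmental variables
preds = model(markers, env)     # one scalar prediction per genotype
```

Because the two encoders and the head form a single computation graph, a standard regression loss (e.g. MSE against observed flowering time or grain yield) backpropagates through both branches at once, which is what "joint optimization of feature extraction and prediction" refers to.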