Abstract
This paper introduces a two-phase learning approach for few-shot hyperspectral image (HSI) classification. In the first phase, we present a novel spatiospectral masked autoencoder (ssMAE), a self-supervised learner. As the ssMAE backbone, we design a transformer encoder-decoder network in which the linear layer normally used for the initial feature embedding is replaced with a 3D convolutional layer, to better extract local spectral-spatial features from the visible 3D sub-patches. By pre-training on large amounts of unlabelled data, the ssMAE learns general HSI features. In the second phase, the ssMAE encoder is fine-tuned on the few labelled training samples to extract discriminative features for classification. This is achieved through a hybrid episode learning method that integrates the ssMAE encoder into a prototypical network. We combine global and local prototypes (the CGL prototype) to refine label predictions. This technique makes fuller use of the available data, focuses on specific samples, and mitigates the effect of poor-quality episodes. Evaluated on three HSI datasets, our approach outperforms alternative few-shot methods. The code will be made publicly available at https://github.com/Weejaa04/SSMAE.
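To make the prototype idea concrete, the following is a minimal sketch of prototype-based classification with a blended global and local prototype. It is not the authors' implementation. The abstract does not state how the two prototypes are combined, so the convex blend with weight `alpha` is an assumption, as are all function names. Toy 2-D vectors stand in for ssMAE encoder outputs, and the global prototype is assumed to be the class mean over all labelled samples while the local prototype is the class mean over one episode's support set.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def class_prototypes(features, labels):
    """Map each class label to the mean of its feature vectors."""
    grouped = {}
    for f, y in zip(features, labels):
        grouped.setdefault(y, []).append(f)
    return {y: mean_vector(fs) for y, fs in grouped.items()}

def combine_prototypes(global_protos, local_protos, alpha=0.5):
    """Assumed blend: alpha * global + (1 - alpha) * local.
    Classes absent from the episode fall back to the global prototype."""
    combined = {}
    for y, g in global_protos.items():
        loc = local_protos.get(y)
        if loc is None:
            combined[y] = list(g)
        else:
            combined[y] = [alpha * gi + (1 - alpha) * li for gi, li in zip(g, loc)]
    return combined

def predict(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    classes = sorted(protos)
    logits = [-sum((q - p) ** 2 for q, p in zip(query, protos[y])) for y in classes]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {y: e / total for y, e in zip(classes, exps)}
    return max(probs, key=probs.get), probs

# Toy embeddings (hypothetical stand-ins for encoder features).
all_features = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
all_labels = [0, 0, 1, 1]

# One episode's support set, drawn from the labelled samples.
support_features = [[0.2, 0.0], [1.0, 0.8]]
support_labels = [0, 1]

global_protos = class_prototypes(all_features, all_labels)
local_protos = class_prototypes(support_features, support_labels)
combined = combine_prototypes(global_protos, local_protos, alpha=0.5)

label, probs = predict([0.3, 0.1], combined)
print(label, probs)
```

In this sketch, `alpha = 1` reduces to purely global prototypes and `alpha = 0` to the standard per-episode prototypes of a prototypical network. Blending the two is one plausible way a poorly sampled episode could be kept from dominating the prediction.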