Improving Deep Hyperspectral Image Classification Performance with Spectral Unmixing

2020·Arxiv

Abstract

Recent advances in neural networks have made great progress in addressing the hyperspectral image (HSI) classification problem. However, overfitting, which is mainly caused by complicated model structures and small training sets, remains a major concern when applying neural networks to HSI analysis. Reducing the complexity of the neural networks could prevent overfitting to some extent, but it also reduces the networks' ability to extract more abstract features. Enlarging the training set is also difficult. To tackle the overfitting problem, we propose an abundance-based multi-HSI classification method. By applying an autoencoder-based spectral unmixing technique, different HSIs are first converted from the spectral domain to the abundance domain. After that, the abundance representations from multiple HSIs are collected to form an enlarged dataset. Lastly, a simple classifier is trained, which makes predictions over all the involved datasets. Thanks to the spectral unmixing, transforming the spectral features to the abundance features significantly simplifies the classification tasks. This enables the use of a simple network as the classifier, thus alleviating the overfitting effect. Moreover, as much dataset-specific information is eliminated by the spectral unmixing, a compatible classifier suitable for different HSIs can be trained, and a training set several times larger is constructed by assembling the abundances from different HSIs. The effectiveness of the proposed method is verified by the ablation study and the comparative experiments. On four datasets, the proposed method provides results comparable with those of two state-of-the-art methods, while using a much simpler model.

1 Introduction

A hyperspectral image (HSI) is a data cube consisting of reflectance or radiance spectra, acquired by remote sensors flying over real-world objects or scenes. The height and width of an HSI are decided by the monitored scene at a specific resolution, while the depth records the measurements across a certain wavelength range. Thus, each pixel corresponds to a spectral vector. In the past decades, HSI analysis has witnessed rapid development with plenty of applications [1, 2]. To effectively explore the rich spectral and spatial information contained in HSIs, different categories of processing techniques have been proposed, e.g., spectral unmixing [1], classification [3], image restoration [4], and target detection [5]. In this paper, we particularly address the deep learning-based HSI classification problem, where the spectral unmixing technique is integrated into the classification task, such that the classification model is simplified and the training set is enlarged. This helps to improve the classification performance.

HSI classification refers to assigning each pixel to a certain class according to its spectral and spatial characteristics. Earlier approaches were developed based on conventional machine learning algorithms, including the principal component analysis (PCA) [6], the independent component analysis (ICA) [7], the linear discriminant analysis (LDA) [8], the support vector machine (SVM) [9], and the sparse representation [10], to name a few. Despite the great progress recently brought by neural networks (NN) to HSI classification, the non-NN methods continue to play a role independently or as a part of the NN-based algorithms [11–14], on account of the drawbacks of the NN, e.g., the overfitting problem on small training sets [15].

Spectral unmixing (SU) is another active area of research in HSI analysis. It is assumed that each spectrum is a mixture of several “pure” material signatures, termed endmembers. The aim of SU is to extract the endmembers and to estimate their respective proportions, namely the abundance fractions, at each pixel [16]. Recently, several works have introduced the SU as a complementary source of information in HSI classification. In [17,18], the SU was applied to reduce the spectral dimension, in order to avoid the Hughes phenomenon when applying the SVM classifier. In [19], the authors investigated several SU methods and assigned the label to a pixel according to its maximum abundance. In [20, 21], the extracted abundances were used as supplementary information to improve the classification accuracy on the hard samples, namely the highly-mixed spectra. The authors in [22] considered a region-based nonnegative matrix factorization for band group based abundance estimation. The abundance matrices at different ranges of wavelengths were used as input to train a convolutional neural network (CNN) based classifier. Moreover, in the scope of HSI classification by semisupervised learning, the SU was applied in selecting the most informative samples [23–26].

The NN has gained great popularity and achieved remarkable results in many machine learning fields, e.g., computer vision, especially after the introduction of the CNN and deep learning [27, 28]. Since then, numerous investigations have been made to apply deep learning-based algorithms to HSI classification tasks. Many of the early works used the stacked autoencoder (SAE) to extract denoised or sparse features from the spectral or spatial-spectral data, and the obtained features were usually classified with a traditional classifier, such as the SVM or logistic regression [29–33]. More recently, several end-to-end classification methods based on the CNN [34–36] and the recurrent neural network (RNN) [37, 38] have been proposed and have significantly improved the classification results. See [15] for an overview of the deep learning methods for HSI classification.

Training a deep learning-based model typically requires a large amount of labeled data; otherwise, the learned model is prone to overfitting. However, the availability of training samples is limited in HSIs due to the high expense of acquisition and manual labeling [35]. To resolve this conflict, different strategies have proved effective in existing works, e.g., data augmentation and transfer learning. In [35], a virtual sample enhanced method was proposed to improve the performance of the CNN-based model. In [39], the authors designed a pixel-pair-based model, where the training set is composed of pixel-pairs instead of pixels. This ensures a sufficient number of labeled samples for training a deep CNN. Alternatively, transfer learning has been employed to alleviate the overfitting issue: by transferring the knowledge acquired from the source domain to the target domain, the demand for training samples is reduced. Knowledge was transferred from ordinary RGB images to HSI classification tasks [36], and from multiple HSIs to the classification tasks on small-scale HSIs [40]. Moreover, the authors in [41] applied the knowledge learned from unsupervised tasks to classification tasks on the same HSI, by transferring a pre-trained stacked denoising autoencoder and fine-tuning it on the labeled samples.

In this paper, we propose an abundance-based multi-HSI classification (ABMHC) method, which alleviates the overfitting issue by taking advantage of the abundance information from multiple HSIs. To be precise, the proposed method benefits from the SU in two respects.

• Simple network structure: The SU maps the HSI from the high-dimensional spectral domain to the straightforward abundance domain. Benefiting from the effectiveness of the applied SU method, the estimated abundance features are expected to have more discriminative ability compared with the raw data. By performing classification on the abundance-based features, the original classification tasks will be significantly simplified, which enables the use of simple networks. It is noteworthy that simple networks usually have fewer overfitting issues [42].

• Enlarged training set: Transforming the HSIs into the abundance domain eliminates the data-specific information in different HSIs, e.g., the type of sensor and the spatial-spectral resolutions. By considering a unified and relatively large number of endmembers in the SU of different HSIs, the estimated abundance features of different HSIs have the same dimension. This allows the construction of an enlarged training set that gathers the labeled data from all the HSIs in this study for the subsequent classifier.

The proposed ABMHC is composed of two main procedures, namely 1) SU with a deep autoencoder network; 2) CNN-based classification with the extracted abundances. Briefly, by the deep autoencoder-based SU algorithm, the spectra from every HSI are first encoded into abundance vectors of identical dimension. After that, the abundance vectors from different HSIs are processed to construct an enlarged dataset. Lastly, a CNN-based classifier is trained on the abundance patches from the enlarged dataset. The flowchart of the proposed ABMHC is given in Fig. 1.

The main contributions of the proposed ABMHC method are summarized as follows.

• We verify that performing classification over abundance representations facilitates the use of simple networks, without deteriorating the performance.

• We verify that the classification performance is improved using a unified dataset constructed from unrelated HSIs, compared with using each single HSI.

• We propose a method termed ABMHC that fulfills the aforementioned motivations of simplifying the network structure and enlarging the training set. The proposed ABMHC is comparable to the state-of-the-art methods on several datasets.

Figure 1: Flowchart of the proposed abundance-based multi-HSI classification method.

The remainder of this paper is organized as follows. Section 2 briefly presents the SU model used in this paper. Section 3 presents the spectral unmixing stage with the autoencoder network, while Section 4 presents the classification stage with CNN. Section 5 reports the evaluation of the proposed method by ablation study and comparative experiments. Conclusions are drawn in Section 6.

2 Notations in Spectral unmixing

The SU consists of decomposing each observed spectrum as a mixture of endmembers with their proportions being abundances. According to different underlying mixing mechanisms, the SU models and associated algorithms are roughly divided into the linear and the nonlinear ones. Extensive SU models and algorithms have been proposed, as reviewed in [1, 43]. Of particular note are the recent applications of deep autoencoders for SU, as investigated in [44–49]. In this section, we succinctly present the SU model to be considered in this paper, which is proposed in [45,46].

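Given an HSI, let $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N] \in \mathbb{R}^{B \times N}$ be a matrix composed of $N$ observed spectra over $B$ bands, where $\mathbf{x}_i \in \mathbb{R}^{B}$ is the $i$-th spectrum vector, for $i = 1, 2, \ldots, N$. Assume that the HSI is known to be mixed by $R$ endmembers. Let $\mathbf{M} = [\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_R] \in \mathbb{R}^{B \times R}$ represent the endmember matrix, with $\mathbf{m}_i$ being the spectrum of the $i$-th endmember. The abundance vector associated with the $i$-th pixel is denoted as $\mathbf{a}_i = [a_{i,1}, a_{i,2}, \ldots, a_{i,R}]^\top$, its entry $a_{i,j}$ being the fractional abundance w.r.t. the $j$-th endmember. The linear mixing model (LMM) assumes each observed pixel to be represented as a linear combination of the endmembers, with

$$\mathbf{x}_i = \mathbf{M}\mathbf{a}_i + \mathbf{n}_i, \tag{1}$$

where $\mathbf{n}_i$ is the additive noise vector. Similar to [46], this paper considers a generalized SU model that combines the LMM and an additive nonlinear model, given by

$$\mathbf{x}_i = \lambda\,\mathbf{M}\mathbf{a}_i + (1 - \lambda)\,\Phi(\mathbf{M}, \mathbf{a}_i) + \mathbf{n}_i, \tag{2}$$

where $\Phi$ is a nonlinear function that characterizes the interactions between the endmembers, parameterized by the abundance vector, and $\lambda$ is a hyperparameter balancing the weights of the linear and nonlinear parts. To satisfy a physical interpretation, both the abundance nonnegativity constraint (ANC) and the abundance sum-to-one constraint (ASC) are enforced on the model, which are

$$a_{i,j} \geq 0 \;\; \text{for all } j, \qquad \sum_{j=1}^{R} a_{i,j} = 1. \tag{3}$$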
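The mixing models of this section can be illustrated with a minimal, noise-free sketch. It is not the authors' implementation: endmembers are stored row-wise (the transpose of the endmember matrix), the convex weighting `lam` / `1 - lam` of the two parts is an assumption, and `bilinear_phi` is a hypothetical stand-in for the nonlinear function Φ, whose form is not specified in the text.

```python
from itertools import combinations


def satisfies_anc_asc(a, tol=1e-9):
    """Check nonnegativity (ANC) and sum-to-one (ASC) of an abundance vector."""
    return all(v >= -tol for v in a) and abs(sum(a) - 1.0) <= tol


def linear_mix(M, a):
    """Noise-free LMM. M is a list of R endmember spectra, each of length B."""
    B = len(M[0])
    return [sum(a[r] * M[r][b] for r in range(len(M))) for b in range(B)]


def bilinear_phi(M, a):
    """Hypothetical stand-in for Phi: pairwise (bilinear) endmember interactions."""
    B = len(M[0])
    out = [0.0] * B
    for i, j in combinations(range(len(M)), 2):
        for b in range(B):
            out[b] += a[i] * a[j] * M[i][b] * M[j][b]
    return out


def generalized_mix(M, a, lam=0.5, phi=bilinear_phi):
    """Weighted sum of the linear and nonlinear parts (convex weights assumed)."""
    lin, nonlin = linear_mix(M, a), phi(M, a)
    return [lam * l + (1.0 - lam) * n for l, n in zip(lin, nonlin)]


# Two endmembers over three bands, mixed 30/70.
M = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
a = [0.3, 0.7]
x_lin = linear_mix(M, a)
x_gen = generalized_mix(M, a)
```

Setting `lam=1.0` recovers the purely linear model, which gives a quick sanity check for any alternative choice of `phi`.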

3 Spectral unmixing with deep autoencoder network

In this section, we introduce a deep autoencoder, the encoder of which mimics the generalized SU procedure in (2), to estimate the abundance representations from the HSI. The proposed deep autoencoder follows the same procedure as in [45,46], but has different network structures and implementation.

Basically, an autoencoder is composed of two parts, namely an encoder and a decoder. The encoder, $\mathrm{encode}: \mathbb{R}^{B} \to \mathbb{R}^{R}$, maps a sample from the input space to the feature space by

$$\hat{\mathbf{a}} = \mathrm{encode}(\mathbf{x}). \tag{4}$$

In most cases, the dimension of the input space is higher than that of the feature space, i.e., $R < B$, which indicates that the encoder compresses the information from the input vector $\mathbf{x}$ to the feature vector $\hat{\mathbf{a}}$. The decoder, $\mathrm{decode}: \mathbb{R}^{R} \to \mathbb{R}^{B}$, maps the feature vector $\hat{\mathbf{a}}$ to an approximation $\hat{\mathbf{x}}$ of the original sample $\mathbf{x}$, from a low dimensional space to a high dimensional space, by

$$\hat{\mathbf{x}} = \mathrm{decode}(\hat{\mathbf{a}}). \tag{5}$$

When the autoencoder is good enough so that the input sample $\mathbf{x}$ and the reconstruction $\hat{\mathbf{x}}$ are similar under some metric, it is inferred that the feature vector $\hat{\mathbf{a}}$ retains most of the information from $\mathbf{x}$. Specifically, when the feature vector $\hat{\mathbf{a}}$, namely the output of the encoder, satisfies both the ANC and the ASC in (3), the encoder itself is interpreted as a blind SU procedure, and $\hat{\mathbf{a}}$ is taken as the abundance vector.

In this paper, both the encoder and the decoder are realized by NN. Let $\theta_e$ and $\theta_d$ be the learnable parameters of the encoder network and the decoder network, respectively. We use the notations $\mathrm{encode}(\cdot\,; \theta_e)$ and $\mathrm{decode}(\cdot\,; \theta_d, \mathbf{M})$ to represent the encoder and the decoder, where the endmember matrix $\mathbf{M}$ is another part of the learnable parameters in the decoder.

The structure of the encoder $\mathrm{encode}(\cdot)$ is shown in Fig. 2. It has 4 layers in total, namely two 1D convolutional layers [50], one fully connected layer, and one normalization layer. A 1D convolutional layer operates similarly to the plain 2D convolution, but the convolution is limited to one dimension. In this paper, the 1D convolutional layers are set with kernel size 3 and stride 1, followed by ReLU activations. The normalization layer is used to impose the ANC and ASC on the encoded abundance feature $\hat{\mathbf{a}} = [\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_R]^\top$ by

$$\hat{a}_j \leftarrow \frac{|\hat{a}_j|}{\sum_{k=1}^{R} |\hat{a}_k|}, \qquad j = 1, 2, \ldots, R,$$

as suggested in [45].
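A normalization layer of this kind can be sketched as follows. The sketch assumes that the layer divides the absolute values of its input by their sum; this specific form is an assumption here, and other maps (a softmax, for instance) would also satisfy both constraints. The name `normalize_abundance` and the small constant `eps` are illustrative.

```python
def normalize_abundance(h, eps=1e-12):
    """Map an unconstrained vector to one that satisfies the ANC and the ASC."""
    mags = [abs(v) for v in h]
    total = sum(mags) + eps  # eps guards against division by zero for an all-zero input
    return [m / total for m in mags]


# Roughly [0.5, 0.25, 0.25]: the sign of the second entry is discarded.
a_hat = normalize_abundance([2.0, -1.0, 1.0])
```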

The structure of the decoder $\mathrm{decode}(\cdot)$ is more complicated. It is designed to consist of two parts that correspond to the linear and nonlinear mixing models, in accordance with the latent mixing mechanism in (2). As illustrated in Fig. 3, the upper part corresponds to the LMM, expressed by $\mathbf{M}\hat{\mathbf{a}}$, while the lower part represents the nonlinear mixing model given by $\Phi(\mathbf{M}, \hat{\mathbf{a}})$. For the nonlinear part, we first multiply each endmember by its fractional abundance, thus generating a set of weighted endmembers, given by $[\hat{a}_1\mathbf{m}_1, \hat{a}_2\mathbf{m}_2, \ldots, \hat{a}_R\mathbf{m}_R]$. The weighted endmembers then flow through five 1D convolutional layers with different numbers of output channels and end up as a vector of length $B$. All five 1D convolutional layers are set to have kernel size 1 and stride 1. This setting ensures that the effect of each 1D convolutional layer can be interpreted as an interaction between the channels of the input data. In view of this, the nonlinear part of the decoder simulates the interactions between the endmember signatures. Each of the 1D convolutional layers is followed by the ReLU activation, except for the last one. In the end, the reconstructed spectrum is estimated by the weighted sum of the linear and nonlinear estimations.
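The forward pass of such a two-branch decoder can be sketched in plain Python. This is an illustration rather than the authors' network: the channel widths passed to `random_layers`, the random initialization, and the convex weighting `lam` / `1 - lam` are all assumptions, and endmembers are stored row-wise. The point of the sketch is that a kernel-size-1 convolution only mixes the R channels (the weighted endmembers) at each band position.

```python
import random


def pointwise_conv(x, W, bias):
    """Kernel-size-1 1D convolution. x: B rows of C_in channels; W: C_out x C_in."""
    return [[sum(w * v for w, v in zip(row_w, row_x)) + b0
             for row_w, b0 in zip(W, bias)]
            for row_x in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def decode(a, M, layers, lam=0.5):
    """a: R abundances; M: R endmember spectra of length B; layers: list of (W, bias)."""
    R, B = len(M), len(M[0])
    linear = [sum(a[r] * M[r][b] for r in range(R)) for b in range(B)]
    # Weighted endmembers, arranged as B band positions with R channels each.
    h = [[a[r] * M[r][b] for r in range(R)] for b in range(B)]
    for idx, (W, bias) in enumerate(layers):
        h = pointwise_conv(h, W, bias)
        if idx < len(layers) - 1:  # no activation after the last layer
            h = relu(h)
    nonlinear = [row[0] for row in h]  # the last layer has a single output channel
    return [lam * l + (1.0 - lam) * n for l, n in zip(linear, nonlinear)]


def random_layers(widths, seed=0):
    """Hypothetical initialization; widths lists channel counts, e.g. [R, 8, 8, 8, 8, 1]."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.1, 0.1) for _ in range(c_in)] for _ in range(c_out)],
             [0.0] * c_out)
            for c_in, c_out in zip(widths[:-1], widths[1:])]


M = [[1.0, 0.5, 0.0, 0.2], [0.0, 0.5, 1.0, 0.4], [0.3, 0.3, 0.3, 0.3]]  # R = 3, B = 4
layers = random_layers([3, 8, 8, 8, 8, 1])  # five pointwise layers
x_hat = decode([0.2, 0.5, 0.3], M, layers)
```

With `lam=1.0` the nonlinear branch is switched off and the output reduces to the linear mixture, which is a convenient check when experimenting with the layer widths.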

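Figure 2: Structure of the encoder.

Figure 3: Structure of the decoder.

To train the proposed autoencoder for SU, which is expressed by

$$\hat{\mathbf{x}} = \mathrm{decode}\big(\mathrm{encode}(\mathbf{x}; \theta_e); \theta_d, \mathbf{M}\big),$$

a gradient-based optimization algorithm is applied. We adopt the mean squared error

$$\mathcal{L}(\theta_e, \theta_d, \mathbf{M}) = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}_i - \hat{\mathbf{x}}_i \right\|_2^2 \tag{9}$$

as the reconstruction error. By minimizing (9), the optimized parameters $\hat{\theta}_e$ of the encoder and $(\hat{\theta}_d, \hat{\mathbf{M}})$ of the decoder are learned. Following [45], before optimization, the endmember matrix $\mathbf{M}$ is initialized by the vertex component analysis (VCA) [51], while the initial values of $\theta_e$ and $\theta_d$ are drawn from the uniform distribution. Hereafter we use $\mathrm{encode}(\cdot)$ and $\mathrm{decode}(\cdot)$ to denote the learned models $\mathrm{encode}(\cdot\,; \hat{\theta}_e)$ and $\mathrm{decode}(\cdot\,; \hat{\theta}_d, \hat{\mathbf{M}})$, respectively, unless otherwise stated.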
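As a small illustration of the reconstruction error, the sketch below computes the squared Euclidean distance between each spectrum and its reconstruction and averages over pixels. Averaging over pixels only (and not over bands) is an assumption about the normalization constant; the function name is illustrative.

```python
def mse_reconstruction(X, X_hat):
    """Mean over pixels of the squared Euclidean distance between x_i and x_hat_i."""
    n = len(X)
    return sum(sum((u - v) ** 2 for u, v in zip(x, xh))
               for x, xh in zip(X, X_hat)) / n


# Two pixels with two bands each; only the last band of the second pixel is off by 2.
loss = mse_reconstruction([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
```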

In practice, the size of the network input is decided by the number of bands of the HSI being processed. The number of endmembers is set to 16, a number that is larger than the real number of endmembers for most HSIs. The hyperparameter $\lambda$ is set to 0.5, following [45,46]. The HSIs are normalized to the range [0, 1] before being fed to the autoencoder. As the research focus of this work is not the SU model and method, the analysis of the effects of these hyperparameters is omitted. Readers may refer to [45] for a detailed hyperparameter analysis of a similar SU procedure. The autoencoder is optimized by the Adam algorithm [52]. The network structures are designed and implemented with AutoKeras [53] and TensorFlow [54].

4 Multi-HSI classification with convolutional neural network

In this section, we use a simple CNN model based on both the spatial information and the abundance representations to jointly classify multiple HSIs. Merging different HSIs into one big dataset to alleviate the overfitting issue is one of the most fundamental motivations of this paper. Different from existing CNN models, which process the raw data from a single HSI, the proposed algorithm is capable of processing the abundance data from multiple HSIs simultaneously.

4.1 Preparation of training and testing data

To construct a big dataset from different HSIs, we propose the following processing of the autoencoder-extracted abundance representations. Given $K$ HSIs to be classified, namely $\{\mathbf{X}^{(k)} \mid k = 1, 2, \ldots, K\}$, assume that $\mathbf{X}^{(k)}$ contains $C_k$ labeled classes. By training an autoencoder $\mathrm{decode}_k(\mathrm{encode}_k(\cdot))$, which is described in Section 3, for every HSI separately, we obtain the following abundance representations

$$\hat{\mathbf{A}}^{(k)} = \mathrm{encode}_k\big(\mathbf{X}^{(k)}\big), \qquad k = 1, 2, \ldots, K.$$

To take advantage of both spatial context and abundance information for improving the classification performance, each abundance representation $\hat{\mathbf{A}}^{(k)}$ is first divided into the labeled abundance patches

$$\hat{S}_k = \big\{ \big(\hat{\mathbf{a}}^{(k)}_i, y^{(k)}_i\big) \big\}_{i=1}^{N_k},$$

where $\hat{\mathbf{a}}^{(k)}_i$ represents an abundance patch from $\hat{\mathbf{A}}^{(k)}$ and corresponds to a pixel patch in the original image $\mathbf{X}^{(k)}$, and $N_k$ is the number of labeled patches. The label $y^{(k)}_i$ is selected as the label of the pixel patch center, and ranges from $0$ to $C_k - 1$. Assume we have the following $K$ sets of labeled abundance patches generated from $K$ HSIs,

$$\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_K.$$

A big merged dataset $S$ is constructed by collecting all sets of labeled abundance patches, with labels rearranged to avoid overlap. To be precise, the labeled abundance patch $\big(\hat{\mathbf{a}}^{(k)}_i, y^{(k)}_i\big)$ is relabeled to

$$\Big(\hat{\mathbf{a}}^{(k)}_i,\; y^{(k)}_i + \sum_{j=1}^{k-1} C_j\Big)$$

before being collected into $S$. By doing so, in the merged dataset, each HSI occupies a specific interval of integers as class labels, without overlapping the labels of other HSIs. In summary, the big dataset $S$ assembles all the samples from $\hat{S}_k$, $k = 1, 2, \ldots, K$, with $C = \sum_{k=1}^{K} C_k$ being the total number of classes.
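The patch extraction and relabeling steps can be sketched as follows, under simplifying assumptions that are not taken from the paper: abundance maps are nested lists of shape H × W × R, unlabeled pixels are marked with `None`, and pixels whose patch would cross the image border are skipped (the border handling is not specified in the text). Function names are illustrative.

```python
def extract_patches(abund, labels, size=11):
    """abund: H x W x R nested lists; labels: H x W with None for background pixels.

    Returns (patch, label) pairs for labeled pixels whose full patch fits in the image.
    """
    half = size // 2
    H, W = len(labels), len(labels[0])
    samples = []
    for r in range(half, H - half):
        for c in range(half, W - half):
            if labels[r][c] is None:
                continue
            patch = [row[c - half:c + half + 1]
                     for row in abund[r - half:r + half + 1]]
            samples.append((patch, labels[r][c]))  # label of the patch center
    return samples


def merge_datasets(per_hsi_samples, num_classes):
    """Shift the labels of the k-th HSI by the class counts of the HSIs before it."""
    merged, offset = [], 0
    for samples, c_k in zip(per_hsi_samples, num_classes):
        merged.extend((patch, y + offset) for patch, y in samples)
        offset += c_k
    return merged, offset  # offset now equals the total number of classes C
```

For instance, with two HSIs of 9 and 16 classes, the labels of the second HSI are shifted into the interval 9 to 24, and the merged dataset has 25 classes in total.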

4.2 Classification with CNN

Given a dataset $S$ and a sample $(\mathbf{a}, y) \in S$ to classify, the classification task is interpreted as finding a function $\mathrm{classify}(\cdot)$ that maps the abundance patch $\mathbf{a}$ to the correct label $y$. This task is realized by a simple neural network built from standard CNN layers. As illustrated in Fig. 4, the network takes abundance data, which has $R$ channels, as input. The data then flows through three CNN layers, each followed by a ReLU activation. These CNN layers are set with kernel size $3 \times 3$ and stride 1. In the last two steps, a fully connected layer maps the data into a vector $\mathbf{z} = [z_0, z_1, \ldots, z_{C-1}]^\top$ of length $C$, namely the total number of classes in $S$; and a softmax layer finally transforms the vector into the output in one-hot style. The softmax function produces the predicted probability distribution of sample $\mathbf{a}$ as follows

$$\hat{y}_c = \frac{\exp(z_c)}{\sum_{j=0}^{C-1} \exp(z_j)}, \qquad c = 0, 1, \ldots, C-1.$$

Separate the merged dataset $S$ into the training set $S_{\mathrm{train}}$ and the testing set $S_{\mathrm{test}}$. Let $\theta_c$ be the parameters of the proposed CNN, and $\hat{y}$ be the prediction on $\mathbf{a}$. In the training stage, the parameters are optimized by minimizing the following cross-entropy loss

$$\mathcal{L}(\theta_c) = -\sum_{(\mathbf{a}, y) \in S_{\mathrm{train}}} \sum_{c=0}^{C-1} y_c \log \hat{y}_c \tag{14}$$

over the training set, where $\mathbf{y} = [y_0, y_1, \ldots, y_{C-1}]^\top$ and $\hat{\mathbf{y}} = [\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_{C-1}]^\top$ are the one-hot style encodings of $y$ and $\hat{y}$, respectively. After minimizing (14) by gradient-based optimization, the optimized parameters $\hat{\theta}_c$ are obtained. Hereafter the trained classifier $\mathrm{classify}(\cdot\,; \hat{\theta}_c)$ is abbreviated as $\mathrm{classify}(\cdot)$.
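The softmax output and the cross-entropy loss can be sketched as below. The sketch assumes integer class labels and uses the fact that, with a one-hot target, the inner sum of the loss reduces to the negative log-probability of the true class; summing (rather than averaging) over samples is an assumption. Function names are illustrative.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of C scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_scores, batch_labels):
    """Sum over samples of -log(predicted probability of the true class)."""
    loss = 0.0
    for z, y in zip(batch_scores, batch_labels):
        loss -= math.log(softmax(z)[y])
    return loss
```

Subtracting the maximum score before exponentiating leaves the result unchanged but avoids overflow for large scores.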

Figure 4: Structure of the classifier.

In the testing stage, when a testing sample $\mathbf{a}$ is fed into the trained classifier $\mathrm{classify}(\cdot)$, the one-hot style prediction is generated as

$$\hat{\mathbf{y}} = \mathrm{classify}(\mathbf{a}).$$

In most of the existing classification models, the final prediction $\hat{y}$ is calculated by

$$\hat{y} = \arg\max_{0 \leq c \leq C-1} \hat{y}_c.$$

However, in this paper, as we know in advance from which HSI the sample comes, a refined strategy is applied to further improve the testing accuracy. Assume the testing sample $\mathbf{a}$ is known to be from the abundance representation of the $k$-th HSI. The predicted label of $\mathbf{a}$ is calculated by performing the arg max function merely on the fragment of $\hat{\mathbf{y}}$ corresponding to $\hat{S}_k$, by

$$\hat{y} = \arg\max_{\sum_{j=1}^{k-1} C_j \,\leq\, c \,\leq\, \sum_{j=1}^{k} C_j - 1} \hat{y}_c.$$
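The restricted arg max can be sketched as follows. The function name is illustrative, and `k` is 0-based here, unlike the 1-based index used in the text.

```python
def predict_within_hsi(probs, k, num_classes):
    """Arg max over the slice of `probs` owned by the k-th HSI (k is 0-based here).

    Returns (merged_label, local_label).
    """
    start = sum(num_classes[:k])
    stop = start + num_classes[k]
    best = max(range(start, stop), key=lambda c: probs[c])
    return best, best - start


# Two HSIs with 2 and 3 classes. The global arg max (index 1) belongs to the
# first HSI, but a sample known to come from the second HSI gets merged label 3.
probs = [0.05, 0.6, 0.1, 0.2, 0.05]
merged_label, local_label = predict_within_hsi(probs, k=1, num_classes=[2, 3])
```

The restriction guarantees that a sample is never assigned a class that belongs to a different HSI, even when such a class receives the highest overall probability.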

5 Experiments

In this section, we perform a series of experiments including the ablation study and comparative study with several state-of-the-art methods on four public HSI datasets, to verify the effectiveness of the innovative ideas and to demonstrate the performance of the proposed method.

5.1 Datasets

In this paper, experiments are performed on four public HSI datasets, i.e., the Pavia University scene, the Pavia Centre scene, the Salinas scene, and the Houston2018 scene (grss dfc 2018) [55].

The Pavia University scene was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. After removing the noisy bands and a blank strip, the data size in the format height × width × depth is 610 pixels × 340 pixels × 103 bands. The spatial resolution is about 1.3 meters. As shown in Table 1, the pixels are labeled with 9 classes. In practice, 200 × 9 labeled samples are chosen to form the training set, while the rest of the labeled samples form the testing set. The non-labeled pixels constitute the background. Fig. 5 depicts the false color composite and the representation of the groundtruth.

The Pavia Centre scene was also acquired by the ROSIS sensor. The data size is 1096 pixels × 715 pixels × 102 bands after removing the noisy bands. The spatial resolution is also 1.3 meters. As shown in Table 2, the pixels in the Pavia Centre scene are labeled with 9 classes. The training and testing sets are constructed in the same way as for the Pavia University scene. Fig. 6 illustrates the false color composite and the representation of the groundtruth.

Table 1: Reference classes and sizes of training and testing sets of Pavia University image

Figure 5: The false color composite (band 10, 20, 40) and groundtruth representation of Pavia University

Table 2: Reference classes and sizes of training and testing sets of Pavia Centre image

Figure 6: The false color composite (band 10, 20, 40) and groundtruth map of Pavia Centre

Table 3: Reference classes and sizes of training and testing sets of Salinas image

Figure 7: The false color composite (band 180, 100, 10) and groundtruth representation of Salinas

The Salinas scene was collected by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS). After the removal of the water absorption bands, the remaining HSI has a size of 512 pixels × 217 pixels × 204 bands. The pixels are classified into 16 categories, and 200 × 16 samples are picked for training, as shown in Table 3. The false color composite and the representation of the groundtruth are shown in Fig. 7.

The Houston2018 (grss dfc 2018) scene was acquired by the National Center for Airborne Laser Mapping over the University of Houston campus and its neighborhood. The size of this HSI is 601 pixels × 2384 pixels × 48 bands, with a 1-meter ground sample distance. However, the groundtruth matrix has a quadrupled size of 1202 pixels × 4768 pixels, with a 0.5-meter ground sample distance. In practice, the label for each pixel is determined by majority voting over the corresponding entries of the groundtruth matrix. As shown in Table 4, there are 20 classes in grss dfc 2018, and the number of samples varies greatly among classes. For each class, the size of the training set is chosen as 20% of the samples, capped at 3200.
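Since the groundtruth matrix is twice as large as the HSI along each axis, every HSI pixel covers a 2 × 2 block of groundtruth entries. A majority vote over such blocks can be sketched as below. Two choices are assumptions that the text does not specify: unlabeled entries take part in the vote like any other value, and ties go to the value encountered first in row-major order (the ordering that `Counter.most_common` uses for equal counts).

```python
from collections import Counter


def downsample_labels(gt, factor=2):
    """Assign each coarse pixel the most frequent label in its factor x factor block."""
    H, W = len(gt) // factor, len(gt[0]) // factor
    out = [[None] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            block = [gt[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out[r][c] = Counter(block).most_common(1)[0][0]
    return out


# A 2 x 4 groundtruth collapses to 1 x 2: labels 1 and 2 each win their block.
coarse = downsample_labels([[1, 1, 2, 2],
                            [1, 3, 2, 0]])
```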

Finally, for all the datasets, the number of training samples in each class is enlarged to 3200 by data augmentation, using rotation, mirroring and duplicating.
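One possible reading of this augmentation step is sketched below. The restriction to rotations by multiples of 90 degrees and to left-right mirroring, the random sampling scheme, and the function names are assumptions for illustration only. Patches are square lists of rows whose entries (abundance vectors or scalars) are moved around but never modified.

```python
import random


def rot90(patch):
    """Rotate a square patch (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def mirror(patch):
    """Flip a patch left to right."""
    return [row[::-1] for row in patch]


def augment_class(patches, target=3200, seed=0):
    """Grow one class to `target` samples with random rotations and mirrorings.

    A draw with zero rotations and no mirroring acts as plain duplication.
    """
    rng = random.Random(seed)
    out = list(patches)
    while len(out) < target:
        p = rng.choice(patches)
        for _ in range(rng.randrange(4)):
            p = rot90(p)
        if rng.random() < 0.5:
            p = mirror(p)
        out.append(p)
    return out
```

A class that already holds at least `target` samples is returned unchanged by this sketch.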

Table 4: Reference classes and sizes of training and testing sets of grss dfc 2018 image

Figure 8: The false color composite (band 48, 28, 8) and groundtruth representation of grss dfc 2018

5.2 Ablation study and comparative experiments

We design the ablation study to verify that the effectiveness of the proposed ABMHC is mainly attributed to the following two factors: 1) The abundance features extracted by autoencoder-based SU have more discriminative ability than the raw spectra, hence they are better classified by the CNN-based classifier; 2) The combination of abundance representations from different HSIs yields a compatible classifier that is more powerful than the data-specific classifier. As a baseline, we train a CNN-based classifier directly on the raw spectral data for each HSI, and term this method raw-CNN. Besides, the abundance-based HSI classification is performed on each dataset separately, and we refer to this series of experiments as abun-CNN. The first factor can be evaluated by comparing the results of raw-CNN and abun-CNN. Finally, we perform the abundance-based and multi-HSI-based algorithm on the merged big training set, which is the proposed ABMHC. The comparison between abun-CNN and ABMHC evaluates the second factor. To keep the comparison fair, the same network structures and hyperparameters are utilized in raw-CNN, abun-CNN and ABMHC, as described in Section 3 and Section 4. The size of the abundance patches and HSI patches adopted in the CNNs of the proposed ABMHC and its ablation study is set to 11 pixels × 11 pixels. The effect of this parameter on classification performance is not analyzed in this paper, as it has already been investigated in several existing works [56,57].

To further evaluate the performance of the proposed ABMHC, we choose two recently proposed classification algorithms for comparison, namely the method of pixel-pair features (PPF) [39] and the hybrid spectral net (HybridSN) [58,59]. Both state-of-the-art methods are based on deep learning with CNN, and have shown promising classification results on several HSI datasets. For fairness, all the compared methods are run using training sets of the same size, as described in Section 5.1, except for the PPF algorithm on grss dfc 2018. In fact, the PPF generates pixel pairs as training samples, so that the size of the training set is squared. This leads to an out-of-memory situation on our server equipped with 256 GB of RAM. In practice, the original training set for PPF on grss dfc 2018 is constructed by choosing 20% of the labeled samples from each class, capped at 1600. For this reason, the performance of PPF on this dataset is not satisfactory, as shown in Table 8.

5.3 Results analysis

We apply three commonly used metrics to evaluate the performance of all the algorithms, namely OA, AA, and $\kappa$. The overall accuracy (OA) represents the ratio of the number of correctly classified samples to the total number of samples; the average accuracy (AA) is the mean of the per-class accuracies; Cohen's kappa coefficient ($\kappa$) measures the agreement between the predicted labels and the groundtruth labels.
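The three metrics can be computed from a confusion matrix, as in the following sketch. It uses the standard definitions, with the AA averaged over the classes that actually occur in the groundtruth; the function name is illustrative, and the degenerate case in which the chance agreement equals 1 is not handled.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Return (OA, AA, kappa) computed from the confusion matrix."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1  # rows: groundtruth, columns: prediction
    n = len(y_true)
    diag = [cm[c][c] for c in range(num_classes)]
    row = [sum(cm[c]) for c in range(num_classes)]
    col = [sum(cm[r][c] for r in range(num_classes)) for c in range(num_classes)]
    oa = sum(diag) / n
    # AA averages the per-class accuracy over the classes that actually occur.
    recalls = [d / r for d, r in zip(diag, row) if r > 0]
    aa = sum(recalls) / len(recalls)
    pe = sum(r * c for r, c in zip(row, col)) / (n * n)  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa


oa, aa, kappa = classification_metrics([0, 0, 0, 1], [0, 0, 1, 1], num_classes=2)
```

In this toy example the OA is 0.75, the AA is 5/6 (per-class accuracies 2/3 and 1), and kappa is 0.5, which shows how the AA can exceed the OA when a small class is classified well.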

The results obtained by the proposed ABMHC, by the ablation study methods (raw-CNN and abun-CNN), and by the two state-of-the-art methods on the aforementioned HSI datasets are listed in Tables 5–8.

Table 6: Classification accuracies (averaged over 5 runs) on the Pavia Centre scene

Table 8: Classification accuracies (averaged over 5 runs) on the grss dfc 2018 scene

We observe the following facts from the ablation study. Firstly, the abun-CNN method outperforms the raw-CNN method by a large margin in terms of all the metrics, on all the datasets. This indicates that, compared with the raw spectral features, the abundance features extracted by the autoencoder have improved discriminative ability, thus boosting the performance of the classifier. Secondly, compared with the abun-CNN method, which employs the abundance representation from one single HSI, the multi-HSI based ABMHC leads to better classification performance in all the metrics, on all the datasets. This demonstrates that enlarging the training set by merging the abundance information from different HSIs improves the classifier performance. A plausible explanation of this phenomenon is that the merged training set alleviates the overfitting of the CNN-based classifier by increasing the number of training samples.

The proposed ABMHC leads to performance comparable to that of the state-of-the-art algorithms. It outperforms the PPF method by a large margin on all the datasets. In comparison with the latest HybridSN, the proposed ABMHC generally provides comparable results. To be precise, our method slightly outperforms HybridSN on the Pavia University scene and the Pavia Centre scene, and obtains almost the same results as HybridSN on the Salinas scene. On the grss dfc 2018 dataset, the proposed method surpasses HybridSN by almost 9% in AA, while being slightly inferior to its counterpart in OA and $\kappa$. It is worth noting that, while the proposed ABMHC utilizes a simple CNN classifier, which is plain and shallow, the HybridSN method employs a far more complicated network structure [58].

6 Conclusion

In this paper, we proposed an abundance-based multi-HSI classification method to address the overfitting issue in deep learning-based classification. The original intention of the proposed method is two-fold: 1) The abundance features extracted by SU have more discriminative ability than the raw spectral features, which enables the use of simple networks to alleviate the overfitting issue; 2) Training a classifier with multiple HSIs leads to better performance than training with one single HSI, as enlarging the training set usually alleviates the overfitting issue. This idea becomes feasible by transforming multiple HSIs from the spectral domain to the abundance domain by SU.

Based on these two aspects, we first designed and trained an autoencoder-based SU model for each HSI separately. After that, the HSIs were mapped to the abundance domain by the learned autoencoders. Lastly, a compatible classifier was trained on the abundance features from multiple HSIs, and further applied to predict the labels on the testing sets. The ablation study and comparative experiments were performed on four datasets. The results of the ablation study confirmed the original intention. The comparative experiments showed that our method provided classification performance comparable to that of the state-of-the-art methods, while using a far simpler model structure.

7 Acknowledgement

The authors would like to thank the National Center for Airborne Laser Mapping and the Hyperspectral Image Analysis Laboratory at the University of Houston for acquiring and providing the data used in this study, and the IEEE GRSS Image Analysis and Data Fusion Technical Committee.

References

[1] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, “Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 354–379, Apr. 2012.

[2] C. I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. Plenum Publishing Co., 2003.

[3] L. Wang, C. Shi, C. Diao, W. Ji, and D. Yin, “A survey of methods incorporating spatial information in image classification and spectral unmixing,” International Journal of Remote Sensing, vol. 37, no. 16, pp. 3870–3910, 2016.

[4] H. Fan, Y. Chen, Y. Guo, H. Zhang, and G. Kuang, “Hyperspectral image restoration using low-rank tensor recovery,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. PP, no. 99, pp. 1–16, Oct. 2017.

[5] N. M. Nasrabadi, “Hyperspectral target detection: An overview of current and future challenges,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 34–44, Jan. 2014.

[6] J. Jiang, J. Ma, C. Chen, Z. Wang, Z. Cai, and L. Wang, “SuperPCA: A superpixelwise PCA approach for unsupervised feature extraction of hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4581–4593, Aug. 2018.

[7] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, “Hyperspectral image classification with independent component discriminant analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 12, pp. 4865–4876, Dec. 2011.

[8] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, “Locality-preserving dimensionality reduction and classification for hyperspectral image analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 4, pp. 1185–1198, Apr. 2012.

[9] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 8, pp. 1778–1790, Aug. 2004.

[10] L. Fang, S. Li, X. Kang, and J. A. Benediktsson, “Spectral-spatial hyperspectral image classification via multiscale adaptive sparse representation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 12, pp. 7738–7749, Dec. 2014.

[11] R. Neware and A. Khan, “Survey on classification techniques used in remote sensing for satellite images,” in 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2018, pp. 1860–1863.

[12] I. Pachón, S. Ramírez, D. Fonseca, P. Lozano-Rivera, C. Ariza, M. Paula Mancipe, M. Villamizar, H. Castro, E. Cabrera, and M. Teresa Becerra, “Random forest data cube based algorithm for land cover classification: A Colombian case,” in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018, pp. 8651–8654.

[13] H. Huang, Y. Duan, H. He, and G. Shi, “Local linear spatial-spectral probabilistic distribution for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 2, pp. 1259–1272, Feb. 2020.

[14] C. Chang, “Statistical detection theory approach to hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 4, pp. 2057–2074, Apr. 2019.

[15] M. Paoletti, J. Haut, J. Plaza, and A. Plaza, “Deep learning classifiers for hyperspectral imaging: A review,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 158, pp. 279–317, 2019.

[16] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 44–57, Jan. 2002.

[17] I. Dópido, M. Zortea, A. Villa, A. Plaza, and P. Gamba, “Unmixing prior to supervised classification of remotely sensed hyperspectral images,” IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 4, pp. 760–764, Jul. 2011.

[18] I. Dópido, A. Villa, A. Plaza, and P. Gamba, “A quantitative and comparative assessment of unmixing-based feature extraction techniques for hyperspectral image classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 421–435, Apr. 2012.

[19] E. Ibarrola-Ulzurrun, L. Drumetz, J. Marcello, C. Gonzalo-Martín, and J. Chanussot, “Hyperspectral classification through unmixing abundance maps addressing spectral variability,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 7, pp. 4775–4788, Jul. 2019.

[20] A. Villa, J. Chanussot, J. A. Benediktsson, and C. Jutten, “Spectral unmixing for the classification of hyperspectral images at a finer spatial resolution,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 521–533, Jun. 2011.

[21] B. Fang, Y. Bai, and Y. Li, “Combining spectral unmixing and 3D/2D dense networks with early-exiting strategy for hyperspectral image classification,” Remote Sensing, vol. 12, no. 5, 2020.

[22] F. I. Alam, J. Zhou, L. Tong, A. W. Liew, and Y. Gao, “Combining unmixing and deep feature learning for hyperspectral image classification,” in 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2017, pp. 1–8.

[23] I. Dópido, J. Li, P. Gamba, and A. Plaza, “A new hybrid strategy combining semisupervised classification and unmixing of hyperspectral data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 8, pp. 3619–3629, Aug. 2014.

[24] J. Li, I. Dópido, P. Gamba, and A. Plaza, “Complementarity of discriminative classifiers and spectral unmixing techniques for the interpretation of hyperspectral images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2899–2912, May 2015.

[25] A. Samat, J. Li, S. Liu, P. Du, Z. Miao, and J. Luo, “Improved hyperspectral image classification by active learning using pre-designed mixed pixels,” Pattern Recognition, vol. 51, pp. 43–58, 2016.

[26] Y. Sun, X. Zhang, A. Plaza, J. Li, I. Dópido, and Y. Liu, “A new semi-supervised classification strategy combining active learning and spectral unmixing of hyperspectral data,” in High-Performance Computing in Geoscience and Remote Sensing VI, B. Huang, S. López, Z. Wu, J. M. Nascimento, J. Li, and V. V. Strotov, Eds., vol. 10007, International Society for Optics and Photonics. SPIE, 2016, pp. 44–51.

[27] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[29] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2094–2107, Jun. 2014.

[30] X. Ma, H. Wang, and J. Geng, “Spectral-spatial classification of hyperspectral image based on deep autoencoder,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 9, pp. 4073–4085, Sep. 2016.

[31] C. Tao, H. Pan, Y. Li, and Z. Zou, “Unsupervised spectral-spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 12, pp. 2438–2442, Dec. 2015.

[32] P. Zhou, J. Han, G. Cheng, and B. Zhang, “Learning compact and discriminative stacked autoencoder for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 7, pp. 4823–4833, Jul. 2019.

[33] X. Zhang, Y. Liang, C. Li, N. Huyan, L. Jiao, and H. Zhou, “Recursive autoencoders-based unsupervised feature learning for hyperspectral image classification,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 11, pp. 1928–1932, Nov. 2017.

[34] V. Slavkovikj, S. Verstockt, W. De Neve, S. Van Hoecke, and R. Van de Walle, “Hyperspectral image classification with convolutional neural networks,” in Proceedings of the ACM international conference on Multimedia. ACM, 2015, pp. 1159–1162.

[35] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 10, pp. 6232–6251, Oct. 2016.

[36] L. Jiao, M. Liang, H. Chen, S. Yang, H. Liu, and X. Cao, “Deep fully convolutional network-based spatial distribution prediction for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 10, pp. 5585–5599, Oct. 2017.

[37] L. Mou, P. Ghamisi, and X. X. Zhu, “Deep recurrent neural networks for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3639–3655, Apr. 2017.

[38] X. Zhang, Y. Sun, K. Jiang, C. Li, L. Jiao, and H. Zhou, “Spatial sequential recurrent neural network for hyperspectral image classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 11, pp. 4141–4155, Nov. 2018.

[39] W. Li, G. Wu, F. Zhang, and Q. Du, “Hyperspectral image classification using deep pixel-pair features,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 844–853, Feb. 2017.

[40] X. Zhao, Y. Liang, A. J. Guo, and F. Zhu, “Classification of small-scale hyperspectral images with multi-source deep transfer learning,” Remote Sensing Letters, vol. 11, no. 4, pp. 303–312, 2020.

[41] C. Xing, L. Ma, and X. Yang, “Stacked denoise autoencoder based feature extraction and classification for hyperspectral images,” Journal of Sensors, vol. 2016, 2016.

[42] G. E. Hinton and D. van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proceedings of the Sixth Annual Conference on Computational Learning Theory, ser. COLT ’93. New York, NY, USA: Association for Computing Machinery, 1993, pp. 5–13.

[43] N. Dobigeon, J. Tourneret, C. Richard, J. C. M. Bermudez, S. McLaughlin, and A. O. Hero, “Nonlinear unmixing of hyperspectral images: Models and algorithms,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 82–94, Jan. 2014.

[44] R. Guo, W. Wang, and H. Qi, “Hyperspectral image unmixing using autoencoder cascade,” in 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Jun. 2015, pp. 1–4.

[45] M. Wang, M. Zhao, J. Chen, and S. Rahardja, “Nonlinear unmixing of hyperspectral data via deep autoencoder networks,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 9, pp. 1467–1471, Sep. 2019.

[46] M. Zhao, M. Wang, J. Chen, and S. Rahardja, “Hyperspectral unmixing via deep autoencoder networks for a generalized linear-mixture/nonlinear-fluctuation model,” arXiv:1904.13017.

[47] S. Ozkan, B. Kaya, and G. B. Akar, “EndNet: Sparse autoencoder network for endmember extraction and hyperspectral unmixing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 1, pp. 482–496, Jan. 2019.

[48] Y. Su, J. Li, A. Plaza, A. Marinoni, P. Gamba, and S. Chakravortty, “DAEN: Deep autoencoder networks for hyperspectral unmixing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 7, pp. 4309–4321, Jul. 2019.

[49] F. Khajehrayeni and H. Ghassemian, “Hyperspectral unmixing using deep convolutional autoencoders in a supervised scenario,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 567–576, 2020.

[50] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, “1D convolutional neural networks and applications: A survey,” arXiv preprint arXiv:1905.03554, 2019.

[51] J. M. P. Nascimento and J. M. B. Dias, “Vertex component analysis: a fast algorithm to unmix hyperspectral data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 4, pp. 898–910, Apr. 2005.

[52] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015.

[53] H. Jin, Q. Song, and X. Hu, “Auto-Keras: An efficient neural architecture search system,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2019, pp. 1946–1956.

[54] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “TensorFlow: A system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283.

[55] Y. Xu, B. Du, L. Zhang, D. Cerra, M. Pato, E. Carmona, S. Prasad, N. Yokoya, R. Hänsch, and B. Le Saux, “Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS Data Fusion Contest,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 6, pp. 1709–1724, Jun. 2019.

[56] A. J. X. Guo and F. Zhu, “Spectral-spatial feature extraction and classification by ANN supervised with center loss in hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 3, pp. 1755–1767, Mar. 2019.

[57] X. Cao, X. Wang, D. Wang, J. Zhao, and L. Jiao, “Spectral-spatial hyperspectral image classification using cascaded Markov random fields,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 12, pp. 4861–4872, Dec. 2019.

[58] S. K. Roy, G. Krishna, S. R. Dubey, and B. B. Chaudhuri, “HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification,” IEEE Geoscience and Remote Sensing Letters, vol. 17, no. 2, pp. 277–281, Feb. 2020.

[59] S. K. Roy, S. Chatterjee, S. Bhattacharyya, B. B. Chaudhuri, and J. Plato, “Lightweight spectral-spatial squeeze-and-excitation residual bag-of-features learning for hyperspectral classification,” IEEE Transactions on Geoscience and Remote Sensing, pp. 1–14, 2020.
