Melanoma is one of the most lethal types of skin cancer. Late detection of melanoma is associated with very low survival times, making tools for early diagnosis vital for patient care. In particular, dermoscopy has significantly improved diagnostic accuracy over examination with the naked eye.
Still, melanoma detection remains a challenging task. In general, lesions are excised very frequently, which leads to many melanomas being detected (high sensitivity) but also to many unnecessary, costly, and invasive procedures (low specificity).
Therefore, additional tools for melanoma detection have been proposed, such as electrical impedance spectroscopy (EIS).
Here, an electrode is placed on the skin lesion and multiple impedance measurements at different frequencies are performed between several bars. Then, a machine learning algorithm uses the complex impedance data as a feature vector to estimate the probability of the lesion being benign or malignant. The method has been shown to be effective in multiple clinical studies.
Recently, other automatic diagnosis methods have been proposed that use dermoscopic images as input to convolutional neural networks (CNNs). Also, the ISIC 2018 Skin Lesion Analysis Towards Melanoma Detection Challenge resulted in numerous deep learning-based methods for high-performance skin lesion classification.
So far, machine learning methods for EIS and dermoscopy have been addressed separately, although dermoscopic images are generally acquired before EIS measurements. EIS and dermoscopy represent very different ways of measuring skin lesion properties. While dermoscopy captures light absorption mostly at the skin surface, EIS captures electrical properties of deeper skin layers. Thus, the two methods might complement each other, and fusing them in a single model could improve clinical decision support. Furthermore, generalization of CNNs for dermoscopy across datasets has been shown to be problematic, as models tend to overfit specific image datasets. Incorporating EIS could be helpful in this regard, as EIS classifiers have been shown to generalize well across data from different clinical studies.
Figure 1. Schematic drawing of the EIS probe, following prior work. The colors indicate measurements at different depths.
As a first step, we study deep learning methods for EIS as an alternative to approaches relying on classic models such as SVMs. We propose a new recurrent model with state-max-pooling that incorporates domain knowledge about EIS. For each lesion, a varying number of EIS measurements is performed, depending on lesion size. In contrast to previous methods, we directly learn the importance of measurements by treating them as arbitrary-length sequences that are fed into recurrent models.
Second, we explore lesion classification with dermoscopic images from an EIS study using state-of-the-art CNN-based approaches.
Third, we build combined deep learning-based EIS and dermoscopic models. Here, we propose a cross-attention mechanism that guides the information exchange between the two data sources. We compare this approach to other linear and nonlinear combination methods. For clinical decision support, automatic systems should operate at a very high sensitivity to ensure that critical lesions are not missed. Therefore, we evaluate all models at a sensitivity of 98 %.
In summary, our contributions are twofold. First, we propose a new model for deep learning with EIS that takes domain knowledge into account. Second, we show that combining EIS and dermoscopy with a cross-attention module substantially outperforms other combination approaches and methods using either EIS or dermoscopy alone.
2.1 Dataset
The dataset we use was collected in a previous multicenter clinical trial across Europe and was kindly provided to us by the curators. The final dataset contains 988 lesions with one dermoscopic image each and a total of 3131 impedance measurements. There are 631 benign lesions and 357 malignant lesions, which include melanoma, other skin cancer, or severe dysplastic lesions. The diagnosis was obtained through histopathological evaluation by three pathologists. Due to the small dataset size, we use five-fold cross-validation, i.e., we split the 988 lesions into five equally sized sets with the same class balance as the full dataset. For validation, we train on three folds, validate on the fourth, and leave out the fifth, repeated for each fold. For testing, we train on four folds and evaluate on the previously left-out fold, repeated for each fold. In this way, we obtain predictions for all lesions. Our models classify whether a lesion is benign or malignant. For evaluation, we use all benign lesions and all malignant melanomas, i.e., our reported metrics reflect the performance for melanoma detection, the clinically most challenging lesion type. During training, we use all samples for more data variety.
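The fold assignment and the nested train/validation/test scheme can be illustrated with a short NumPy sketch. It assumes binary labels (0 = benign, 1 = malignant) and deals the lesions of each class round-robin over the folds. The helper names `stratified_folds` and `cv_splits`, the random seed, and the choice of the next fold as the validation fold are our assumptions and are not specified in the text. Restricting the evaluation to benign lesions and melanomas is not shown.

```python
import numpy as np

def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each lesion to one of n_folds folds, preserving the class balance."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Deal the shuffled lesions of this class round-robin over the folds.
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of

def cv_splits(fold_of, n_folds=5):
    """Yield index sets for each held-out test fold.

    Validation runs train on three folds and validate on a fourth; test runs
    retrain on all four remaining folds. Using the next fold (mod n_folds)
    as the validation fold is an assumption of this sketch.
    """
    for test in range(n_folds):
        val = (test + 1) % n_folds
        yield {
            "train_for_val": np.flatnonzero((fold_of != test) & (fold_of != val)),
            "val": np.flatnonzero(fold_of == val),
            "train_for_test": np.flatnonzero(fold_of != test),
            "test": np.flatnonzero(fold_of == test),
        }

labels = np.array([0] * 631 + [1] * 357)  # 631 benign, 357 malignant lesions
fold_of = stratified_folds(labels)
splits = list(cv_splits(fold_of))
```

With 631 benign and 357 malignant lesions, the fold sizes differ by at most two lesions, and the per-class counts differ by at most one between folds.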
Figure 2. Example dermoscopy images and impedance data. The complex impedance data is shown as magnitude and phase. Data for a benign lesion is shown. The colors indicate measurements at different depths, as in Figure 1.
The EIS device is a Nevisense, manufactured by Scibase AB. The EIS electrode contains five bars. By measuring between different pairs of bars, ten different bar combinations, referred to as permutations, are recorded, which correspond to different measurement depths. Measurements between neighboring bars record the impedance close to the skin surface, while measurements between bars with larger distance correspond to the impedance in deeper skin layers, see Figure 1. For each permutation, frequencies from 1 kHz to 2.5 MHz, distributed on a logarithmic scale, are measured. The complex impedance values of all permutations and frequencies are aggregated in a feature vector of size 700.
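The size of the feature vector follows from the probe geometry. Five bars give C(5, 2) = 10 bar pairs; if each pair is measured at 35 frequencies and the real and imaginary parts are stored as separate values, the vector has 10 × 35 × 2 = 700 entries. The following NumPy sketch assembles such a vector. The frequency count and the real/imaginary layout are assumptions that are consistent with the stated size of 700 but are not given in the text, and `impedance_feature_vector` is a hypothetical helper.

```python
import itertools
import numpy as np

N_BARS = 5
N_FREQS = 35  # assumption: 700 / (10 bar pairs * 2 real values)

# All unordered pairs of bars; wider bar spacing probes deeper tissue.
BAR_PAIRS = list(itertools.combinations(range(N_BARS), 2))
# Logarithmically spaced frequencies from 1 kHz to 2.5 MHz.
FREQS_HZ = np.geomspace(1e3, 2.5e6, N_FREQS)

def impedance_feature_vector(z):
    """Flatten complex impedances of shape (bar pairs, frequencies) into a real vector.

    Returns the real parts followed by the imaginary parts (assumed layout).
    """
    z = np.asarray(z, dtype=complex)
    if z.shape != (len(BAR_PAIRS), N_FREQS):
        raise ValueError(f"expected shape {(len(BAR_PAIRS), N_FREQS)}, got {z.shape}")
    return np.concatenate([z.real.ravel(), z.imag.ravel()])

rng = np.random.default_rng(0)
z = rng.normal(size=(10, 35)) + 1j * rng.normal(size=(10, 35))  # dummy measurement
x = impedance_feature_vector(z)
```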
The dermoscopic images were acquired with different devices and vary in size. First, we manually crop the lesions to cut out zero-signal (black) parts of the image. Then, we apply color constancy to all images using the Shades of Gray method with Minkowski norm p = 6. Last, we resize the images to 600 × 450 pixels, following Tschandl et al.
Examples for the dermoscopic images and the EIS data are shown in Figure 2.
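The Shades of Gray step estimates the illuminant color of each image as the per-channel Minkowski p-norm mean of the pixel values and then rescales the channels so that this estimate becomes neutral gray. A minimal NumPy sketch with p = 6 follows. It assumes 8-bit RGB input and clips the result to [0, 255]; the function name and the small constant guarding against empty channels are our own choices. Cropping and resizing are not included.

```python
import numpy as np

def shades_of_gray(img, p=6):
    """Shades of Gray color constancy with a Minkowski p-norm illuminant estimate.

    img: H x W x 3 RGB array with values in [0, 255].
    Returns a float32 image whose estimated illuminant is neutral gray.
    """
    img = np.asarray(img, dtype=np.float64)
    # Per-channel Minkowski p-norm mean as the illuminant estimate.
    illum = np.power(np.mean(np.power(img, p), axis=(0, 1)), 1.0 / p)
    illum = np.maximum(illum, 1e-6)  # guard against all-zero channels
    # Normalize the illuminant to unit length ...
    illum = illum / np.linalg.norm(illum)
    # ... and scale each channel so that a gray illuminant leaves the image unchanged.
    gains = 1.0 / (illum * np.sqrt(3.0))
    return np.clip(img * gains, 0, 255).astype(np.float32)
```

For an image with a uniform color cast, the corrected image has equal values in all three channels, while an already gray image is left unchanged.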
2.2 Models
Baseline Models for EIS Data. First, we reimplement classical machine learning methods for comparison with previous approaches. For this purpose, we use SVMs and fully-connected neural networks (FC-NNs). The models process one measurement at a time, i.e., for each lesion i and measurement j, the model input is a feature vector $x_{ij} \in \mathbb{R}^{700}$. We normalize the input data to have zero mean and unit variance. For the SVM, we use a Gaussian kernel and a box constraint of C = 1. For the FC-NN, we use a two-layer network with ReLU activations and batch normalization. As most lesions i have multiple measurements j, previous methods have used heuristics to combine them. Similar to previous EIS models, we select the prediction with the highest probability of the lesion being malignant, which leads to a high sensitivity of the classifier. Thus, the prediction $\hat{p}_i$ for lesion i is computed as

$$\hat{p}_i = \max_{j = 1, \dots, N_i} \hat{p}_{ij},$$

where $\hat{p}_{ij}$ is the predicted probability of malignancy for measurement j and $N_i$ is the number of measurements for lesion i. This can be interpreted as max-pooling over the predictions of all measurements for each lesion.
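The max-pooling over measurements is independent of the per-measurement classifier. The sketch below assumes that an SVM or FC-NN has already produced one malignancy probability per measurement, together with the lesion each measurement belongs to; `lesion_predictions` is a hypothetical helper and the classifier itself is not shown.

```python
import numpy as np

def lesion_predictions(p_meas, lesion_ids):
    """Max-pool per-measurement malignancy probabilities into one per lesion.

    p_meas: predicted probability of malignancy for each measurement.
    lesion_ids: lesion index i of each measurement (same length as p_meas).
    Returns the sorted unique lesion ids and the pooled probability per lesion.
    """
    p_meas = np.asarray(p_meas, dtype=float)
    lesion_ids = np.asarray(lesion_ids)
    lesions = np.unique(lesion_ids)
    # One suspicious measurement is enough to flag the whole lesion.
    pooled = np.array([p_meas[lesion_ids == i].max() for i in lesions])
    return lesions, pooled

# Lesion 0 has three measurements, lesion 1 has two.
lesions, p_lesion = lesion_predictions([0.1, 0.7, 0.2, 0.4, 0.3], [0, 0, 0, 1, 1])
# p_lesion == [0.7, 0.4]
```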
Recurrent Aggregation with EIS Data. Next, we propose a new model to process impedance data. The measurements of each lesion can be interpreted as a sequence of variable length. For this type of data, recurrent models have been very successful. Therefore, we propose the use of a model with gated recurrent units (GRUs).
The GRU receives the sequence $x_i = (x_{i1}, \dots, x_{iN_i})$ of the measurements of lesion i as its input. At each step j, the GRU computes a state $h_j$ using the input $x_{ij}$ and the previous state $h_{j-1}$:

$$\begin{aligned}
z_j &= \sigma(M_z x_{ij} + L_z h_{j-1}),\\
r_j &= \sigma(M_r x_{ij} + L_r h_{j-1}),\\
\tilde{h}_j &= \tanh\left(M_h x_{ij} + L_h (r_j \odot h_{j-1})\right),\\
h_j &= z_j \odot h_{j-1} + (1 - z_j) \odot \tilde{h}_j,
\end{aligned}$$

where $M_z, M_r, M_h$ and $L_z, L_r, L_h$ are weight matrices, $\sigma$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, and bias terms are omitted for brevity.

Figure 3. The joint models we propose. State-Max refers to our state-max-pooling strategy. FC denotes a fully-connected layer. GAP denotes global average pooling. Linear combination of features is shown in red (Lin.). Nonlinear combination of features with an FC-layer is shown in black (FC). Our proposed cross-attention mechanism is shown in blue (CA).

A typical approach for GRUs is to use the last state of the GRU as the output. However, this strategy might not be optimal for this type of data, as recurrent models assume ordered sequences while the order of the EIS measurements is arbitrary. While the last state does contain gated information from all previous states, the last sequence input has the largest contribution, which is not desirable for our problem. Instead, we incorporate the max-pooling heuristic used for the baseline model directly into the GRU. Consider a state matrix H with the vectors $h_j$ as its columns, obtained from the input sequence $x_i$. We calculate the final GRU output as $o = \mathrm{MAX}(H)$, where MAX is a max-pooling operation with a pooling window of size $1 \times N_i$, i.e., an element-wise maximum over all states. We refer to this approach as state-max-pooling and compare it to state-average-pooling and to using the last state $h_{N_i}$ as the GRU output. After this unit, we feed the vector o into another fully-connected hidden layer with batch normalization and ReLU activation, followed by the output layer. Both the FC-NN and GRU models are trained with Adam for 200 epochs, a batch size of 10 and a learning rate of $2 \cdot 10^{\ldots}$. Since the order of the EIS measurements is arbitrary, we randomly permute the sequences during training. For evaluation, we average over the results of five random permutations.
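To illustrate the recurrent model, the following NumPy sketch implements the GRU equations for one lesion together with the three output variants compared here: state-max-pooling, state-average-pooling, and the last state. It also averages predictions over random orderings of the measurements, as done at evaluation time. The weights are random and untrained, the class and function names are hypothetical, bias terms are omitted, and a single logistic output stands in for the FC hidden layer with batch normalization and the output layer.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUPooling:
    """Single-layer GRU over a set of EIS measurements with state pooling.

    pooling: "max" (state-max-pooling), "mean" (state-average-pooling)
    or "last" (last state h_N). Weights are random; forward pass only.
    """

    def __init__(self, n_in, n_hidden, pooling="max", seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        # M_*: input weights, L_*: recurrent weights for gates z, r and candidate h.
        self.M = {g: rng.uniform(-s, s, (n_hidden, n_in)) for g in "zrh"}
        self.L = {g: rng.uniform(-s, s, (n_hidden, n_hidden)) for g in "zrh"}
        self.n_hidden = n_hidden
        self.pooling = pooling

    def states(self, X):
        """Return the state matrix H (n_hidden x N) for a sequence X (N x n_in)."""
        h = np.zeros(self.n_hidden)
        H = []
        for x in X:
            z = sigmoid(self.M["z"] @ x + self.L["z"] @ h)
            r = sigmoid(self.M["r"] @ x + self.L["r"] @ h)
            h_tilde = np.tanh(self.M["h"] @ x + self.L["h"] @ (r * h))
            h = z * h + (1.0 - z) * h_tilde
            H.append(h)
        return np.stack(H, axis=1)

    def __call__(self, X):
        H = self.states(X)
        if self.pooling == "max":
            return H.max(axis=1)  # pooling window of size 1 x N over the columns
        if self.pooling == "mean":
            return H.mean(axis=1)
        if self.pooling == "last":
            return H[:, -1]
        raise ValueError(f"unknown pooling: {self.pooling}")

def predict_with_permutations(model, head, X, n_perm=5, seed=0):
    """Average the head's output over random orderings of the measurements."""
    rng = np.random.default_rng(seed)
    outs = [head(model(X[rng.permutation(len(X))])) for _ in range(n_perm)]
    return float(np.mean(outs))

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 700))  # four hypothetical EIS measurements of one lesion
gru = GRUPooling(n_in=700, n_hidden=16, pooling="max")
w_out = rng.normal(size=16)  # hypothetical output layer
p_malignant = predict_with_permutations(gru, lambda o: sigmoid(w_out @ o), X)
```

Because the GRU states depend on the input order, even the max-pooled output is not exactly permutation invariant; random permutations during training and averaging over several orderings at test time reduce this dependence.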
Models for Dermoscopic Data. For CNNs with dermoscopy images, we follow a previously proposed approach. The approach led to the best performance in the ISIC 2018 Challenge when using only publicly available data, and its code is publicly available, which allows for reproducibility in contrast to other approaches. Furthermore, the method performed best for binary lesion classification, which we address in this work. In detail, we use the state-of-the-art models Densenet121 and SE-Resnext50. Due to the small dataset size, we compare the typical approach of pretraining on ImageNet with training from scratch. The last layer is replaced with a layer for binary classification. During training, we use random crops of size 224 × 224 and data augmentation with random flipping and random contrast and brightness changes. For evaluation, we use multi-crop evaluation with 36 evenly spread crops across the image, which cover the entire image with large overlap. The final prediction is obtained by averaging over the crops' individual predictions. We train the models with Adam for 100 epochs, a batch size of 20 and a learning rate of $2 \cdot 10^{\ldots}$.
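The multi-crop evaluation can be sketched as a regular grid of crop positions. We assume that the 36 crops form a 6 × 6 grid of evenly spaced 224 × 224 windows on the 600 × 450 image (width × height), so that the first and last crops touch the image borders; `predict_fn` stands in for a trained CNN that returns a malignancy probability and, like the helper names, is hypothetical.

```python
import numpy as np

def multi_crop_positions(height, width, crop=224, n_per_axis=6):
    """Top-left corners of an evenly spaced n x n grid of crops covering the image."""
    ys = np.round(np.linspace(0, height - crop, n_per_axis)).astype(int)
    xs = np.round(np.linspace(0, width - crop, n_per_axis)).astype(int)
    return [(int(y), int(x)) for y in ys for x in xs]

def multi_crop_predict(img, predict_fn, crop=224, n_per_axis=6):
    """Average the per-crop predictions of a hypothetical classifier over all crops."""
    h, w = img.shape[:2]
    probs = [predict_fn(img[y:y + crop, x:x + crop])
             for y, x in multi_crop_positions(h, w, crop, n_per_axis)]
    return float(np.mean(probs))
```

On a 450 × 600 array, the horizontal stride is about 75 pixels and the vertical stride about 45 pixels, far smaller than the crop size, which gives the large overlap mentioned above.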
Combined Models. As a next step, we combine our models for EIS and dermoscopic data. A straightforward way of combining models is ensembling. Here, we combine the predictions $\hat{p}_E$ and $\hat{p}_D$ from the two independent EIS and dermoscopy models by selecting the maximum predicted probability $\hat{p} = \max(\hat{p}_E, \hat{p}_D)$ for each lesion. Furthermore, we build joint models where both data sources are fed into the model. In this way, the models can directly learn a combination of features. The joint models are depicted in Figure 3. The data sources are first processed independently by the EIS and CNN models introduced above.
Table 1. The results of our experiments, given in percent. We show results for using EIS only, dermoscopy (Derm.) only and combined models. The CNNs are Densenet121 (Dense) and SE-Resnext50 (SE-RX). The 95 % confidence intervals are provided in brackets.
Before the output, features from both models are combined in different ways. First, we consider a simple concatenation and linear combination of the features (Dense-GRU Lin.), similar to a previous approach. Second, we give the model more capacity by adding an FC layer after concatenation, which allows the model to learn a nonlinear combination of the features (Dense-GRU FC). Third, we do not combine the features explicitly but instead let the two feature sets learn their interactions through a cross-attention module (Dense-GRU CA). Here, we use an FC layer with sigmoid activation to learn a weighting for the CNN features from the EIS features and vice versa. Each path has an individual output, and we obtain the prediction by selecting the maximum output value, similar to the ensembling strategy.
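The cross-attention fusion can be sketched in a few lines of NumPy. Here, a fully-connected layer with sigmoid activation maps the EIS features to one weight per CNN feature and the CNN features to one weight per EIS feature; each weighted feature vector feeds its own output, and the prediction is the maximum of the two. The feature sizes and the random, untrained parameters are placeholders, and computing each weighting from the other modality's unweighted features is our reading of the description rather than a confirmed detail.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
D_CNN, D_EIS = 1024, 64  # placeholder feature sizes

# Hypothetical, untrained parameters of the cross-attention module.
W_eis_to_cnn = rng.normal(scale=0.01, size=(D_CNN, D_EIS))  # EIS -> weights for CNN features
W_cnn_to_eis = rng.normal(scale=0.01, size=(D_EIS, D_CNN))  # CNN -> weights for EIS features
w_out_cnn = rng.normal(scale=0.01, size=D_CNN)  # output of the CNN path
w_out_eis = rng.normal(scale=0.01, size=D_EIS)  # output of the EIS path

def cross_attention_predict(f_cnn, f_eis):
    """Fuse pooled CNN features and GRU features with mutual sigmoid gating."""
    # FC layer + sigmoid on one modality gives a weight in (0, 1) per feature of the other.
    g_cnn = sigmoid(W_eis_to_cnn @ f_eis) * f_cnn
    g_eis = sigmoid(W_cnn_to_eis @ f_cnn) * f_eis
    # Each path has its own output; the prediction is the maximum, as in ensembling.
    p_cnn = sigmoid(w_out_cnn @ g_cnn)
    p_eis = sigmoid(w_out_eis @ g_eis)
    return float(max(p_cnn, p_eis))
```

Unlike concatenation followed by a linear or FC layer, this design keeps two separate outputs and lets each modality only rescale the other's features.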
The model is trained end-to-end. For the CNN path, we use Densenet121 and SE-Resnext50. For the EIS path, we use our novel recurrent approach. During training, a single crop of size 224 × 224 and a random permutation of the EIS sequence are fed into the model in each iteration. For evaluation, we also use the multi-crop evaluation strategy, which covers the entire dermoscopy image, and we use a random permutation of the EIS sequence for each of the 36 crops. Again, the final prediction is obtained by averaging the probabilities over all crops.
We evaluate at a high, clinically relevant sensitivity of 98 % by manually choosing a suitable threshold for the predicted probabilities. Predictions above the threshold are assigned to the class malignant. As a threshold-independent metric, we consider the area under the receiver operating characteristic curve (AUC). For all metrics, we also provide 95 % confidence intervals (CI), which are obtained by bias-corrected and accelerated bootstrapping with 10 000 bootstrap samples. Furthermore, we test for statistically significant differences in the models' specificity using a permutation test with 10 000 permutations and a significance level of 0.05.
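The evaluation protocol can be approximated as follows. The sketch automates the manual threshold choice by taking the largest threshold that still keeps the sensitivity at or above 98 % under the rule "probability above the threshold means malignant", and it compares two models' specificities with a paired, two-sided permutation test that randomly swaps the two models' outcomes per benign lesion. The pairing scheme and the helper names are assumptions, since the text does not state how the permutations were drawn; the bias-corrected and accelerated bootstrap for the confidence intervals is not reproduced here.

```python
import numpy as np

def threshold_at_sensitivity(p_pos, target=0.98):
    """Largest threshold t such that the rule p > t keeps sensitivity >= target."""
    s = np.sort(np.asarray(p_pos, dtype=float))
    max_missed = int(np.floor(len(s) * (1.0 - target) + 1e-9))
    # Just below the (max_missed + 1)-th smallest malignant score.
    return np.nextafter(s[max_missed], -np.inf)

def sens_spec(p_pos, p_neg, t):
    """Sensitivity on malignant and specificity on benign scores at threshold t."""
    return float(np.mean(np.asarray(p_pos) > t)), float(np.mean(np.asarray(p_neg) <= t))

def paired_permutation_test(correct_a, correct_b, n_perm=10_000, seed=0):
    """Two-sided paired permutation test for a difference in specificity.

    correct_a, correct_b: per benign lesion, whether each model classified it
    correctly as benign at its operating point.
    """
    d = np.asarray(correct_a, dtype=float) - np.asarray(correct_b, dtype=float)
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        # Swapping the two models for a lesion flips the sign of its difference.
        signs = rng.integers(0, 2, size=len(d)) * 2.0 - 1.0
        if abs((signs * d).mean()) >= observed - 1e-12:
            count += 1
    return (count + 1) / (n_perm + 1)
```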
The results are shown in Table 1. For EIS, our novel GRU-based state-max-pooling model performs best. Without state-max-pooling, the GRU's specificity is lower than that of the classic models. In particular, the difference in specificity between the GRU with state-max-pooling and all other EIS methods is significant. For dermoscopy-based melanoma detection with CNNs, pretraining substantially improves performance. EIS and dermoscopy models perform similarly: the specificity is not significantly different between the best performing EIS model and the best performing CNN model. When combining CNN and EIS models by ensembling, performance is substantially improved. In contrast, combining two dermoscopy or two EIS models by ensembling does not lead to improved performance. The specificity is significantly different between the EIS & CNN ensembles and the ensembles with the same data source. Explicitly learning from both data sources in a joint model (Combined) improves performance even further, with our proposed attention mechanism performing best. The combined models with FC and CA data fusion significantly outperform all ensembles in terms of specificity.
We address melanoma detection using both EIS and dermoscopy data with deep learning methods. A clinically applicable diagnostic tool requires a very high sensitivity, as missed malignant lesions can impact patient survival. Therefore, we evaluate our models at 98.0 % sensitivity, which was deemed useful in a previous study, and we assess their specificity at this operating point. Clinically useful models should therefore achieve a high specificity at a fixed, high sensitivity.
To construct our joint model, we first revisit machine learning methods for EIS data. Our novel GRU-based approach significantly outperforms SVMs and FC-NNs with a specificity of 34.7 % (CI 31.0-38.8) compared to 27.9 % (CI 24.7-31.3) and 30.0 % (CI 26.9-34.2). The GRU-based model takes the nature of EIS data into account, as it is able to directly learn which measurements are important, given a sequence of EIS measurements with arbitrary length. The state-max-pooling mechanism smoothly integrates previously used heuristics for high-sensitivity classifier design into our model.
Second, we perform melanoma detection using only CNNs with the dermoscopic images from the dataset. Comparing EIS and dermoscopy models, the performance in terms of the specificity is not significantly different while the AUC is slightly higher for the CNNs. However, when restricting the models to the same dataset without pretraining, the EIS models perform better with a specificity of 34.7 % (CI 31.0-38.8) compared to 21.3 % (CI 18.4-24.9). This indicates that EIS models perform better when training from scratch with a small dataset.
Third, we combine both data sources to further improve performance. When taking the simple approach of ensembling the models, the clinically relevant operating point shows a substantially improved specificity over single models. Critically, this kind of performance improvement cannot be observed when combining two models with the same data source, i.e., two CNNs or two EIS-based models. Thus, EIS and dermoscopy appear to complement each other well for melanoma detection. This matches the expectation that EIS and dermoscopy carry complementary features as EIS captures skin properties in deeper layers while dermoscopy mostly provides information at the skin surface.
When building a joint model, the specificity improves even further, while the AUC also improves slightly. Notably, an additional FC layer after feature concatenation improves performance. The more powerful nonlinear transformation most likely allows the model to learn a more meaningful combination of the features. Performance improves further if we do not combine both data sources explicitly, but instead let them learn their interaction with a cross-attention module. The two nonlinear feature transformations FC and CA both significantly improve performance over all ensemble techniques. Overall, we achieve a specificity of 53.7 % (CI 50.1-57.6) compared to a specificity of 34.7 % (CI 31.0-38.8) for EIS only and 34.4 % (CI 31.3-38.4) for dermoscopy only. Thus, our combination of EIS and dermoscopy leads to a significant improvement over the current state-of-the-art approach of using dermoscopy and EIS separately.
We propose to combine electrical impedance spectroscopy (EIS) and dermoscopy in joint deep learning models for improved melanoma detection. For this purpose, we first design a new deep learning model for EIS by using recurrent architectures with state-max-pooling for automatic selection of the most relevant EIS measurements. Second, we fuse this model with convolutional neural networks for dermoscopic image processing and study different ways of combining the two data sources. We find that joint nonlinear feature transformations perform best, and we show that our combined approach significantly outperforms models that only use one data source. Our results imply that EIS and dermoscopy carry complementary features that can be effectively exploited by joint deep learning methods. For future work, our approach could be evaluated on larger datasets, and its value for clinical decision support could be studied.
This work was partially supported by the TUHH initiative.
[1] Tsao, H., Atkins, M. B., and Sober, A. J., “Management of cutaneous melanoma,” N Engl J Med 351(10), 998–1012 (2004).
[2] Carli, P., De Giorgi, V., Crocetti, E., Mannone, F., Massi, D., Chiarugi, A., and Giannotti, B., “Improvement of malignant/benign ratio in excised melanocytic lesions in the dermoscopy era: a retrospective study 1997–2001,” Brit J Derm 150(4), 687–692 (2004).
[3] Rocha, L., Menzies, S., Lo, S., Avramidis, M., Khoury, R., Jackett, L., and Guitera, P., “Analysis of an electrical impedance spectroscopy system in short-term digital dermoscopy imaging of melanocytic lesions,” Brit J Derm 177(5), 1432–1438 (2017).
[4] Åberg, P., Birgersson, U., Elsner, P., Mohr, P., and Ollmar, S., “Electrical impedance spectroscopy and the diagnostic accuracy for malignant melanoma,” Exp Dermatol 20(8), 648–652 (2011).
[5] Malvehy, J., Hauschild, A., Curiel-Lewandrowski, C., Mohr, P., Hofmann-Wellenhof, R., Motley, R., Berking, C., Grossman, D., Paoli, J., Loquai, C., et al., “Clinical performance of the nevisense system in cutaneous melanoma detection: an international, multicentre, prospective and blinded clinical trial on efficacy and safety,” Brit J Derm 171(5), 1099–1107 (2014).
[6] Mohr, P., Birgersson, U., Berking, C., Henderson, C., Trefzer, U., Kemeny, L., Sunderkötter, C., Dirschka, T., Motley, R., Frohm-Nilsson, M., et al., “Electrical impedance spectroscopy as a potential adjunct diagnostic tool for cutaneous melanoma,” Skin Res Technol 19(2), 75–83 (2013).
[7] Kawahara, J., BenTaieb, A., and Hamarneh, G., “Deep features to classify skin lesions,” in [IEEE ISBI], 1397–1400 (2016).
[8] Lopez, A. R., Giro-i Nieto, X., Burdick, J., and Marques, O., “Skin lesion classification from dermoscopic images using deep learning techniques,” in [IEEE IASTED International Conference on Biomedical Engineering], 49–54 (2017).
[9] Codella, N., Rotemberg, V., Tschandl, P., Celebi, M. E., Dusza, S., Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M., et al., “Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (ISIC),” arXiv preprint arXiv:1902.03368 (2019).
[10] Gessert, N., Sentker, T., Madesta, F., Schmitz, R., Kniep, H., Baltruschat, I., Werner, R., and Schlaefer, A., “Skin lesion diagnosis using ensembles, unscaled multi-crop evaluation and loss weighting,” arXiv preprint arXiv:1808.01694 (2018).
[11] Tschandl, P., Codella, N., Akay, B. N., Argenziano, G., Braun, R. P., Cabo, H., Gutman, D., Halpern, A., Helba, B., Hofmann-Wellenhof, R., et al., “Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study,” The Lancet Oncology 20(7), 938–947 (2019).
[12] Gessert, N., Sentker, T., Madesta, F., Schmitz, R., Kniep, H., Baltruschat, I., Werner, R., and Schlaefer, A., “Skin Lesion Classification Using CNNs with Patch-Based Attention and Diagnosis-Guided Loss Weighting,” IEEE Transactions on Biomedical Engineering (2019).
[13] Tschandl, P., et al., “The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Sci. Data 5(180161) (2018).
[14] Hochreiter, S. and Schmidhuber, J., “Long short-term memory,” Neural computation 9(8), 1735–1780 (1997).
[15] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in [EMNLP], 1724–1734 (2014).
[16] Niu, Y., Lu, Z., Wen, J.-R., Xiang, T., and Chang, S.-F., “Multi-modal multi-scale deep learning for large-scale image annotation,” IEEE Transactions on Image Processing 28(4), 1720–1731 (2018).
[17] Efron, B. and Tibshirani, R. J., [An introduction to the bootstrap], CRC press (1994).