Many areas of medical and biological imaging have seen a recent upsurge in automated diagnosis systems using deep neural networks (DNNs). The trend is similar in many areas of traditional pathology [Litjens et al., 2017, Campanella et al., 2019, Chen et al., 2018]. However, the clinical application of medical imaging often involves “edge cases” where methods designed for natural images may not perform well. Typical challenges in these settings include large intrinsic variability, weak or inconsistent contrast, the presence of key structures at distinct scales, significant class imbalance, a laborious and involved data labeling process, and the need for interpretability in terms of clinically relevant physiological features. These challenges prevent standard DNNs, even those designed for analyzing standard microscopy-based histopathological images, from achieving clinical utilization. In this work, we address one edge case of this type: analysis of morphological patterns of cellular structures in reflectance confocal microscopy (RCM) images of pigmented skin lesions.
As we explain below, RCM has been shown to have the potential for a high impact on the assessment of such lesions and can significantly improve clinicians’ ability to make accurate and reliable screening decisions on which lesions to biopsy. However, wider adoption of RCM is hindered significantly because the images are visually very different from standard histopathology, thus making them an edge case in that context. For that same reason, automated analysis tools require solutions that go beyond standard DNN approaches and that address the challenges listed in the previous paragraph. We report here on the motivation, structure, and evaluation of a DNN architecture, which we call Multiscale Encoder-Decoder Network (MED-Net), that was explicitly designed to overcome these edge case challenges.
Analysis of pigmented skin lesions is critical, with skin cancer being a serious medical problem worldwide. About 5.4 million new cases are detected in the USA and another million in other regions (primarily parts of Europe, Canada, the UK, Australia, and New Zealand) [Nikolaou and Stratigos, 2014]. Diagnosis costs are about $3 billion, and treatment costs another $8 billion per year in the USA [Guy Jr et al., 2015]. RCM is an emerging non-invasive optical diagnostic tool based on examination of living tissue morphology directly on patients, on the fly, and at the bedside or in the clinic. After more than two decades of development and translation, in vivo RCM is advancing into clinical practice for non-invasively guiding diagnosis and treatment of cancer [Rajadhyaksha et al., 2017]. RCM imaging, combined with the current clinical standard for visual examination, known as dermoscopy, reduces the benign-to-malignant biopsy ratio by about a factor of two compared to dermoscopy alone [Alarcon et al., 2014, Pellacani et al., 2014, 2016, Borsari et al., 2016].
Although RCM images have a μm-level resolution like standard histopathology, their appearance is quite different because they are collected in vivo. One difference is that the images are acquired in an en face orientation, as opposed to the “vertical” (i.e. normal to the skin surface) sections typically used in the pathology of excised specimens. Another is that, due to the lack of in vivo contrast agents, images have only one source of contrast, reflectance, and therefore are displayed in grayscale, whereas standard H&E pathology is in color contrast (the purple and pink appearance). Instead of color contrast, skin and cellular structures are differentiated by intricate multiscale textural patterns in RCM images.
Diagnosis of melanocytic lesions using RCM is primarily based on the identification of four cellular morphological patterns in RCM mosaics acquired at the dermal-epidermal junction (DEJ). These mosaics typically span rectangular areas measuring 4-6 mm on a side [Scope et al., 2017]. The patterns in the mosaics are composed of heterogeneous cellular formations, appear at highly varying scales with highly varying shapes, and have diffuse transition boundaries between them. Moreover, the images are contaminated by intrinsic speckle noise. All these aspects are characteristic of high-resolution optical microscopy in vivo.
These characteristics present challenges for human readers who are trained extensively to interpret H&E pathology. Learning to read and perform a qualitative examination of RCM images demands significant effort and time for novices, and results tend to be highly subjective, with high levels of inter-reader variability even among experts. The steep learning curve and large inter-reader variability have become a significant impediment to broader RCM adoption by clinicians, which strongly motivates the development of automated computational tools for both clinical guidance and clinical training.
Existing medical image segmentation applications are developed for identifying target structures that typically have
1. predefined shapes with noticeable boundaries (e.g. organs [Nie et al., 2016, Yu et al., 2017], cells [Ronneberger et al., 2015, Falk et al., 2019]),
2. distinct contrast compared to the background (e.g. cells, retinal vessels [Fu et al., 2016]),
3. predefined spatial location within the view (e.g. organs, retinal layers [Gu et al., 2019], lesions [Marchetti et al., 2018, Codella et al., 2018]).
On the other hand, the morphological structures encountered in RCM images are complex in shape, have ambiguous boundaries, vary in size, change appearance under inherent speckle noise, and appear at arbitrary spatial locations within the field of view. Our experience has convinced us that neither the existing semantic segmentation approaches developed for other medical imaging modalities [Ronneberger et al., 2015, Falk et al., 2019, Nie et al., 2016, Yu et al., 2017, Marchetti et al., 2018, Codella et al., 2018] nor the existing very deep neural network architectures [Badrinarayanan et al., 2017, Chen et al., 2016] can be effectively used for RCM mosaics. These models contain very large numbers of parameters to optimize, making them prone to overfitting with the type of limited and class-imbalanced training data available for RCM. Moreover, in deep network architectures with limited training data, training the layers that are farther away from the output is challenging, as the partial derivatives that define the coefficient updates tend to get smaller as the error propagates from the output towards the input layers.
To respond to these particular challenges of automated analysis of RCM images, we developed a multiscale neural network called MED-Net for semantic segmentation of textural patterns in segmented lesions, based on the morphological patterns that have been defined by expert RCM readers. The architecture of MED-Net was driven by two key observations about clinical practice. First, our multiscale structure was inspired by the typical procedure for examining pathology in RCM mosaics clinically, which routinely starts with low magnification and low resolution in a large field of view (2X-4X, 1-5 μm/px, over 5-10 mm) followed by closer inspection of suspicious areas with higher magnification and higher resolution in smaller fields of view (10X-40X, 0.2-1.0 μm/px, over 0.5-2 mm), and then often returns to lower magnification to integrate features found at higher resolution into a broader semantic setting. MED-Net models textural patterns at multiple scales (magnifications, resolutions), starting from a coarse scale and proceeding to finer scales. Semantic segmentation at each scale is handled by subnetworks, which are fully convolutional encoder-decoder neural networks capable of generating label maps at the same scale as their input. The capacity (number of layers and coefficients) of the subnetworks depends on the complexity of the segmentation task at the given scale (e.g. coarser scales use smaller subnetworks as there is less detail at those scales). Consecutive subnetworks in the multiscale hierarchy explicitly cooperate, leveraging the correlation across scales. Each subnetwork utilizes the encoded feature representation (called the bottleneck representation) from the immediate predecessor subnetwork by integrating it into its feature representation at the equivalent level.
Similarly, the semantic segmentation estimate of each subnetwork is used as a prior in the subnetwork at the finer scale, so that each subnetwork only refines the coarser-scale estimates rather than solving the whole segmentation problem from scratch. However, using several subnetworks in a cascaded fashion makes the model rather deep and can make training difficult. To solve this problem, we employ a method called “deep supervision” [Zhu et al., 2017]. We compare the output of the subnetwork at every scale against the ground truth segmentation downsampled to the same scale. This supervision gives us direct access to deeper layers (early subnetworks) and allows efficient updates to avoid vanishing gradients during training.
Second, we use a set of four cell-morphological patterns (textural structures) that have been identified by clinicians [Scope et al., 2017], along with two “extra” classes for artifacts and non-lesion background. Rather than designing a binary classifier to simply classify lesions as suspicious or non-suspicious, we aim to respond to clinicians’ need for transparency in diagnostics by providing them with a scheme that reports more finely grained results in this “edge case” setting. Indeed, given this critical need for transparency and its intrinsic advantage for both rapid reader throughput and education, it is of critical importance to generate pattern class masks rather than just binary classifications. Similarly, we chose pixel-wise instead of image-wise classification, because in the latter, the clinician only has access to the final diagnostic prediction, while pixel-wise segmentation reports the spatial location of the diagnostic findings, making the diagnostic process more interpretable.
The precursor to MED-Net, named MUNet, was developed as a feasibility study [Bozkurt et al., 2018]. Here we significantly extend MUNet in the following ways:
1. MUNet only provides feedback between consecutive layers via output label maps, whereas MED-Net also shares feature representations between consecutive subnetworks (Fig. 3, Section 2.1).
2. We trained MED-Net using a novel loss function that incorporates a total variation constraint to regularize the smoothness of the output label maps (Section 2.2).
3. We greatly expanded the dataset used to train and test MED-Net compared to MUNet, using what is, in the RCM context, an unprecedentedly rich set of labeled data: 117 mosaics collected at six different clinics in the US (4) and Italy (2). Beyond simply having more data available, we were able to carry out cross-validation with data stratified by clinic of origin, providing a more realistic prediction of future performance. We note that while this is a rather small dataset in the context of DNNs, it is large for RCM due to the difficulty of labeling, an aspect of the “edge case” nature of this problem.
Labeling datasets is laborious and challenging, even for experts. Indeed, only 58% of the pixels in the dataset were labeled by our experts due to these difficulties. Thus, another feature of MED-Net is the ability to train on “partially-labeled” data, where only arbitrarily-shaped parts of training images are labeled, while still being capable of classifying full images. In our quantitative evaluations, we can only compare to the labeled pixels, as we only have ground truth there, but we show our visual segmentation results on the full images (Fig. 4). We evaluated the segmentation performance of MED-Net using the Dice coefficient, as well as the sensitivity and the specificity of the model in identifying the patterns. We compared MED-Net results against 4 well-known DNN models (FCN [Long et al., 2015], SegNet [Badrinarayanan et al., 2017], DeepLab [Chen et al., 2016] and UNet [Ronneberger et al., 2015]).
In the following sections, we discuss the design of MED-Net in detail, explain the algorithmic choices we made to overcome unique issues encountered in semantic segmentation of in vivo microscopy images, and present the results of our tests on mosaics of melanocytic skin lesions.
Our study set is composed of 117 RCM mosaics of melanocytic skin lesions collected at the DEJ level. 31 of these mosaics were acquired at 4 different clinics in the US (Memorial Sloan Kettering Cancer Center (New York, NY), University of Rochester (Rochester, NY), Loma Linda University Health (Loma Linda, CA), and Skin Cancer Associates (Plantation, FL)) and the other 86 at clinics at the University of Modena and Reggio Emilia (Italy). All mosaics were collected under the required IRB (USA) and Ethics Committee (EU) approvals and de-identified (patient metadata was removed). The study set was chosen to reflect the data diversity encountered in daily clinical practice. At each clinic, the imaging was carried out with a commercial confocal microscope (Vivascope 1500, Caliber I.D.) with a spatial resolution of 0.5 μm/px. Mosaic sizes varied, corresponding to areas between 14 and 36 mm². The size of the mosaics was determined by the clinical need to be able to evaluate the cellular morphological patterns that characterize melanocytic lesions accurately.
Figure 1: Two examples for each of the six distinct patterns (four cellular morphological and two other patterns), as seen in reflectance confocal mosaics at the dermal-epidermal junction in melanocytic skin lesions.
We set as our goal the segmentation of these mosaics into six clinically important classes. Four of them are cellular morphological patterns, i.e. ring, meshwork, nested, and aspecific. These patterns are routinely observed in RCM mosaics of melanocytic neoplasms collected at the DEJ [Scope et al., 2017]. We added two additional classes for non-lesion areas and areas dominated by imaging artifacts [Gill et al., 2019], leading to six total classes in our segmentation task. Exemplars of these six classes are shown in Fig. 1.
Ground truth maps for these six classes came from labels determined by the consensus of 2 expert readers (co-authors MG and CAF), labeled using the open-source software package Seg3D (University of Utah, [CIBC, 2016]). Labeling was conducted in a non-exhaustive manner, meaning that pixels not labeled as any of the six classes were given a distinct “ignore” label. Pixels were left unlabeled either because the distinction between the labels was not clear due to the existence of mixed patterns or because they would have required excessive time and effort to label, in the readers’ judgement. Overall, 58% of the pixels were labeled (Table 1). We show a sample labeled mosaic in Fig. 2. The unlabeled portions of the mosaics were omitted during both training and quantitative testing. However, the readers qualitatively assessed the algorithm’s segmentations even for these unlabeled regions. The distribution of the six labels over the whole dataset is given in Table 1.
Figure 2: An example mosaic and its corresponding expert labeling. Colors indicate the labels; Red: Non-Lesion, Yellow: Artifact, Green: Meshwork, Blue: Ring, Cyan: Nested. Grey colored areas are not labeled, and are ignored in training and quantitative evaluation.
2.1 Semantic Segmentation Network Architecture
MED-Net is composed of multiple encoder-decoder subnetworks nested together (Fig. 3). Each subnetwork processes the input image starting at a specific scale and outputs a segmentation map at the same scale. To the best of our knowledge, MED-Net is different from existing networks in the following aspects. In similar existing approaches [Lin et al., 2017, Jiang et al., 2018, Amirul Islam et al., 2017, Chen et al., 2016, Zhao et al., 2017, Fu et al., 2018, Zhou et al., 2018, Gu et al., 2018, Li et al., 2017, Zhang et al., 2019], the subnetworks are cascaded so that they share only features across networks, or else they independently solve the same segmentation problem and then, only at the end, fuse the results. More similar to MED-Net, Eigen and Fergus [2015] use three separate networks to process the input images at different scales in a cascaded manner resembling our approach. They feed the output of subnetworks into the input of the following subnetworks, so the individual models provide feedback to each other. However, in their approach, due to the lack of feedback at the individual subnetwork level (e.g. deep supervision [Zhu et al., 2017]), the output of each subnetwork is not a final output (e.g. in their case, a depth map) at the respective scale, but a feature representation. Unlike all these approaches, MED-Net shares intermediate results in two ways. It shares the segmentation outputs across subnetworks (Fig. 3) by using them as a prior that becomes part of the input for subsequent subnetworks. Through the use of deep supervision [Zhu et al., 2017], the output of each subnetwork is compared against a ground truth segmentation and forced to be an intermediate label prediction at the scale at which it operates.
Moreover, MED-Net also shares feature representations between matching levels of consecutive subnetworks. These subnetwork interconnections are not present in previous approaches [Lin et al., 2017, Jiang et al., 2018, Amirul Islam et al., 2017, Chen et al., 2016]. Backpropagating the final loss through the network can lead to inefficient training of the layers that are farther from the output. Therefore, to effectively train the individual subnetworks, we provide direct feedback to them, a method known as deep supervision [Zhu et al., 2017]. Overall, sharing intermediate feature representations, using intermediate label predictions as priors, and applying deep supervision to individual subnetworks are the three main innovations in the MED-Net architecture.
The elementary units of subnetworks in MED-Net consist of residual blocks [He et al., 2016], which are generally concatenations of convolutions, non-linearities, and batch normalizations. Downsampling is carried out through non-unity stride of the first residual block, and upsampling is applied to processing block outputs. The sequence of downsampling processing blocks (encoder) is followed by a sequence of upsampling processing blocks (decoder). Thus if we had a single scale, the architecture would be very similar to a Fully Convolutional Network (FCN32) [Long et al., 2015] with encoder-decoder topology. However, here we have subnetworks that each solve the segmentation problem starting from a different scale of the input image. Subnetworks in this cross-scale hierarchy share information (feature representations) directly through skip connections from bottleneck representations of their predecessor scale subnetwork. This information exchange is done via multiplication of tensor representations at comparable scales to act like attention mechanisms [Roy et al., 2018]. Also, the output segmentation probability map (a vector of six probabilities per pixel) at each scale (except the finest) is upsampled and then concatenated with the original or directly downsampled image at the next finer scale and used as the input for the subnetwork at that next scale. More precisely, let $X$ and $Y$ be the original image and corresponding ground truth labeled image, and $X_k$ and $Y_k$ be those images after $2^k$ times downsampling in both spatial dimensions ($k = 0, 1, \dots, K$). The subnetwork at the coarsest scale takes only $X_K$ as input and produces a probability map $P_K$, which represents the likelihood of each pixel belonging to a particular class. For all other subnetworks (i.e. $k < K$), we fuse the segmentation coming from subnetwork $k+1$ ($P_{k+1}$) with the level $k$ version of the input ($X_k$) via concatenation. The final segmentation probability map is $P_0$, which is at the same resolution as the input image of the overall model.
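The coarse-to-fine data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the MED-Net implementation: `dummy_subnetwork` is a hypothetical stand-in for a trained encoder-decoder, the 2x average pooling and nearest-neighbour upsampling are assumed choices, and the feature-level skip connections between subnetworks are omitted.

```python
import numpy as np

NUM_CLASSES = 6  # four patterns + artifact + non-lesion background


def downsample2(img):
    """Halve both spatial dimensions of an (H, W, C) array by 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))


def upsample2(prob):
    """Double both spatial dimensions by nearest-neighbour repetition."""
    return prob.repeat(2, axis=0).repeat(2, axis=1)


def dummy_subnetwork(x):
    """Hypothetical placeholder: maps (H, W, channels) to an (H, W, 6) softmax map."""
    logits = np.stack([x.mean(axis=-1) * (c + 1) for c in range(NUM_CLASSES)], axis=-1)
    logits -= logits.max(axis=-1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)


def cascade(image, num_levels=3):
    """Return one probability map per level; index 0 is the finest scale."""
    pyramid = [image[..., None]]                # level 0: full resolution
    for _ in range(num_levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    probs = [None] * num_levels
    probs[-1] = dummy_subnetwork(pyramid[-1])   # coarsest level sees only the image
    for k in range(num_levels - 2, -1, -1):
        prior = upsample2(probs[k + 1])         # coarser estimate acts as a prior
        probs[k] = dummy_subnetwork(np.concatenate([pyramid[k], prior], axis=-1))
    return probs
```

Calling `cascade` on a 64 x 64 image yields maps of size 64, 32, and 16 pixels per side, each with six class probabilities per pixel.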
The subnetwork depth parameter is a design choice, and one can also vary the scale factor between subnetworks, which we set to 2, leading to a 3-level version of MED-Net. Likewise, the scale difference of the input between consecutive levels is another design choice and can be determined according to needs and computational capabilities. In addition, the overall architecture is modular in the sense that one can replace our subnetwork architecture (including a different design of the processing blocks) with any other relevant subnetwork architecture and then assemble a MED-Net version of that network.
Each MED-Net subnetwork other than the coarsest has the same architecture as the subnetwork at the next coarser scale but with two additional blocks: one encoder block before the bottleneck feature representation and one deconvolution block at the input of the decoder. Note that the weights in each corresponding block differ across subnetworks; weights are not shared between layers. Information is shared between subnetworks only through the skip connections described above.
2.2 Loss function
The loss function was designed to take three distinct factors into account:
1. Appropriateness of segmentation (e.g. generating labels that change smoothly across the image).
2. Ability to handle imbalances in label distribution of the training data.
3. Applicability to multiclass labeling.
Thus we used a modified version of the soft-Dice loss calculated between the output of each subnetwork and the ground truth labels at the same scale (see Fig. 3) at each level.
The standard Dice Similarity Coefficient (DSC) [Dice, 1945] is commonly used for binary segmentation and is known to be robust against label imbalance in the data. In its original binary formulation, DSC explicitly represents only true-positive samples, while true-negative cases are automatically optimized simultaneously. However, similar to Salehi et al. [2017], we found that directly extending this formulation to the multilabel case by treating each label as a binary classification task did not put enough emphasis on true-negative samples. Therefore, we modified the soft-Dice coefficient to also consider true-negative samples in the loss calculation, as described next.
Suppose we have $N \times C$ sized tensors $P_k$ and $Y_k$ (with $N$ pixels and $C$ classes), where $Y_k$ is the one-hot encoded ground truth at the subnetwork level $k$. The entries $y_i = e_c$ if pixel $i$ is labeled as class $c$, where $e_c$ is a one-hot vector of length $C$ with 1 in its $c$-th entry and 0 everywhere else. $P_k$ is the neural network output, such that at each pixel $i$, $p_{i,c} \in [0, 1]$ and $\sum_c p_{i,c} = 1$. Our modified loss function (MDSC) has two parts and includes a small value $\epsilon$ in order to avoid division by zero. The first part is the standard soft-Dice loss, which encourages agreement between true positive labels, while the second part also encourages agreement between true negative predictions. To ensure smoothness of the predicted label map and avoid small isolated segmentation labels, we regularize the loss function using the total variation (TV) of the output label map.
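As an illustration only, one form that matches the verbal description above (a true-positive soft-Dice term plus an analogous true-negative term) is sketched below. The equal weighting of the two terms, the placement of $\epsilon$, and the symbols $p_{i,c}$ and $y_{i,c}$ (predicted probability and one-hot label of class $c$ at pixel $i$) are assumptions, not necessarily the exact expression used in this work.

```latex
\mathcal{L}_{\mathrm{MDSC}}(P, Y) = 1 - \frac{1}{2}\left[
  \frac{2\sum_{i,c} p_{i,c}\, y_{i,c} + \epsilon}
       {\sum_{i,c} \left(p_{i,c} + y_{i,c}\right) + \epsilon}
  +
  \frac{2\sum_{i,c} (1 - p_{i,c})(1 - y_{i,c}) + \epsilon}
       {\sum_{i,c} \left[(1 - p_{i,c}) + (1 - y_{i,c})\right] + \epsilon}
\right]
```

With this form, a perfect prediction ($P = Y$) gives a loss of 0, and the second fraction rewards pixels correctly assigned zero probability for a class.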
Combining MDSC and TV losses, the loss applied at each subnetwork level is $\mathcal{L}_k = \mathcal{L}_{\mathrm{MDSC}} + \lambda\, \mathcal{L}_{\mathrm{TV}}$, where $\lambda$ is the regularization parameter. We set $\lambda$ empirically, which kept the total variation cost to between 0.01 and 0.1 of the soft-Dice loss. In our experiments, we observed that keeping the total variation cost within this range of the soft-Dice loss provided a good balance between smoothness and the accuracy of produced label maps.
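A minimal NumPy sketch of a per-level loss of this kind follows. The exact MDSC expression and the TV definition are assumptions here (a symmetric true-positive/true-negative soft-Dice and an anisotropic per-pixel TV), and `lam=0.1` is a hypothetical placeholder rather than the value used in this work.

```python
import numpy as np


def mdsc_loss(p, y, eps=1e-6):
    """Soft-Dice loss with an added true-negative term (assumed form).

    p, y: (H, W, C) arrays of predicted probabilities and one-hot labels.
    """
    tp = (2 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)
    tn = (2 * np.sum((1 - p) * (1 - y)) + eps) / (np.sum(1 - p) + np.sum(1 - y) + eps)
    return 1.0 - 0.5 * (tp + tn)


def tv_loss(p):
    """Anisotropic total variation of the probability map, averaged per pixel."""
    dh = np.abs(np.diff(p, axis=0)).sum()
    dw = np.abs(np.diff(p, axis=1)).sum()
    return (dh + dw) / (p.shape[0] * p.shape[1])


def level_loss(p, y, lam=0.1):
    """Combined loss at one scale; lam is a hypothetical regularization weight."""
    return mdsc_loss(p, y) + lam * tv_loss(p)
```

In practice one would monitor the ratio `lam * tv_loss(p) / mdsc_loss(p, y)` during training and tune `lam` so the ratio stays in the range reported above.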
As shown in Fig. 3, we calculate the loss $\mathcal{L}_k$ between the output of each subnetwork and the label map at the respective scale for each scale $k$, and the overall loss as the sum of losses across all subnetworks/scales, $\mathcal{L} = \sum_k \mathcal{L}_k$. Doing so, we effectively gain direct access to the deeper layers of the network, as is done with deep supervision [Zhu et al., 2017]. However, the subnetworks are not trained disjointly as they are connected via skip connections, resulting in joint optimization of all subnetwork parameters.
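The summation over scales can be sketched as below. For brevity, a plain soft-Dice loss stands in for the full per-level loss, and block averaging of one-hot labels is an assumed way of producing the downsampled ground truth; both are illustrative choices.

```python
import numpy as np


def soft_dice_loss(p, y, eps=1e-6):
    """Plain soft-Dice loss, used here as a placeholder per-level loss."""
    return 1.0 - (2 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)


def downsample_labels(y, factor):
    """Downsample one-hot labels of shape (H, W, C) by block averaging."""
    h, w, c = y.shape
    return y.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def deep_supervision_loss(probs, y_full):
    """Sum per-level losses; probs[k] is the output at 2**k times downsampling."""
    total = 0.0
    for k, p in enumerate(probs):
        total += soft_dice_loss(p, downsample_labels(y_full, 2 ** k))
    return total
```

Because every subnetwork output contributes its own term, gradients reach the early (coarse-scale) subnetworks directly instead of only through the finest-scale output.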
Figure 3: Our architecture is composed of 3 nested fully convolutional networks that generate semantic segmentation at diferent scales. Red arrows denote 2x downsampling, and green arrows denote 2x upsampling. Output segmentations at lower magnifcations are fed into the next level via concatenation. The loss at each level (scale) is calculated and backpropagated for deep supervision of the subnetworks.
2.3 Implementation Details
In this section, we discuss specific parameter choices in our implementation of MED-Net on RCM mosaics. These choices were made to fit available hardware resources (e.g. GPU memory, number of GPUs) and problem characteristics (e.g. data sampling and augmentation scheme). We report them so that readers can replicate our work, and we also anticipate that they will provide a guideline towards applying this structure to other segmentation problems.
Before training the MED-Net model, we needed to make two important choices regarding (i) the resolution of the mosaics to be processed and (ii) the size of the input images to the network. Although the network architecture can segment arbitrarily sized images, we processed the RCM mosaics in patches (portions of the mosaic) due to memory limitations of the GPU we used. Note that the patches could not be arbitrarily small, because we used stride-2 convolutions (effectively downsampling by 2) in at least 4 levels of encoder blocks. To determine useful patch sizes, we consulted our expert readers, who reported that in their experience, the morphological patterns of interest could still be reliably identified at 2 μm/px resolution, 4 times lower than that of the RCM acquisition system. Thus before feeding the mosaics to MED-Net, we downsampled them by 4. The readers also reported that a 0.5 mm × 0.5 mm field of view is typically large enough to identify these same patterns reliably. Thus we processed the downsampled mosaics in patches covering approximately this field of view (0.5 mm corresponds to 250 pixels at 2 μm/px).
All models were trained using the same training parameters. We trained each model for 200 epochs, using a base learning rate of 0.01, a batch size of 48, and weight decay. We exponentially decayed the learning rate to one-tenth of the base value over the course of training. For a fair comparison, we kept the number of trainable parameters for all networks at 6 million. All the convolutional layers were initialized with He Normal initialization [He et al., 2016].
We also implemented data augmentation through spatial sampling. In order to cover all possible patches that could be extracted from the mosaic, we devised the following patch extraction procedure. Before each epoch, we extracted large patches in a sliding window fashion with a 50% overlap. Then, at each epoch of training, we extracted smaller, network-input-sized patches at random locations within the larger patches.
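The two-stage sampling described above can be sketched as follows. The window and crop sizes in the usage note are hypothetical placeholders, and the function names are illustrative.

```python
import numpy as np


def sliding_windows(mosaic, size):
    """Yield size x size windows with 50% overlap (stride = size // 2)."""
    stride = size // 2
    h, w = mosaic.shape
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            yield mosaic[top:top + size, left:left + size]


def random_crop(window, size, rng):
    """Cut a size x size patch at a uniformly random location inside window."""
    h, w = window.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return window[top:top + size, left:left + size]
```

For example, a 1024 x 1024 mosaic with 512-pixel windows gives a 3 x 3 grid of windows, and `random_crop(window, 256, np.random.default_rng())` then draws a fresh training patch from each window at every epoch.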
In order to account for inevitable variations during RCM image acquisition, such as changes in laser power (illumination intensity), distortion in tissue, speckle noise, and the orientation of the microscope, we applied data augmentation on the extracted patches. At each epoch, we
1. rotated each patch by a random angle of up to 180 degrees,
2. randomly flipped the patch horizontally and vertically,
3. added a random intensity value in [-20, 20],
4. zoomed in/out randomly up to 10%,
5. randomly sheared the patches,
6. added signal-dependent Gaussian-distributed pseudo-speckle noise (with uniform random multiplication parameter of 0.2).
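A subset of these augmentations (flips, intensity offset, and multiplicative pseudo-speckle noise) can be sketched with NumPy alone. The 8-bit intensity range, the interpretation of the 0.2 parameter as an upper bound on a uniformly drawn noise strength, and the omission of rotation, zoom, and shear (which need an interpolation library) are assumptions of this sketch.

```python
import numpy as np


def augment(patch, rng):
    """Apply flips, an intensity offset, and signal-dependent noise.

    patch: 2-D array with intensities assumed to lie in [0, 255].
    """
    out = patch.astype(np.float64)
    if rng.random() < 0.5:
        out = out[:, ::-1]                       # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                       # vertical flip
    out = out + rng.uniform(-20, 20)             # global intensity shift
    strength = rng.uniform(0, 0.2)               # assumed meaning of the 0.2 parameter
    noise = rng.standard_normal(out.shape)
    out = out * (1 + strength * noise)           # noise scales with the signal
    return np.clip(out, 0, 255)
```

Because the noise is multiplied by the image, brighter structures receive proportionally larger perturbations, which loosely mimics speckle.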
During inference, the output of the networks is six probability maps, one for each label (represented as a tensor), over a 0.5 mm × 0.5 mm field of view. Due to the use of padded convolutions, the network produces less reliable segmentation results at the borders of the patches. To compensate, we extracted and processed patches in an overlapping fashion, resulting in multiple soft decisions for each pixel. Specifically, we extracted patches at a stride of 32 pixels, leading to up to 8 different decisions per pixel. We then weighted each patch’s probability map for each label with a spatial Gaussian mask whose variance was half of the patch size before summing the overlapping probability maps. Finally, we chose the class with the highest resulting probability for each pixel.
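The Gaussian-weighted fusion of overlapping patch predictions can be sketched as follows. The function names are illustrative, and the mask uses the literal reading of the text (variance equal to half the patch size in pixels), which is an assumption about the intended units.

```python
import numpy as np


def gaussian_mask(size, sigma):
    """Separable 2-D Gaussian centred on the patch."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    return np.outer(g, g)


def blend_patches(shape, patches, corners, num_classes):
    """Fuse overlapping per-patch probability maps into one label map.

    patches: list of (size, size, num_classes) probability maps.
    corners: list of (top, left) positions of each patch in the mosaic.
    """
    acc = np.zeros(shape + (num_classes,))
    for prob, (top, left) in zip(patches, corners):
        size = prob.shape[0]
        mask = gaussian_mask(size, sigma=np.sqrt(size / 2))  # variance = size / 2
        acc[top:top + size, left:left + size] += prob * mask[..., None]
    return acc.argmax(axis=-1)
```

Pixels near a patch border receive a small weight from that patch, so the decision there is dominated by neighbouring patches in which the same pixel lies closer to the centre.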
We report the results of testing on two distinct training scenarios. In Scenario 1, we pooled data across all sites, then stratified by patient for training, validation, and testing (5-fold stratified cross-validation). In Scenario 2, we first stratified by clinic, used only the data from clinics in Europe for training and validation, and then tested only on data from the US. The validation set was used to probe the performance of the model throughout training, and the test set was used to evaluate the performance of the trained models quantitatively. We chose to train on the European data and test on the US data, and not vice versa, both due to the limited size of the US data set and because the US data came from a larger number of clinics, thus better mimicking a realistic application scenario. Results from the first scenario are described in Section 3.1 and results from the second scenario in Section 3.2. Each fold used in Scenario 1 is also stratified by class label in the training/test split to ensure a representative sampling of training data in the face of the class imbalance in our data. Specifics of the data distribution over the training, validation, and test sets for both scenarios are given in Table 1.
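The patient-wise part of the Scenario 1 split can be sketched as below: all mosaics from one patient are assigned to the same fold, so no patient contributes to both training and test data. The function name and the round-robin assignment are illustrative assumptions, and the additional balancing of class labels across folds is not implemented in this sketch.

```python
import numpy as np


def patient_folds(patient_ids, n_folds=5, seed=0):
    """Assign each mosaic a fold index so that patients never span folds."""
    patient_ids = np.asarray(patient_ids)
    rng = np.random.default_rng(seed)
    patients = np.unique(patient_ids)
    rng.shuffle(patients)
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    return np.array([fold_of[p] for p in patient_ids])
```

Fold `f` then serves as the test set in round `f`, with the remaining folds divided into training and validation data.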
In addition to MED-Net, we also tested 4 other widely used deep segmentation networks for comparison purposes: FCN [Long et al., 2015], SegNet [Badrinarayanan et al., 2017], DeepLab [Chen et al., 2016], and UNet [Ronneberger et al., 2015]. To try to ensure fair comparisons, we used a similar number of trainable parameters in each network (approximately 6 million). All of the networks were trained using similar training parameters (e.g. learning rate, weight decay, batch size) for 200 epochs using the MDSC+TV loss described above.
Table 1: Class distribution statistics: The top portion reports the distribution of labels for both scenarios. In Scenario 1, we were able to balance distribution across training and test sets to within 1% (stratified cross-validation). Class distributions in training and test sets are explicitly given for Scenario 2. In the bottom portion, we report on the size of the datasets in terms of both images and labeled pixels, as well as on the overall fraction of pixels that were labeled.
3.1 Scenario-1: Patient-Wise Cross-Validation Experiment
As described above, in this scenario, we “patient-wise partitioned” the dataset into 5 stratified folds, meaning that each fold contained similar proportions of class labels. Training, validation, and test sets approximately corresponded to 70, 10, and 20 percent of the data in each fold, respectively.
In Table 2, we present the segmentation performance of all networks for Scenario 1 in terms of sensitivity, specificity, and the Dice coefficient. On average, MED-Net modestly outperforms the other networks in terms of sensitivity (by 0.02 to 0.12), although the comparison differs across classes. On specificity, all networks perform similarly both on average and by class. The Dice coefficient values are consistently better for MED-Net than for the compared methods, except for FCN on the nested class. In general, FCN was the closest to MED-Net.
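For reference, the three reported metrics can be computed per class over the labeled pixels only, as in the sketch below. The `IGNORE` code for unlabeled pixels and the function name are hypothetical, and no guard is included for classes that are absent from the evaluated pixels.

```python
import numpy as np

IGNORE = -1  # hypothetical code for unlabeled pixels


def per_class_metrics(pred, truth, cls):
    """Return (Dice, sensitivity, specificity) for one class.

    pred, truth: integer label arrays of the same shape; pixels where
    truth == IGNORE are excluded from all counts.
    """
    valid = truth != IGNORE
    p = pred[valid] == cls
    t = truth[valid] == cls
    tp = np.sum(p & t)
    fp = np.sum(p & ~t)
    fn = np.sum(~p & t)
    tn = np.sum(~p & ~t)
    dice = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return dice, sensitivity, specificity
```

Averaging these values over the six classes gives per-network summaries of the kind compared in the tables.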
A closer comparison of the model output with ground truth labels revealed that, in general, all models confused the meshwork class with the ring and aspecific classes. This result is interesting, because anecdotally we have been told that novice clinicians also suffer from the same problem due to the wide range of variations in the appearance of the meshwork pattern. Moreover, visual examination of the results by our experts confirmed that most of the falsely classified meshwork pattern samples contain "deformed" variations of the pattern, which they reported are typically also misclassified by novice readers.
To obtain a qualitative assessment of MED-Net outputs, we presented the segmentation maps produced by MED-Net to our experts. In particular, we asked them to review the automated annotation of the algorithm over the “unlabeled areas”. Their qualitative assessment of the results was very positive and confirmed that the model performed very well in annotating most of the unlabeled areas in the mosaics. We show an example in Fig. 4. The gray-colored areas in the figure represent the unlabeled areas. MED-Net typically extended the labels of the neighboring labeled areas over the unlabeled sections, providing smoother label maps than the other methods.
Figure 4: Example segmentation results of 6 mosaics for Scenario 1. Color scheme is the same as used in Fig. 2. The ground truth segmentations are compared to the outputs of MED-Net and other state-of-the- art-methods. Images are not exhaustively annotated by the readers. Pixels that are not annotated (dark grey label) are ignored during training. During the testing phase, these pixels are discarded from sensitivity and specifcity calculations.
Table 2: Results for Scenario 1 Patient-Wise. The best results for each metric and label are highlighted in bold.
3.2 Scenario-2: Clinic-wise Cross-Validation
To assess how the models generalize across clinical settings, we trained them on the data collected in Italy (86 mosaics) and tested them on data collected at 4 US clinics (31 mosaics). In this case, we were not able to keep the incidences of the labels in the training and test sets at similar levels (Table 1). In the training set, [18, 20, 21, 6, 23, 12] percent of the labeled pixels belonged to the [background, artifact, meshwork, nested, ring, aspecific] patterns, respectively, whereas in the test set the proportions were [8, 23, 23, 5, 36, 5] percent. We used the same network architectures and training parameters as in Scenario 1 for both MED-Net and the other networks.
In Table 3, we summarize the segmentation performance of these networks in terms of sensitivity, specificity, and Dice coefficient. In general, the performance of all the networks was close to what we observed with the patient-wise stratification, with only modest decreases in the performance metrics. Overall, MED-Net outperformed all the other networks in terms of averages across classes, particularly with regard to sensitivity and Dice coefficient. Specificity values were generally very high for all networks on all classes, and for some classes other networks had slightly higher specificity than MED-Net.
Table 3: Results for Scenario 2 Clinic-Wise Cross-Validation Experiments. The best results for each metric and label are highlighted in bold.
3.3 Ablation Studies
We conducted 2 ablation studies to investigate how multiscale analysis and the proposed loss function each affect performance. We compared the ablation results to our baseline model (the 3-level MED-Net trained using the MDSC+TV loss, see Section 3.2). We followed the same training and testing procedures as in Section 3.2.
To test the effect of the multiscale approach, we trained 1-level and 2-level MED-Net models and compared them to the 3-level MED-Net. For a fair comparison, the number of trainable parameters was kept at 6 million for all the models. The results in Table 4 show that using multiscale analysis improves the segmentation performance. We stopped at 3 levels because a fourth level would necessarily decrease the resolution below the size of most of the relevant features in the images.
Table 4: Ablation study results for training versions of MED-Net with different numbers of levels.
To test the effect of the loss function, we trained the same baseline MED-Net model using the cross-entropy and Dice loss functions, and compared the results against our MDSC+TV loss defined in Section 2.2. The results in Table 5 show that using MDSC+TV as the loss function yields the best segmentation performance in terms of average Dice coefficient over all classes.
Table 5: Ablation study results for training MED-Net with different loss functions.
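To make the idea behind the loss comparison more concrete, the sketch below combines a generic multiclass soft Dice loss with a total variation (TV) penalty that discourages fragmented label maps. It is not the MDSC+TV loss itself, whose exact definition is given in Section 2.2 and is not reproduced here. The function names, the weighting parameter `tv_weight`, and the anisotropic form of the TV term are all assumptions made for illustration.

```python
def soft_dice_loss(probs, onehot, eps=1e-6):
    """One minus the mean soft Dice over classes.

    probs[c][y][x] is the predicted probability of class c at pixel (y, x);
    onehot has the same shape and holds the ground truth as 0/1 values.
    """
    total = 0.0
    for p_map, t_map in zip(probs, onehot):
        inter = sum(p * t for p_row, t_row in zip(p_map, t_map)
                    for p, t in zip(p_row, t_row))
        denom = (sum(p for row in p_map for p in row)
                 + sum(t for row in t_map for t in row))
        total += (2.0 * inter + eps) / (denom + eps)
    return 1.0 - total / len(probs)


def total_variation(probs):
    """Sum of absolute differences between horizontally and vertically
    neighboring pixels, accumulated over all class probability maps."""
    tv = 0.0
    for p_map in probs:
        height, width = len(p_map), len(p_map[0])
        for y in range(height):
            for x in range(width):
                if x + 1 < width:
                    tv += abs(p_map[y][x + 1] - p_map[y][x])
                if y + 1 < height:
                    tv += abs(p_map[y + 1][x] - p_map[y][x])
    return tv


def dice_tv_loss(probs, onehot, tv_weight=0.1):
    """Soft Dice loss plus a weighted fragmentation penalty (hypothetical weight)."""
    return soft_dice_loss(probs, onehot) + tv_weight * total_variation(probs)


# Toy usage: a smooth prediction versus a checkerboard-like one (2 classes, 2x2 pixels).
smooth = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
checker = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(dice_tv_loss(smooth, smooth), dice_tv_loss(checker, smooth))
```

The checkerboard prediction is penalized twice: its overlap with the ground truth is lower, and its neighboring pixels disagree, which raises the TV term.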
In this article, we present a deep-learning-based semantic segmentation algorithm developed specifically for in vivo microscopy applications other than retinal imaging. Machine-learning-based analysis of in vivo optical microscopy images poses unique challenges, as the textural patterns of morphology in these images are different from the patterns in natural images and vary extensively within classes. Hence, features developed for natural images do not generally perform well on these images. This makes deep-learning-based models attractive for the analysis of microscopy images, as they make it possible to learn the best feature representation for a given objective task. Moreover, as deep-learning-based approaches offer ways of learning both the feature representation and the classification model in an integrated fashion, they allow greater flexibility in capturing the relationships between pixels that encode complex morphological patterns like those present in RCM images.
Semantic segmentation also addresses another need: transparent, interpretable, machine-learning-based image analysis. Unlike diagnostic decision systems that take a “black box” approach to a final diagnostic score (e.g. probability of being benign or malignant) [Esteva et al., 2017, Monheit et al., 2011, Codella et al., 2017, Marchetti et al., 2018], semantic segmentation methods show the user the results behind the outcomes. Such a transparent approach can facilitate acceptance and adoption of machine-learning-based approaches [Goodman and Flaxman, 2017]. Spatially resolved, multiclass semantic segmentation algorithms such as the MED-Net architecture proposed here therefore have this additional advantage.
We report several promising results in this study. Although average sensitivity is moderate, specificity is very high: MED-Net performed very well at detecting the absence of a particular pattern and did not report many false positives. Hence, a clinician could be highly confident about the accuracy of positive results reported by the model. Moreover, Dice coefficients of 0.73-0.75 show that the model is not only good at detecting the existence of a pattern but also successfully finds the location and the extent of the pattern. On the other hand, due to its modest sensitivity, clinicians should be aware that the model may miss patterns that are present in the data.
Compared to the other network models that we tested, MED-Net achieved consistently higher quantitative metrics. Among the other approaches, FCN performed best and had average sensitivity, specificity, and Dice coefficient similar to MED-Net. The qualitative results provided in Fig. 2 suggest that MED-Net avoided inaccurately fragmented annotations. Note that both networks used the same loss function, which included an over-fragmentation penalty. We therefore attribute this result to the multiresolution feedback mechanism introduced in the network, which provides the output of the coarser network as a prior estimate to the finer level (Fig. 3). In this way, the model was observed to provide more coherent segmentations than FCN.
In the field of screening of pigmented skin lesions, MED-Net can act as a catalyst to enable faster training of novice readers and the adoption of RCM screening by the wider clinical community. Initially, semantic segmentation could serve as a quality assurance layer for experts by providing them with a quantitative measure of artifacts in the collected images [Kose et al., 2019]. Once images have been acquired in which the diagnostic content is not obscured by artifacts, the expert reader can first review the images blinded to the MED-Net output, and then re-review their readings against the automated semantic segmentation. In this way, the semantic segmentation could offer the expert a chance to identify areas of importance that may have been missed in their initial review, and then accept or reject the MED-Net output. Previous works have suggested that a double review of cases is preferable for remote interpretation [Witkowski et al., 2017], but this can be logistically infeasible due to the limited availability of experts. Having an integrated segmentation analysis serve as a second review may be a reasonable alternative to ensure the quality of care. In addition, MED-Net, used together with other quantitative imaging techniques such as DEJ delineation [Kurugol et al., 2015, Bozkurt et al., 2017a, Kaur et al., 2016, Robic et al., 2017, Bozkurt et al., 2017b, Hames et al., 2016] and diagnostic classification [Koller et al., 2011, Halimi et al., 2017], offers the potential to automate the entire image-acquisition process and pave the way for clinical imaging-based diagnostic guidance.
Although MED-Net was designed to work generally on microscopy images of complex tissue, we would argue for caution when applying it directly to other domain-specific clinical microscopy applications. We needed to make domain-specific design choices in order to utilize the model and the available data to their full extent. In our case, these algorithmic choices were the minimum size of the processing area (0.5 mm × 0.5 mm), the resolution of the images (2 µm/px), and the use of a multiscale CNN to increase robustness to scale changes in the morphological structures. Even though deep learning methods provide powerful solutions for representing the data of interest and carrying out classification tasks, one may not achieve good results without the proper domain-specific choices. In addition, we caution that the speckle noise inherent in optical imaging of scattering tissue poses a challenge, as it changes the texture of morphological patterns and increases the variability in their appearance. In our case, we observed that designing augmentation techniques to simulate the variation in the data greatly helped in ameliorating this problem and increased both sensitivity and specificity.
Another way to potentially increase performance would be to increase the amount of available training data. For example, as mentioned in Section 3, “deformed” variants of the meshwork pattern were misclassified by MED-Net, decreasing the segmentation performance. We believe that it is possible to overcome this problem by training with more meshwork pattern examples that include such deformations. Similar strategies could be followed to cover variations of all the patterns and increase the segmentation performance.
However, preparing data to train semantic segmentation models is logistically challenging. Unlike widely used classification models, where collecting image-wise labels is sufficient for training, data labeling for semantic segmentation is laborious and time-consuming, as it requires identifying precise and exhaustive boundaries in each image. Additionally, unlike labeling natural scenes where the object borders are well defined, subjectivity is a common issue in labeling microscopic images. For example, even if meshwork and ring patterns are considered two different morphological patterns in their canonical form, it was not at all uncommon in our data for one of the patterns to slowly morph into the other, leading to a region with a blend of both patterns. One way to ease the experts’ labeling workload, which we adopted here, was to ask experts to label only relatively clear and distinct single-pattern regions, rather than exhaustively labeling all pixels. Specifically, we asked the experts to label only the areas that they thought represented clear examples of the six given patterns. The result was that they labeled 57% of the training data pixels across the 117 mosaics. Once trained, MED-Net was able to predict labels for the entire mosaic, although we were not able to calculate quantitative metrics on the unlabeled regions due to the lack of ground truth. To allow this level of flexibility for the labelers, we designed our training procedure to be capable of handling partially labeled data by calculating and backpropagating the error over only the labeled pixels.
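The partial-label training idea described above can be illustrated with a masked loss. The sketch below uses plain softmax cross-entropy purely for simplicity (the loss actually used for MED-Net is defined in Section 2.2). The function name `masked_ce_and_grad` and the marker value `-1` for unlabeled pixels are assumptions. The key property is that unlabeled pixels contribute neither to the loss nor to the gradient.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [v / s for v in exps]


def masked_ce_and_grad(logits, labels, ignore_label=-1):
    """Mean cross-entropy over labeled pixels and its gradient w.r.t. the logits.

    logits: list of per-pixel lists of class scores.
    labels: per-pixel class index, or ignore_label for unlabeled pixels.
    """
    n_labeled = sum(1 for y in labels if y != ignore_label)
    loss = 0.0
    grads = []
    for z, y in zip(logits, labels):
        if y == ignore_label or n_labeled == 0:
            grads.append([0.0] * len(z))  # no error signal from unlabeled pixels
            continue
        p = softmax(z)
        loss += -math.log(max(p[y], 1e-12)) / n_labeled
        grads.append([(pk - (1.0 if k == y else 0.0)) / n_labeled
                      for k, pk in enumerate(p)])
    return loss, grads


# Toy usage: three pixels, two classes; the middle pixel is unlabeled.
loss, grads = masked_ce_and_grad([[0.0, 0.0], [5.0, -5.0], [0.0, 0.0]], [0, -1, 1])
print(loss, grads[1])
```

Because the gradient at unlabeled pixels is exactly zero, the network remains free to predict any class there, which is consistent with the full-mosaic predictions discussed above.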
However, based on our experience, we believe that even this “partial labeling” scheme will not be sustainable in the long run if we want to significantly increase the size and variety of data available for further training and development. We are currently investigating ways of utilizing “weakly-labeled” data for semantic segmentation purposes. In such a scheme, the expert would provide only mosaic-wise labels (or maybe quadrant-wise, or for other fixed, smaller portions of the mosaics), similar to what is done for classification problems. These labels would then be extended by the network to full semantic segmentation maps. These regions could be singly or multiply labeled according to both the ML scheme and the nature of the data. Campanella et al. [2019] investigate a multiple instance learning based approach for the segmentation of histopathology slides. In histopathology, large amounts of weakly-labeled data are available through pathology slides and the respective pathology reports (e.g. Campanella et al. used 12 thousand pathology slides). RCM imaging, on the other hand, is likely to remain in the realm of small data. We hope that this work, and specifically the availability of MED-Net, will help to accelerate the adoption of RCM imaging, in turn leading to larger data availability in the coming years to enable the application of weakly-supervised methods.
Finally, we wish to return to the topic of wider applicability. MED-Net was explicitly designed as a segmentation tool that can be used for other imaging modalities and other non-generic “edge case” applications. The multiscale cellular and morphological textural patterns seen in RCM images of melanocytic skin lesions have underlying similarities to patterns seen in other tissues and conditions (e.g. non-melanocytic skin lesions, skin pre-cancers, oral pre-cancers and cancers, benign and inflammatory conditions in skin [Flores et al., 2019, Peterson et al., 2019, Longo et al., 2012]) and with other emerging optical microscopic imaging approaches (optical coherence tomography (OCT), multimodal OCT-and-RCM, multiphoton microscopy (MPM), optical coherence microscopy (OCM)) [Schneider et al., 2019, Boone et al., 2015]. Thus we also hope that the utilization of MED-Net for both clinical training and clinical practice will eventually help to drive wider acceptance and adoption of in vivo optical microscopy in clinical practice.
This project was supported by NIH grant R01CA199673 from NCI and in part by MSKCC’s Cancer Center core support NIH grant P30CA008748 from NCI. The authors would like to thank NVIDIA Corporation for the Titan V GPU donation through their GPU Grant Program.
Christi Alessi-Fox is a current employee and shareholder at CaliberID. Milind Rajadhyaksha is a former employee of and holds equity in CaliberID, manufacturer of a confocal microscope. Prof. Giovanni Pellacani received honoraria for courses on confocal microscopy from Mavig GmbH, and served as advisory board member for CaliberID.
I. Alarcon, C. Carrera, J. Palou, L. Alos, J. Malvehy, and S. Puig. Impact of in vivo reflectance confocal microscopy on the number needed to treat melanoma in doubtful lesions. British Journal of Dermatology, 170(4):802–808, 2014.
M. Amirul Islam, M. Rochan, N. D. Bruce, and Y. Wang. Gated feedback refinement network for dense image labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3751–3759, 2017.
V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12):2481–2495, 2017.
M. Boone, A. Marneffe, M. Suppa, M. Miyamoto, I. Alarcon, R. Hofmann-Wellenhof, J. Malvehy, G. Pellacani, and V. Del Marmol. High-definition optical coherence tomography algorithm for the discrimination of actinic keratosis from normal skin and from squamous cell carcinoma. Journal of the European Academy of Dermatology and Venereology, 29(8):1606–1615, 2015.
S. Borsari, R. Pampena, A. Lallas, A. Kyrgidis, E. Moscarella, E. Benati, M. Raucci, G. Pellacani, I. Zalaudek, G. Argenziano, et al. Clinical indications for use of reflectance confocal microscopy for skin cancer diagnosis. JAMA dermatology, 152(10):1093–1098, 2016.
A. Bozkurt, T. Gale, K. Kose, C. Alessi-Fox, D. H. Brooks, M. Rajadhyaksha, and J. Dy. Delineation of skin strata in reflectance confocal microscopy images with recurrent convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 25–33, 2017a.
A. Bozkurt, K. Kose, J. Coll-Font, C. Alessi-Fox, D. H. Brooks, J. G. Dy, and M. Rajadhyaksha. Delineation of skin strata in reflectance confocal microscopy images using recurrent convolutional networks with toeplitz attention. arXiv preprint arXiv:1712.00192, 2017b.
A. Bozkurt, K. Kose, C. Alessi-Fox, M. Gill, J. Dy, D. Brooks, and M. Rajadhyaksha. A multiresolution convolutional neural network with partial label training for annotating reflectance confocal microscopy images of skin. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 292–299. Springer, 2018.
G. Campanella, M. G. Hanna, L. Geneslaw, A. Miraflor, V. W. K. Silva, K. J. Busam, E. Brogi, V. E. Reuter, D. S. Klimstra, and T. J. Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature medicine, 25(8):1301–1309, 2019.
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915, 2016.
P.-H. C. Chen, K. Gadepalli, R. MacDonald, Y. Liu, K. Nagpal, T. Kohlberger, J. Dean, G. S. Corrado, J. D. Hipp, and M. C. Stumpe. Microscope 2.0: An augmented reality microscope with real-time artificial intelligence integration. arXiv preprint arXiv:1812.00825, 2018.
CIBC, 2016. Seg3D: Volumetric Image Segmentation and Visualization. Scientific Computing and Imaging Institute (SCI), Download from: http://www.seg3d.org.
N. C. Codella, Q.-B. Nguyen, S. Pankanti, D. A. Gutman, B. Helba, A. C. Halpern, and J. R. Smith. Deep learning ensembles for melanoma recognition in dermoscopy images. IBM Journal of Research and Development, 61(4/5):5–1, 2017.
N. C. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A. Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pages 168–172. IEEE, 2018.
L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE international conference on computer vision, pages 2650–2658, 2015.
A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, Jan. 2017. ISSN 0028-0836. doi: 10.1038/nature21056.
T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, et al. U-net: deep learning for cell counting, detection, and morphometry. Nature methods, 16(1):67, 2019.
E. Flores, O. Yélamos, M. Cordova, K. Kose, W. Phillips, E. Lee, A. Rossi, K. Nehal, and M. Rajadhyaksha. Peri-operative delineation of non-melanoma skin cancer margins in vivo with handheld reflectance confocal microscopy and video-mosaicking. Journal of the European Academy of Dermatology and Venereology, 33(6):1084–1091, 2019.
H. Fu, Y. Xu, D. W. K. Wong, and J. Liu. Retinal vessel segmentation via deep learning network and fully-connected conditional random fields. In 2016 IEEE 13th international symposium on biomedical imaging (ISBI), pages 698–701. IEEE, 2016.
H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, and X. Cao. Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE transactions on medical imaging, 37(7):1597–1605, 2018.
M. Gill, C. Alessi-Fox, and K. Kose. Artifacts and landmarks: pearls and pitfalls for in vivo reflectance confocal microscopy of the skin using the tissue-coupled device. Dermatology online journal, 25(8), 2019.
B. Goodman and S. Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017.
F. Gu, N. Burlutskiy, M. Andersson, and L. K. Wilén. Multi-resolution networks for semantic segmentation in whole slide images. In Computational Pathology and Ophthalmic Medical Image Analysis, pages 11–18. Springer, 2018.
Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, and J. Liu. Ce-net: Context encoder network for 2d medical image segmentation. IEEE transactions on medical imaging, 2019.
G. P. Guy Jr, S. R. Machlin, D. U. Ekwueme, and K. R. Yabroff. Prevalence and costs of skin cancer treatment in the us, 2002-2006 and 2007-2011. American journal of preventive medicine, 48(2):183–187, 2015.
A. Halimi, H. Batatia, L. D. Jimmy, G. Josse, and J. Y. Tourneret. Wavelet-based statistical classification of skin images acquired with reflectance confocal microscopy. Biomedical Optics Express, 8(12):5450–5467, 2017.
S. C. Hames, M. Ardigò, H. P. Soyer, A. P. Bradley, and T. W. Prow. Automated segmentation of skin strata in reflectance confocal microscopy depth stacks. PloS one, 11(4):e0153208, 2016.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
J. Jiang, Y.-C. Hu, C.-J. Liu, D. Halpenny, M. D. Hellmann, J. O. Deasy, G. Mageras, and H. Veeraraghavan. Multiple resolution residually connected feature streams for automatic lung tumor segmentation from ct images. IEEE transactions on medical imaging, 38(1):134–144, 2018.
P. Kaur, K. J. Dana, G. O. Cula, and M. C. Mack. Hybrid deep learning for reflectance confocal microscopy skin images. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 1466–1471. IEEE, 2016.
S. Koller, M. Wiltgen, V. Ahlgrimm Siess, W. Weger, R. Hofmann Wellenhof, E. Richtig, J. Smolle, and A. Gerger. In vivo reflectance confocal microscopy: automated diagnostic image analysis of melanocytic skin tumours. Journal of the European Academy of Dermatology and Venereology, 25(5):5, 2011.
K. Kose, A. Bozkurt, C. Alessi-Fox, D. H. Brooks, J. G. Dy, M. Rajadhyaksha, and M. Gill. Utilizing machine learning for image quality assessment for reflectance confocal microscopy. Journal of Investigative Dermatology, 2019. ISSN 0022-202X. doi: https://doi.org/10.1016/j.jid.2019.10.018.
S. Kurugol, K. Kose, B. Park, J. G. Dy, D. H. Brooks, and M. Rajadhyaksha. Automated delineation of dermal–epidermal junction in reflectance confocal microscopy image stacks of human skin. Journal of Investigative Dermatology, 135(3):710–717, 2015.
J. Li, K. V. Sarma, K. C. Ho, A. Gertych, B. S. Knudsen, and C. W. Arnold. A multi-scale u-net for semantic segmentation of histological images from radical prostatectomies. In AMIA Annual Symposium Proceedings, volume 2017, page 1140. American Medical Informatics Association, 2017.
G. Lin, A. Milan, C. Shen, and I. Reid. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1925–1934, 2017.
G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017.
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
C. Longo, I. Zalaudek, G. Argenziano, and G. Pellacani. New directions in dermatopathology: in vivo confocal microscopy in clinical practice. Dermatologic clinics, 30(4):799–814, 2012.
M. A. Marchetti, N. C. Codella, S. W. Dusza, D. A. Gutman, B. Helba, A. Kalloo, N. Mishra, C. Carrera, M. E. Celebi, J. L. DeFazio, et al. Results of the 2016 international skin imaging collaboration international symposium on biomedical imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. Journal of the American Academy of Dermatology, 78(2):270–277, 2018.
G. Monheit, A. B. Cognetta, L. Ferris, H. Rabinovitz, K. Gross, M. Martini, J. M. Grichnik, M. Mihm, V. G. Prieto, P. Googe, et al. The performance of melafind: a prospective multicenter study. Archives of dermatology, 147(2):188–194, 2011.
D. Nie, L. Wang, Y. Gao, and D. Shen. Fully convolutional networks for multi-modality isointense infant brain image segmentation. In 2016 IEEE 13th international symposium on biomedical imaging (ISBI), pages 1342–1345. IEEE, 2016.
V. Nikolaou and A. Stratigos. Emerging trends in the epidemiology of melanoma. British journal of dermatology, 170(1):11–19, 2014.
G. Pellacani, P. Pepe, A. Casari, and C. Longo. Reflectance confocal microscopy as a second-level examination in skin oncology improves diagnostic accuracy and saves unnecessary excisions: a longitudinal prospective study. British Journal of Dermatology, 171(5):1044–1051, 2014.
G. Pellacani, A. Witkowski, A. Cesinaro, A. Losi, G. Colombo, A. Campagna, C. Longo, S. Piana, N. De Carvalho, F. Giusti, et al. Cost–benefit of reflectance confocal microscopy in the diagnostic performance of melanoma. Journal of the European Academy of Dermatology and Venereology, 30(3):413–419, 2016.
G. Peterson, D. K. Zanoni, M. Ardigo, J. C. Migliacci, S. G. Patel, and M. Rajadhyaksha. Feasibility of a video-mosaicking approach to extend the field-of-view for reflectance confocal microscopy in the oral cavity in vivo. Lasers in Surgery and Medicine, 51(5):439–451, 2019.
M. Rajadhyaksha, A. Marghoob, A. Rossi, A. C. Halpern, and K. S. Nehal. Reflectance confocal microscopy of skin in vivo: From bench to bedside. Lasers in surgery and medicine, 49(1):7–19, 2017.
J. Robic, B. Perret, A. Nkegne, M. Couprie, and H. Talbot. Classification of the dermal-epidermal junction using in-vivo confocal microscopy. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pages 252–255, April 2017. doi: 10.1109/ISBI.2017.7950513.
O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
A. G. Roy, N. Navab, and C. Wachinger. Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks. IEEE transactions on medical imaging, 38(2):540–549, 2018.
S. S. M. Salehi, D. Erdogmus, and A. Gholipour. Tversky loss function for image segmentation using 3d fully convolutional deep networks. In International Workshop on Machine Learning in Medical Imaging, pages 379–387. Springer, 2017.
S. L. Schneider, I. Kohli, I. H. Hamzavi, M. L. Council, A. M. Rossi, and D. M. Ozog. Emerging imaging technologies in dermatology: Part ii: Applications and limitations. Journal of the American Academy of Dermatology, 80(4):1121–1131, 2019.
A. Scope, P. Guitera, and G. Pellacani. Rcm diagnosis of melanocytic neoplasms: Terminology, algorithms and their accuracy and clinical integration. In S. González, M. Rajadhyaksha, M. Ardigo, C. Longo, C. Carrera, M. Ulrich, and E. Moscarella, editors, Reflectance Confocal Microscopy of Cutaneous Tumors, 2nd Ed, pages 168–186. Boca Raton, CRC Press, 2017.
A. Witkowski, J. Łudzik, F. Arginelli, S. Bassoli, E. Benati, A. Casari, N. De Carvalho, B. De Pace, F. Farnetani, A. Losi, et al. Improving diagnostic sensitivity of combined dermoscopy and reflectance confocal microscopy imaging through double reader concordance evaluation in telemedicine settings: A retrospective study of 1000 equivocal cases. PloS one, 12(11):e0187748, 2017.
L. Yu, X. Yang, H. Chen, J. Qin, and P.-A. Heng. Volumetric convnets with mixed residual connections for automated prostate segmentation from 3d mr images. In AAAI, pages 66–72, 2017.
S. Zhang, H. Fu, Y. Yan, Y. Zhang, Q. Wu, M. Yang, M. Tan, and Y. Xu. Attention guided network for retinal image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 797–805. Springer, 2019.
H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2881–2890, 2017.
Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 3–11. Springer, 2018.
Q. Zhu, B. Du, B. Turkbey, P. L. Choyke, and P. Yan. Deeply-supervised cnn for prostate segmentation. In Neural Networks (IJCNN), 2017 International Joint Conference on, pages 178–184. IEEE, 2017.