Fast and robust multiplane single molecule localization microscopy using deep neural network

2020·Arxiv

Abstract

Single molecule localization microscopy is widely used in biological research to measure nanostructures of samples smaller than the diffraction limit. This study uses multifocal plane microscopy and addresses the 3D single molecule localization problem, in which both the lateral and axial locations of molecules are estimated. However, when multifocal plane microscopy is used, the accuracy of 3D localization is easily degraded by small lateral drifts of the camera positions. We formulate 3D molecule localization together with the estimation of the lateral drifts as a compressed sensing problem, and we apply a deep neural network to solve this problem accurately and efficiently. The proposed method is robust to the lateral drifts and achieves an accuracy of 20 nm laterally and 50 nm axially without explicit drift correction.

1 Introduction

Fluorescence microscopy is widely used in biological research to analyze the structures of samples in vivo. However, because of the diffraction limit of light, the resolution of conventional fluorescence microscopy is limited to approximately 200 nm laterally and 500 nm axially. To overcome the diffraction limit, a number of super-resolution microscopy methods, including single molecule localization microscopy (SMLM), have been proposed [1]. The fundamental problem of super-resolution microscopy is to estimate the true molecular distribution from an observed image. In SMLM, photoactivatable molecules are used so that only a few molecules are active at a time. The positions of these sparse molecules can be estimated accurately by a localization algorithm such as Gaussian fitting. By integrating the localization results over many frames, a high-resolution image is obtained.
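As a concrete illustration of Gaussian fitting for an isolated spot, the following Python sketch uses a simple linearized variant: a least-squares fit of a quadratic to the log intensity. It is not the localization method of this paper; it assumes a single, isotropic, background-free spot, and the function name `localize_gaussian` is hypothetical. Practical fitters usually rely on weighted or maximum-likelihood fitting, because the log transform amplifies noise in dim pixels.

```python
import numpy as np

def localize_gaussian(img):
    """Estimate (x0, y0, sigma) of one isotropic Gaussian spot in a 2D image.

    Uses log I = c0 + c1*x + c2*y + c3*(x^2 + y^2), which holds exactly for
    I = A * exp(-((x - x0)^2 + (y - y0)^2) / (2 * sigma^2)) with c3 = -1 / (2 sigma^2).
    """
    ys, xs = np.indices(img.shape, dtype=float)
    mask = img > 0                                 # log needs positive intensities
    x, y = xs[mask], ys[mask]
    A = np.column_stack([np.ones_like(x), x, y, x ** 2 + y ** 2])
    c, *_ = np.linalg.lstsq(A, np.log(img[mask]), rcond=None)
    x0 = -c[1] / (2.0 * c[3])
    y0 = -c[2] / (2.0 * c[3])
    sigma = np.sqrt(-1.0 / (2.0 * c[3]))
    return x0, y0, sigma

# Noise-free test spot centred at (7.3, 8.6) px with sigma = 1.5 px.
ys, xs = np.indices((16, 16), dtype=float)
spot = 100.0 * np.exp(-((xs - 7.3) ** 2 + (ys - 8.6) ** 2) / (2 * 1.5 ** 2))
x0, y0, s = localize_gaussian(spot)
```

On noise-free data the fit recovers the centre with sub-pixel precision; with real photon counts one would restrict the fit to a small window around the spot.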

In many biological studies, three-dimensional imaging is important for observing the 3D structures of samples, and various 3D fluorescence microscopy techniques have been proposed [2]. In the past decade, SMLM methods have been extended to achieve 3D super-resolution. The most commonly used approach to 3D SMLM is point spread function (PSF) engineering. Several PSF designs, such as astigmatic [3], double-helix [4], and tetrapod [5] PSFs, have been proposed to achieve 3D localization. In these methods, the 3D molecule locations are estimated from differences in the shape of the PSF. However, when the molecule density is high, the axial estimation errors of these methods become large. Moreover, engineered PSFs require additional optical elements, such as a cylindrical lens or phase masks, which makes it difficult to reconfigure the optical system for other applications.

In this work, multifocal plane microscopy (MUM) [6] is used for 3D SMLM. MUM is a simple extension of 2D fluorescence microscopy to 3D that uses multiple cameras focused on different planes. Specifically, we use quad-plane microscopy, in which the 3D locations of the molecules are estimated from images obtained at four focal planes. Because no engineered PSF is used, the optical layout is simple and the optical system is relatively easy to reconfigure. Moreover, even when the molecule density is high, the estimation errors are not as large as those with engineered PSFs, because the shape of the PSF is simpler. The major problem of quad-plane microscopy for 3D SMLM is lateral drift of the camera positions, which degrades the localization quality: the cameras may drift laterally by sub-pixel amounts, and because the location of a molecule is estimated from the observations at the individual focal planes, such drifts make the estimation less accurate.

Accurate localization therefore requires estimating the amount of lateral drift. In this study, the 3D molecule locations are estimated together with the lateral drifts, and this joint estimation problem is formulated as a compressed sensing problem [7]. However, as the input images become larger, the problem becomes intractable because of its high computational cost.

Recently, deep neural networks (DNNs) have received growing attention and have been applied successfully in a wide variety of applications owing to their high predictive performance. In the last few years, convolutional neural networks (CNNs), a type of DNN, have been applied to SMLM [8–11], achieving a remarkable speedup of 2D and 3D molecule localization. Although training a network takes several hours to days, the trained network can estimate molecule locations accurately and efficiently. DNNs are also well suited to SMLM: the performance of a neural network depends strongly on the amount of training data, and an unlimited amount of training data can be generated from an approximated PSF.

We also employ a CNN to estimate the locations of molecules efficiently. The architecture of the proposed network is based on the fast super-resolution convolutional neural network (FSRCNN) [12], which was developed for the super-resolution of natural images. The network estimates the sub-pixel locations of the molecules in an input image by predicting the existence of a molecule at each high-resolution location. In addition, the network is trained to be robust to lateral drifts of the camera positions so that it can localize molecules accurately without explicit drift correction.

The rest of this paper is organized as follows. Section 2 describes the experimental setting of the quad-plane microscope used in this study, and section 3 formulates the single molecule localization problem. Section 4 then presents the problem of molecule localization with quad-plane microscopy and the details of the proposed method. Section 5 presents experimental results that validate the algorithm. Finally, the last section provides the discussion and concluding remarks.

2 Experimental setting

A multi-focus microscope equipped with four EM-CCD cameras (iXon 897, Andor) was constructed based on a commercial inverted microscope (ECLIPSE Ti, Nikon) (Fig. 1). A 640 nm laser beam (HL6366DG, Thorlabs) that passed through a cleanup filter (LD01-640/8, Semrock) was focused on the back focal plane of a 100× oil immersion objective (Plan Apo VC 100X/1.40, Nikon) to illuminate an Alexa Fluor 647-stained specimen at an excitation intensity of approximately 5 kW/cm². The fluorescence emitted from the specimen was collected by the same objective. A filter cube consisting of an excitation filter (608–648 nm, FF02-628/40, Semrock), a dichroic mirror (669 nm, FF660-Di02, Semrock), and a bandpass emission filter (672–712 nm, FF01-692/40, Semrock) was used to separate the excitation and emission light. The fluorescence image formed by the internal tube lens of the inverted microscope was relayed by an achromatic lens (f = 125.0 mm, Thorlabs), split twice by 1:1 beam-splitter mirrors (BSW29R, Thorlabs), and refocused onto the four cameras via achromatic lenses (f = 100.0 mm, Thorlabs). The axial positions of the achromatic lenses in front of the cameras were adjusted so that four planes spaced at 400 nm intervals along the Z axis of the specimen were conjugate to the sensor surfaces of the respective cameras. The relative distances between the planes were estimated from the shifts in the Z-position dependence of the PSFs, which were determined by imaging fluorescent beads (FluoSphere Carboxylate-Modified Microspheres, 0.2 µm, Invitrogen) while varying the Z position of the objective with a piezo positioner (P725.1, PI). Differences in the fields of view of the cameras were corrected by coordinate registration using an affine transformation whose parameters were determined from images of multiple fluorescent beads captured by the different cameras.
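The affine registration between cameras can be sketched as an ordinary least-squares fit to matched bead coordinates. This is an illustrative assumption rather than the authors' code: it presumes that the bead centres have already been localized and paired across cameras, and the names `fit_affine` and `apply_affine` and all numbers are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points (N, 2) onto dst points (N, 2).

    Returns a 2 x 3 matrix M with dst ≈ src @ M[:, :2].T + M[:, 2]; needs N >= 3
    non-collinear bead pairs.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.column_stack([src, np.ones(len(src))])   # homogeneous coordinates (N, 3)
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)     # (3, 2) with dst ≈ X @ P
    return P.T

def apply_affine(M, pts):
    """Apply the 2 x 3 affine matrix M to points of shape (N, 2)."""
    return np.asarray(pts, dtype=float) @ M[:, :2].T + M[:, 2]

# Synthetic check: beads on a reference camera and on a second camera whose
# frame is slightly rotated, scaled, and shifted (made-up values).
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 512.0, size=(10, 2))
M_true = np.array([[1.002, -0.010, 3.5],
                   [0.011, 0.998, -2.0]])
other = apply_affine(M_true, ref)
M_est = fit_affine(other, ref)       # maps the second camera onto the reference
registered = apply_affine(M_est, other)
```

Fitting in the direction "other camera to reference camera" lets every camera's coordinates be expressed in the frame of the reference camera.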

Methanol-fixed COS7 cells were used for STORM imaging of tubulin in the cells, as described previously [13]. The primary and secondary antibodies were an anti-tubulin antibody (YL1/2, Abcam) and an Alexa Fluor 647-labeled anti-rat IgG antibody, respectively. The specimen was mounted in STORM buffer (10 mM NaCl, 60% sucrose, 10% glucose, 0.1% β-mercaptoethanol, 0.5 mg/mL glucose oxidase, 0.04 mg/mL catalase, and 50 mM HEPES, pH 8.0) and then imaged. Images were acquired at 22 Hz with a 20 ms exposure.

Figure 1: Optical layout of the quad-plane microscope. The intermediate image is relayed onto each camera via a pair of lenses (L1, f = 125.0 mm; L2, f = 100.0 mm). TL, tube lens; M1, 1:1 beam-splitter mirror. The inset shows the focal planes of the four cameras.

3 Formulation

In this work, we assume that both the resolution of the observed images and the target resolution are given. The target 3D space is therefore divided into voxels, the observation is modeled on this voxel grid, and the molecules are localized at the voxel level of the specified resolution.

3.1 Observation Model

Let $\mathbf{y} = (y_1, y_2, \dots, y_n)^\top$ be an observed low-resolution image obtained by quad-plane microscopy, where $y_i$ is the observed fluorescence intensity at the $i$-th observation coordinate $\mathbf{u}_i$. Since $\mathbf{y}$ is a convolution of the true molecule density and the PSF, it can be approximated by a linear equation as

$$\mathbf{y} \approx H\mathbf{w}, \quad (1)$$

where $H \in \mathbb{R}^{n \times m}$ is an observation matrix and $\mathbf{w} = (w_1, w_2, \dots, w_m)^\top$ is a molecule distribution, in which $w_j$ represents the intensity weight of the molecule at the $j$-th voxel coordinate $\mathbf{v}_j$. Here, $\mathbf{u}_i$ ($i = 1, 2, \dots, n$) and $\mathbf{v}_j$ ($j = 1, 2, \dots, m$) are the coordinates of the low- and high-resolution voxel grids, respectively. The $(i, j)$ element of $H$ represents the fluorescence from a molecule at $\mathbf{v}_j$ observed at $\mathbf{u}_i$ and can be written as $H_{ij} = h(\mathbf{u}_i, \mathbf{v}_j)$, where $h$ is the point spread function.

We assume that an observed image contains both shot noise and additive observation noise. The shot noise follows a Poisson distribution and the observation noise follows a Gaussian distribution, independently for each observation. Hence, the observation can be modeled as

$$\mathbf{y} = H\mathbf{w} + \boldsymbol{\varepsilon}, \quad (2)$$

where $\boldsymbol{\varepsilon}$ is composed of shot noise and observation noise.
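A minimal sketch of this observation model follows. It assumes a Gaussian PSF whose width follows a simple defocus curve and whose peak scales as 1/σ²; both choices, and all parameter values, are illustrative assumptions, not the calibrated values of Table 1. The code builds $H$ with $H_{ij} = h(\mathbf{u}_i, \mathbf{v}_j)$ and draws a noisy image with Poisson shot noise plus Gaussian read noise; the background $b$ is omitted, and `photons` is a hypothetical factor converting weights to expected counts.

```python
import numpy as np

# Hypothetical PSF parameters (nm); NOT the calibrated values of Table 1.
SIGMA0, DEPTH = 150.0, 400.0

def psf(u, v):
    """Gaussian PSF h(u, v) for observation coords u and voxel coords v (..., 3) in nm.

    The z component of u is the depth of the focal plane on which u lies.
    """
    dz = v[..., 2] - u[..., 2]
    sigma = SIGMA0 * np.sqrt(1.0 + (dz / DEPTH) ** 2)      # defocus curve
    r2 = ((u[..., :2] - v[..., :2]) ** 2).sum(axis=-1)     # squared lateral distance
    peak = (SIGMA0 / sigma) ** 2                           # assumed: peak ∝ 1/sigma^2
    return peak * np.exp(-r2 / (2.0 * sigma ** 2))

def observation_matrix(U, V):
    """H[i, j] = h(u_i, v_j) for U of shape (n, 3) and V of shape (m, 3)."""
    return psf(U[:, None, :], V[None, :, :])

def simulate(H, w, photons=1000.0, read_sigma=2.0, rng=None):
    """Noisy image: Poisson shot noise around photons * H w plus Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    mean = photons * (H @ w)
    return rng.poisson(mean) + rng.normal(0.0, read_sigma, size=mean.shape)

# Toy grids: 4 x 4 low-res pixels (192 nm) on four planes 400 nm apart, and
# 16 x 16 x 4 voxels (48 nm laterally) as a small stand-in for the high-res grid.
planes = (0.0, 400.0, 800.0, 1200.0)
lo = (np.arange(4) + 0.5) * 192.0
hi = (np.arange(16) + 0.5) * 48.0
U = np.array([(x, y, z) for z in planes for y in lo for x in lo])
V = np.array([(x, y, z) for z in planes for y in hi for x in hi])
H = observation_matrix(U, V)
w = np.zeros(V.shape[0])
w[100] = 1.0                       # one molecule in a single voxel
y = simulate(H, w, rng=np.random.default_rng(0))
```

Because $n = 64$ is much smaller than $m = 1024$ even in this toy, $H$ is overcomplete, which is exactly the situation the next paragraph addresses.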

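The problem of molecule localization is now to estimate the weights $w_j$ for all $j = 1, 2, \dots, m$ from a low-resolution image $\mathbf{y}$. Here, the observation matrix $H$ is overcomplete ($n < m$); thus, the coefficient vector $\mathbf{w}$ cannot be uniquely recovered simply by minimizing the noise term $\|\mathbf{y} - H\mathbf{w}\|_2^2$ in Eq. (2). However, since $\mathbf{w}$ is sparse, it can be recovered by solving the following linear inverse problem:

$$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \; \frac{1}{2}\|\mathbf{y} - H\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_1, \quad (3)$$

where $\lambda > 0$ is a regularization parameter. This problem is known as the Lasso (see [14] and references therein), and recovering a sparse vector from underdetermined linear measurements in this way is known as compressed sensing [7].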
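A Lasso problem of this form can be solved with many sparse solvers. As one standard choice, not necessarily the solver used by the authors, the iterative shrinkage-thresholding algorithm (ISTA) alternates a gradient step on the quadratic term with soft-thresholding. The sketch assumes the 1/2-scaled objective; the function name `ista_lasso` and the toy random matrix are illustrative.

```python
import numpy as np

def ista_lasso(H, y, lam, n_iter=3000):
    """Minimize 0.5 * ||y - H w||_2^2 + lam * ||w||_1 with ISTA."""
    L = np.linalg.norm(H, 2) ** 2            # Lipschitz constant of the gradient
    w = np.zeros(H.shape[1])
    for _ in range(n_iter):
        z = w - H.T @ (H @ w - y) / L        # gradient step on the data term
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return w

# Toy compressed-sensing example: 40 measurements of a 3-sparse vector of length 100.
rng = np.random.default_rng(0)
H = rng.normal(size=(40, 100)) / np.sqrt(40)
w_true = np.zeros(100)
support = [5, 37, 81]
w_true[support] = [1.0, -0.8, 0.6]
w_hat = ista_lasso(H, H @ w_true, lam=1e-3)
```

Since molecule weights are nonnegative, a variant could replace the soft-threshold with `np.maximum(z - lam / L, 0.0)`; accelerated schemes such as FISTA reduce the number of iterations.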

3.2 Three-dimensional Point Spread Function

In the above formulation, the true PSF $h$ is generally unknown; we therefore approximate $h$ with a parametric function $\hat{h}$ and solve Eq. (3) with the corresponding approximated observation matrix $\hat{H}$. Quad-plane microscopy takes four images at different degrees of defocus, and the width and peak of $\hat{h}(\mathbf{u}_i, \mathbf{v}_j)$ depend on the distance between a molecule and each focal plane.

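The PSF of the quad-plane microscope is modeled by the following function:

$$\hat{h}(\mathbf{u}_i, \mathbf{v}_j) = a(z)\exp\!\left(-\frac{(u_{i,x} - v_{j,x})^2 + (u_{i,y} - v_{j,y})^2}{2\sigma(z)^2}\right) + b, \qquad z = v_{j,z} - u_{i,z}, \quad (4)$$

where $b$ is the background fluorescence and $z$ is the axial distance between the molecule and the focal plane of the observation. This PSF is similar to the PSF used in [15] for biplane microscopy. The width $\sigma(z)$ of the PSF varies with the axial position and is described by the following defocus curve:

$$\sigma(z) = \sigma_0\sqrt{1 + \left(\frac{z}{d}\right)^2}, \quad (5)$$

where $\sigma_0$ is the width of the PSF when a molecule is on the focal plane and $d$ is the focus depth of the microscope. The peak $a$ of the PSF depends on the width $\sigma(z)$ and is modeled as: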

The parameters, listed in Table 1, are determined from a set of images of fluorescent beads acquired at different depths. Figure 2 shows the widths of the observed fluorescent beads together with the values of the defocus curve (5).
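The calibration of the defocus curve from bead images can be sketched as follows, assuming the two-parameter model σ(z) = σ0·sqrt(1 + (z/d)²) and bead widths already measured at known depths z relative to the focal plane. Because σ² = σ0² + (σ0²/d²)·z² is linear in z², an ordinary least-squares fit on z² recovers both parameters. This fits σ² rather than σ and is only one possible calibration procedure; the function name and the synthetic values are hypothetical.

```python
import numpy as np

def fit_defocus_curve(z, sigma):
    """Fit sigma(z) = sigma0 * sqrt(1 + (z / d)^2) via linear least squares on z^2."""
    z = np.asarray(z, dtype=float)
    s2 = np.asarray(sigma, dtype=float) ** 2
    A = np.column_stack([np.ones_like(z), z ** 2])   # s2 ≈ c0 + c1 * z^2
    (c0, c1), *_ = np.linalg.lstsq(A, s2, rcond=None)
    sigma0 = np.sqrt(c0)                             # c0 = sigma0^2
    d = np.sqrt(c0 / c1)                             # c1 = sigma0^2 / d^2
    return sigma0, d

# Synthetic, noise-free bead widths (nm) at depths from -800 to 800 nm.
z = np.linspace(-800.0, 800.0, 33)
sigma = 140.0 * np.sqrt(1.0 + (z / 350.0) ** 2)
sigma0_hat, d_hat = fit_defocus_curve(z, sigma)
```

With noisy bead widths, a nonlinear fit of σ itself (or one that also estimates a focal offset) may be preferable; the linear form is convenient as an initial estimate.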

3.3 Lateral Drifts of the Focal Planes

When multifocal plane microscopy is used to localize molecules, the lateral drifts of the focal planes must be taken into account. Otherwise, the localization accuracy is easily degraded, because the elements of the true $H$ vary with the drifts.

Let $\Delta x_z$ and $\Delta y_z$ ($z = 1, 2, 3, 4$) be the lateral drifts along the horizontal and vertical axes of the $z$-th focal plane. Then, the drift vector is written as $\boldsymbol{\delta}_z = (\Delta x_z, \Delta y_z, 0)^\top$. In this work, we only

Figure 2: The width of the observed fluorescent beads and the values of the defocus curve. The observed widths of the fluorescent beads at each depth are shown by circles. The red line shows the defocus curve that approximates the width, with the parameters listed in Table 1.

where the observation coordinate $\mathbf{u}_i$ is on the $z$-th plane.

This equation implies that the observation of a molecule at $\mathbf{v}_j$ under a lateral drift is identical to the observation, without drift, of a molecule at a correspondingly shifted lateral position. Since the observation at each plane is affected by a different camera drift, each plane taken alone would place the molecule at a different point. However, problem (3) considers all of the focal planes at the same time to estimate the molecule location. Hence, unless the amount of lateral drift of each plane is known, the true molecule position cannot be estimated correctly, even when only a single molecule exists in the 3D space.

Moreover, by Eq. (7), if the drift $\boldsymbol{\delta}_z = \boldsymbol{\delta}$ is the same for all $z = 1, 2, 3, 4$, a common drift $\boldsymbol{\delta}$ cannot be distinguished from a corresponding lateral shift of the true molecule location, even when the information from all planes is used. Instead, we estimate the relative lateral drifts $\tilde{\boldsymbol{\delta}}_z = \boldsymbol{\delta}_z - \boldsymbol{\delta}_1$ ($z = 2, 3, 4$) with respect to the reference plane $z = 1$. In this study, the relative lateral drifts, as well as the molecule positions, are estimated at the high-resolution voxel level.
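The equivalence between camera drift and molecule shift can be checked numerically with a toy model. The sketch below assumes a Gaussian PSF on a single in-focus plane and adopts the convention that a camera drift moves every pixel centre by `drift_xy` (the sign convention of Eq. (7) may differ); under this convention, a drifted camera sees the same image as an undrifted camera imaging the molecule shifted by minus the drift. It also forms relative drifts with respect to plane 1 and shows that a common drift added to all planes leaves them unchanged. All names and numbers are hypothetical.

```python
import numpy as np

def plane_image(mol_xy, drift_xy, sigma=150.0, pixel=192.0, npix=8):
    """Noise-free image (npix x npix) of one molecule on one in-focus plane.

    Assumed convention: a lateral camera drift shifts every pixel centre by drift_xy (nm).
    """
    c = (np.arange(npix) + 0.5) * pixel
    gx, gy = np.meshgrid(c, c, indexing="xy")
    dx = gx + drift_xy[0] - mol_xy[0]
    dy = gy + drift_xy[1] - mol_xy[1]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))

def relative_drifts(drifts):
    """Drifts of planes 2..4 relative to the reference plane 1 (row 0)."""
    drifts = np.asarray(drifts, dtype=float)
    return drifts[1:] - drifts[0]

mol = np.array([700.0, 820.0])
delta = np.array([24.0, -48.0])
same = np.allclose(plane_image(mol, delta), plane_image(mol - delta, np.zeros(2)))

drifts = np.array([[10.0, -5.0], [34.0, 19.0], [-14.0, 43.0], [58.0, -29.0]])
rel = relative_drifts(drifts)                           # [[24, 24], [-24, 48], [48, -24]]
rel_common = relative_drifts(drifts + [100.0, -60.0])   # a common drift cancels out
```

The second check is the numerical counterpart of the unidentifiability argument: only the relative drifts carry information that the observations can constrain.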

4 Method

4.1 Compressed Sensing with Lateral Drift Estimation

When solving the molecule localization problem in Eq. (3), the lateral drifts of the focal planes must be considered to estimate the molecule locations accurately. Because the approximated observation matrix $\hat{H}$ depends on the lateral drifts, it can be modified to account for them so that the molecule locations are estimated correctly.

By ordering the rows of the observation matrix $\hat{H}$ according to the axial position of the observation coordinates, $\hat{H}$ can be written in block form as

$$\hat{H} = \begin{pmatrix} \hat{H}_1 \\ \hat{H}_2 \\ \hat{H}_3 \\ \hat{H}_4 \end{pmatrix}, \quad (8)$$

where each submatrix $\hat{H}_z$ represents the observation matrix of the $z$-th focal plane. Since a lateral drift affects each plane independently, the drift of each block can be considered individually. Moreover, the absolute drift cannot be estimated from the observation; we therefore take plane $z = 1$ as the reference plane and consider only the relative drifts of $\hat{H}_2$, $\hat{H}_3$, and $\hat{H}_4$.

The relative shifts are restricted to the high-resolution pixel level, and the maximum amount of shift is treated as a hyperparameter.

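The problem is now to estimate both the lateral drifts and the molecule locations from the observations, which can be formulated as

$$\min_{\{\mathbf{w}_t\}_{t=1}^{T},\ \{\tilde{\boldsymbol{\delta}}_z\}_{z=2}^{4}} \ \sum_{t=1}^{T}\left(\frac{1}{2}\big\|\mathbf{y}_t - \hat{H}(\tilde{\boldsymbol{\delta}})\mathbf{w}_t\big\|_2^2 + \lambda\|\mathbf{w}_t\|_1\right), \quad (10)$$

where

$$\hat{H}(\tilde{\boldsymbol{\delta}}) = \begin{pmatrix} \hat{H}_1 \\ \hat{H}_2(\tilde{\boldsymbol{\delta}}_2) \\ \hat{H}_3(\tilde{\boldsymbol{\delta}}_3) \\ \hat{H}_4(\tilde{\boldsymbol{\delta}}_4) \end{pmatrix}, \quad (11)$$

$\mathbf{y}_t$ and $\mathbf{w}_t$ are the observed image and the molecule distribution of the $t$-th frame, and $\hat{H}_z(\tilde{\boldsymbol{\delta}}_z)$ denotes the observation matrix of plane $z$ shifted by its relative drift $\tilde{\boldsymbol{\delta}}_z$ with respect to plane 1.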

Here, $T$ images are considered simultaneously so that the lateral drifts can be estimated correctly. Since only a small number of molecules are assumed to exist in the target 3D space, it is not always possible to estimate all of the lateral drifts from a single image.

However, this optimization problem is computationally expensive, because all possible combinations of lateral drifts must be considered to obtain an optimal solution. Although a sub-optimal solution can be obtained by alternating between optimizing the drifts and optimizing the molecule distributions, the latter optimization still requires a high computational cost as the input image size or the target resolution increases. Therefore, a faster method is required to obtain a super-resolution image within a reasonable computational time.
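To see why exhaustive search is expensive, the sketch below counts the candidate drift configurations, assuming each of the two lateral components of the three relative drifts takes integer values in {−D, ..., D} high-resolution pixels, where D is the maximum-shift hyperparameter. Every candidate would require solving a Lasso problem over all T frames. The helper names are hypothetical.

```python
from itertools import product

def drift_candidates(max_shift):
    """Lazily enumerate joint relative drifts ((dx2, dy2), (dx3, dy3), (dx4, dy4))
    on the high-resolution grid, each component in [-max_shift, max_shift]."""
    steps = range(-max_shift, max_shift + 1)
    per_plane = list(product(steps, steps))   # (2D + 1)^2 options for one plane
    return product(per_plane, repeat=3)       # planes z = 2, 3, 4

def n_drift_candidates(max_shift):
    """Closed-form count (2D + 1)^6 of the enumeration above."""
    return (2 * max_shift + 1) ** 6

count_d1 = sum(1 for _ in drift_candidates(1))   # 3^6 = 729
count_d2 = n_drift_candidates(2)                  # 5^6 = 15625
```

With D = 2, for example, 15,625 Lasso problems would have to be solved per batch of T frames, which motivates the alternating scheme and, ultimately, a learned solver.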

4.2 Convolutional Neural Network (CNN)

In this work, we use a CNN to solve problem (10) faster. The molecule localization problem (3) is a deconvolution problem, in which the molecule locations are estimated from the observed images. This deconvolution can be viewed as a composition of upsampling and deblurring of a low-resolution image. In this work, the resolutions of the input and output images are given, and the scaling factor is 8 along each axis.

The network is composed of several convolution and deconvolution layers, as shown in Fig. 3, and its structure is similar to that of the fast super-resolution convolutional neural network (FSRCNN) [12], which is used for single-image super-resolution of natural images. The first layer extracts features from an input image and is followed by three deconvolution layers. Each deconvolution layer doubles the sampling frequency of its input while extracting features important for localization, so the three layers together upsample the lateral dimensions by a factor of 8 (2³).
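The upsampling arithmetic can be checked with the standard output-size formula for a transposed convolution with dilation 1, out = (in − 1)·stride − 2·padding + kernel + output_padding. The kernel size 4, stride 2, and padding 1 below are assumptions chosen so that each layer exactly doubles the size; the kernel sizes of the actual network may differ. The helper names are hypothetical.

```python
def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def network_output_shape(height, width, n_deconv=3, out_slices=32):
    """(slices, height, width) after n_deconv size-doubling deconvolution layers."""
    for _ in range(n_deconv):
        height, width = deconv_out(height), deconv_out(width)
    return out_slices, height, width

print(network_output_shape(16, 16))     # (32, 128, 128)
print(network_output_shape(256, 256))   # (32, 2048, 2048)
```

Because the network consists only of convolution and deconvolution layers, the same weights apply to any input size: a 16 × 16 × 4 training input maps to the 128 × 128 × 32 target grid, and a 256 × 256 × 4 frame maps to 2048 × 2048 × 32.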

Figure 3: Network architecture of the proposed model.

Each of these layers uses a ReLU activation followed by a batch normalization layer [16] to improve the training speed and the estimation quality.

The last layer of the network is a convolution layer that outputs a stack of images, one for each axial position of the target resolution. Unlike natural image super-resolution, the purpose of the localization problem (3) is to determine whether a molecule exists in each high-resolution voxel. Hence, the network directly outputs the probability that a molecule exists in each voxel using a sigmoid function; that is, the last layer solves a binary classification problem for each voxel.

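Let $\mathbf{p} = (p_1, p_2, \dots, p_m)$ be the ground-truth molecule existence probability for each voxel, where

$$p_j = \begin{cases} 1 & \text{if a molecule exists in the } j\text{-th voxel},\\ 0 & \text{otherwise}, \end{cases} \quad (12)$$

and let $\mathbf{q} = (q_1, q_2, \dots, q_m)$ be the molecule existence probability estimated by the CNN. Since the network solves a binary classification problem at each voxel, we train it with the sum of the binary cross-entropies (BCE) as the loss function:

$$\mathcal{L}(\mathbf{p}, \mathbf{q}) = -\sum_{j=1}^{m}\big[p_j\log q_j + (1 - p_j)\log(1 - q_j)\big]. \quad (13)$$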
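The summed binary cross-entropy loss can be written directly in NumPy as below. The clipping constant `eps` is an assumption added to avoid log(0); in a deep learning framework one would typically feed the pre-sigmoid outputs to a numerically stable fused loss such as PyTorch's `torch.nn.BCEWithLogitsLoss(reduction="sum")` instead. The function name is hypothetical.

```python
import numpy as np

def bce_sum(p, q, eps=1e-7):
    """Sum over voxels of -[p log q + (1 - p) log(1 - q)]."""
    q = np.clip(q, eps, 1.0 - eps)   # keep the logarithms finite
    return float(-np.sum(p * np.log(q) + (1.0 - p) * np.log(1.0 - q)))

p = np.array([1.0, 0.0, 0.0, 0.0])          # one occupied voxel out of four
q_good = np.array([0.9, 0.05, 0.05, 0.05])  # confident, correct prediction
q_flat = np.full(4, 0.5)                    # uninformative prediction
loss_good, loss_flat = bce_sum(p, q_good), bce_sum(p, q_flat)
```

The uninformative prediction costs 4·ln 2 ≈ 2.77, while the confident correct one costs about 0.26, which is the behavior the per-voxel classification objective rewards.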

Training the network requires a training dataset. However, the true molecule distribution of real samples is unknown. We therefore train the network on synthetic observed images generated from synthetic molecule distributions.

We generate random molecule distributions that contain $K$ molecules in a 3D space. In this work, $K = 3$ and the target 3D space is 3072 nm × 3072 nm × 1200 nm; the coordinates $\mathbf{v}_k$ ($k = 1, 2, \dots, K$) of the molecules are drawn independently from a uniform distribution on this space. The weight $w_k$ ($k = 1, 2, \dots, K$) of each molecule is drawn independently from a continuous uniform distribution on [0.3, 1.0], and the target $\mathbf{p}$ is generated according to Eq. (12).

In this work, the low-resolution voxels are 192 nm × 192 nm × 400 nm and a low-resolution image is 16 × 16 × 4 voxels. The high-resolution voxels are 24 nm × 24 nm × 50 nm and a high-resolution image is 128 × 128 × 32 voxels. The relative lateral drifts $\Delta x_z$, $\Delta y_z$ ($z = 2, 3, 4$) of the focal planes are chosen independently as $24d$ nm, where $d$ is drawn from a discrete uniform distribution on $\{-2, -1, 0, 1, 2\}$. The low-resolution images are then generated by evaluating the PSF model $\hat{h}$ at the low-resolution grid points $\mathbf{u}_i$ ($i = 1, 2, \dots, n$).

To generate the training dataset, the coordinates of the molecules and the relative lateral drifts of the focal planes are drawn randomly as described above, independently for each frame $t = 1, 2, \dots, T$. Trained on this dataset, the network is expected to become robust to lateral camera drifts within ±2 high-resolution voxels.
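The sampling procedure for one training frame can be sketched as follows. Sizes and distributions are taken from the text; the drift range {−2, −1, 0, 1, 2}, the axial placement of the 1200 nm range within the 32-slice grid, and the function name `sample_frame` are assumptions. Rendering the low-resolution 16 × 16 × 4 input from these coordinates, by evaluating the PSF model with each plane shifted by its drift, is omitted for brevity.

```python
import numpy as np

FOV_NM = np.array([3072.0, 3072.0, 1200.0])   # target 3D space (x, y, z) from the text
VOXEL_NM = np.array([24.0, 24.0, 50.0])       # high-resolution voxel size from the text
HR_SHAPE = (128, 128, 32)                     # high-resolution grid (x, y, z)

def sample_frame(rng, K=3, max_shift=2):
    """Draw one synthetic frame: molecule coordinates, weights, drifts, and target p."""
    xyz = rng.uniform(size=(K, 3)) * FOV_NM                # uniform in the 3D space
    weights = rng.uniform(0.3, 1.0, size=K)                # w_k ~ U[0.3, 1.0]
    # Relative drifts (nm) of planes z = 2, 3, 4: 24 * d with d uniform on {-2, ..., 2}.
    drifts = 24.0 * rng.integers(-max_shift, max_shift + 1, size=(3, 2))
    # Target p (Eq. (12)); assumption: z = 0 nm coincides with the first axial slice.
    idx = np.floor(xyz / VOXEL_NM).astype(int)
    p = np.zeros(HR_SHAPE, dtype=np.float32)
    p[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return xyz, weights, drifts, p

xyz, weights, drifts, p = sample_frame(np.random.default_rng(0))
```

Drawing fresh molecules and drifts for every frame is what exposes the network to the full range of drift configurations it is meant to tolerate.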

5 Experiments

We present localization results obtained by the proposed method on both artificial images and real microscopy images. All experiments below were performed on an NVIDIA Tesla V100 (32 GB) GPU.

5.1 Experiments with Artificial Images

We trained the neural network on 90,000 low-resolution images and the corresponding molecule existence probabilities generated from random molecule distributions. In addition, 10,000 test samples were generated in the same way as the training dataset. We use Adam [17] as an optimizer, where its parameters are = 0= 0.99 and the initial learning rate is set to 110, and the batch size is 100. The network was optimized for 30 epochs, and the dataset was shuffled at the end of each epoch.

To validate the estimation accuracy of the trained network, we generated artificial images that contain only one molecule in the 3D space. In this experiment, the molecule is estimated to exist in the voxel for which the network outputs the highest molecule existence probability. Figure 4 shows the mean localization error along the horizontal (X), vertical (Y), and axial (Z) directions with 95% confidence intervals for each true molecule depth. On average, the error along each axis is within one high-resolution voxel at all depths.
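The single-molecule evaluation can be sketched as taking the arg-max voxel of the network output and converting its index to nanometres. Mapping an index to the voxel centre, (index + 0.5) × voxel size, and the function names are assumptions; per-axis errors of this kind, averaged over test images, are the sort of quantity summarized in Fig. 4.

```python
import numpy as np

VOXEL_NM = np.array([24.0, 24.0, 50.0])   # high-resolution voxel size (x, y, z)

def argmax_location(q):
    """Centre (nm) of the voxel with the highest molecule existence probability."""
    idx = np.unravel_index(np.argmax(q), q.shape)
    return (np.array(idx) + 0.5) * VOXEL_NM

def axis_errors(q, true_xyz):
    """Absolute localization error (nm) along X, Y, and Z."""
    return np.abs(argmax_location(q) - np.asarray(true_xyz, dtype=float))

q = np.zeros((128, 128, 32))
q[10, 20, 5] = 0.9
err = axis_errors(q, [250.0, 500.0, 270.0])   # estimate is (252, 492, 275) nm
```

An error below one voxel per axis (24 nm laterally, 50 nm axially) corresponds to the arg-max landing in the true voxel or an adjacent one.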

Figure 5 shows estimation results with multiple molecules. The molecules are sampled from a helical curve (red line), and their high-resolution coordinates are shown by red circles. In this experiment, molecules are estimated to exist in the voxels whose molecule existence probability exceeds a threshold. The threshold is common to all locations and must be specified by the user; here, we chose 0.1. As shown in Fig. 5(a), three molecules distributed in the 3D space are accurately detected. Closely located molecules are more difficult to localize, as Fig. 5(b) indicates, but the estimated locations are still close to the ground-truth locations. By plotting all of the detected molecules from all frames, the underlying helical structure becomes visible, as shown in Fig. 5(c).
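Multi-molecule detection by thresholding can be sketched similarly: every voxel whose probability exceeds the user-chosen threshold (0.1 here) is reported as a molecule. The voxel-centre convention and the function name are assumptions; in this simple form, adjacent voxels above the threshold are each reported separately.

```python
import numpy as np

VOXEL_NM = np.array([24.0, 24.0, 50.0])   # high-resolution voxel size (x, y, z)

def detect(q, threshold=0.1):
    """Centres (nm) of all voxels whose existence probability exceeds threshold."""
    idx = np.argwhere(q > threshold)       # (num_detections, 3) voxel indices
    return (idx + 0.5) * VOXEL_NM

q = np.zeros((128, 128, 32))
q[10, 20, 5], q[60, 61, 12], q[90, 3, 30] = 0.8, 0.3, 0.05
locs = detect(q)                           # the 0.05 voxel falls below the threshold
```

Lowering the threshold trades more false detections for fewer missed molecules, which is why it is left as a user parameter.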

Figure 6 presents the processing speed of the network. As the figure indicates, the computational speed decreases as the image size increases and is inversely proportional to the number of pixels of an image. Since we assume that our microscope acquires 50 images per second, the processing speed must be improved further to process large images in real time. Still, the computational time is significantly reduced compared with the compressed sensing method: solving problem (10) by alternating minimization gives a processing speed of only 110fps, even for a small 16 × 16 × 4 input.

Figure 4: Average estimation error along the horizontal (X), vertical (Y), and axial (Z) axes and their 95% confidence intervals.

Figure 5: Localization results for artificial data. The red line shows the true structure from which the molecules are sampled. The red circles show the true high-resolution molecule coordinates and the blue triangles show the estimated molecule locations. Panels (a) and (b) show the estimation results for selected frames, and panel (c) shows the image reconstructed from 300 frames.

Figure 6: Computational speed (fps) of estimation by the trained network for images of size 16 px × 16 px, 32 px × 32 px, ..., 256 px × 256 px. The estimation speed of our network is inversely proportional to the number of pixels of an image.

5.2 Experiments with Real Images

In this section, we show experimental results on real data of microtubules observed with our microscope. Since no ground truth is available for these data, we used the network trained in the previous subsection to localize the molecules.

The resolutions of the low- and high-resolution images are the same as in the previous subsection. The input image for each frame is 256 × 256 × 4 and the target image is 2048 × 2048 × 32. The dataset contains 39,000 frames, and each frame is processed independently to localize the molecules. In this experiment, a molecule is estimated to exist in a voxel if its molecule existence probability exceeds 0.05.

Figure 7 shows the estimated high-resolution image at selected depths, generated by merging the localization results of all frames. Each pixel of the image takes a binary value indicating whether the voxel contains a molecule in more than one frame. The figures show the tubular structure of the microtubules, which varies with depth.
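Merging per-frame outputs into such a binary image can be sketched as counting, per voxel, the frames in which it is detected. Reading "more than one frame" as a count of at least two (`min_frames=2`) is our interpretation, and the function name is hypothetical. Iterating over frames keeps only one frame and the count volume in memory, which matters for 39,000 frames of 2048 × 2048 × 32 outputs.

```python
import numpy as np

def merge_frames(prob_frames, threshold=0.05, min_frames=2):
    """Binary volume marking voxels detected (q > threshold) in >= min_frames frames."""
    counts = None
    for q in prob_frames:                      # frames can come from a generator
        hit = (q > threshold).astype(np.int32)
        counts = hit if counts is None else counts + hit
    return counts >= min_frames

a, b, c = np.zeros((3, 2, 2, 1))               # three tiny toy "frames"
a[0, 0, 0], b[0, 0, 0], c[1, 1, 0] = 0.9, 0.6, 0.9
merged = merge_frames([a, b, c])               # only voxel (0, 0, 0) is hit twice
```

Requiring repeated detections suppresses isolated false positives at the cost of discarding molecules that blink only once.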

6 Discussion

This study addressed the 3D molecule localization problem using quad-plane microscopy. With multifocal plane microscopy (MUM), lateral drifts of the camera positions make the localization less accurate. We formulated the localization problem as a compressed sensing problem that jointly estimates the molecule locations and the amount of drift. However, the computational cost of solving this problem is high, and the optimal solution cannot be obtained within a reasonable computational time. We therefore proposed a CNN that solves this problem accurately and efficiently and is trained to be robust against sub-pixel lateral drifts of the cameras.

We presented experiments with both artificial and real data. The results suggest that the network achieves 3D localization of molecules with a lateral resolution of 25 nm and an axial resolution of 50 nm on average, and that it is robust to lateral drifts of the camera positions. We expect this technique to broaden the applicability of MUM for 3D imaging, since explicit drift correction is not required.

Figure 7: Estimated high-resolution image of the microtubules data. The depth-dependent tubular structure of the sample is visualized by the colors.

However, some limitations are worth noting. Although the proposed method significantly increases the speed of solving the localization problem, it is still difficult to process large images in real time. Future work should therefore include further improvement of the computational speed. Combining a faster method that extracts candidate molecule locations with the proposed method for the final localization may further improve computational efficiency.

Acknowledgement. The authors would like to thank M. Tanaka for technical assistance. This work was partly supported by JSPS KAKENHI Grant Numbers 17H01793 and 18H03291 and JST CREST Grant Numbers JPMJCR1761 and JPMJCR14D7.

References

[1] L. Schermelleh, A. Ferrand, T. Huser, C. Eggeling, M. Sauer, O. Biehlmaier, and G. P. Drummen, “Super-resolution microscopy demystified,” Nature Cell Biology, vol. 21, no. 1, pp. 72–84, Jan. 2019.

[2] W. Liu, K. C. Toussaint, C. Okoro, D. Zhu, Y. Chen, C. Kuang, and X. Liu, “Breaking the Axial Diffraction Limit: A Guide to Axial Super-Resolution Fluorescence Microscopy,” Laser and Photonics Reviews, vol. 12, no. 8, pp. 1–29, 2018.

[3] B. Huang, W. Wang, M. Bates, and X. Zhuang, “Three-dimensional super-resolution imaging by stochastic optical reconstruction microscopy,” Science, vol. 319, no. 5864, pp. 810–813, 2008.

[4] S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proceedings of the National Academy of Sciences, vol. 106, no. 9, pp. 2995–2999, 2009.

[5] Y. Shechtman, S. J. Sahl, A. S. Backer, and W. E. Moerner, “Optimal point spread function design for 3D imaging,” Phys. Rev. Lett., vol. 113, p. 133902, Sep. 2014.

[6] S. Ram, P. Prabhat, J. Chao, E. S. Ward, and R. J. Ober, “High accuracy 3D quantum dot tracking with multifocal plane microscopy for the study of fast intracellular dynamics in live cells,” Biophysical Journal, vol. 95, pp. 6025–6043, Dec. 2008.

[7] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, pp. 1289–1306, April 2006.

[8] P. Zelger, K. Kaser, B. Rossboth, L. Velas, G. J. Schütz, and A. Jesacher, “Three-dimensional localization microscopy using deep learning,” Opt. Express, vol. 26, pp. 33166–33179, Dec. 2018.

[9] W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nature Biotechnology, vol. 36, no. 5, pp. 460–468, 2018.

[10] E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica, vol. 5, pp. 458–464, Apr. 2018.

[11] N. Boyd, E. Jonas, H. Babcock, and B. Recht, “DeepLoco: Fast 3D localization microscopy using neural networks,” bioRxiv, 2018.

[12] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Computer Vision – ECCV 2016 (B. Leibe, J. Matas, N. Sebe, and M. Welling, eds.), (Cham), pp. 391–407, Springer International Publishing, 2016.

[13] D. W. Cleveland and K. F. Sullivan, “Molecular biology and genetics of tubulin,” Annual Review of Biochemistry, vol. 54, no. 1, pp. 331–366, 1985. PMID: 3896122.

[14] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall/CRC, 2015.

[15] L. Gu, Y. Sheng, Y. Chen, H. Chang, Y. Zhang, P. Lv, W. Ji, and T. Xu, “High-density 3D single molecular analysis based on compressed sensing,” Biophysical Journal, vol. 106, pp. 2443–2449, Jun. 2014.

[16] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, pp. 448–456, JMLR.org, 2015.

[17] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.