Absorption imaging is the most common probing technique in experiments with ultracold atoms. The standard procedure involves the division of two frames acquired at successive exposures, one with the atomic absorption signal and one without. A well-known problem is the presence of residual structured noise in the final image, due to small differences between the imaging light in the two exposures. Here we solve this problem by performing absorption imaging with only a single exposure, where instead of a second exposure the reference frame is generated by an unsupervised image-completion autoencoder neural network. The network is trained on images without absorption signal such that it can infer the noise overlaying the atomic signal based only on the information in the region encircling the signal. We demonstrate our approach on data captured with a quantum degenerate Fermi gas. The average residual noise in the resulting images is below that of the standard double-shot technique. Our method simplifies the experimental sequence, reduces the hardware requirements, and can improve the accuracy of extracted physical observables. The trained network and its generating scripts are available as an open-source repository (absDL.github.io).
Ultracold atomic gases are unique systems that allow few- and many-body physics to be studied in a highly precise and tunable manner. The atomic ensembles are held in an ultra-high vacuum environment and are therefore exquisitely isolated from their surroundings. As a result, probing them is almost always restricted to the analysis of their optical response. The most widely used probing technique is absorption imaging, in which a collimated resonant laser beam is passed through the cloud and the shadow cast by the atoms is recorded by a digital camera. The spatial atomic distribution is then extracted from the position-dependent absorption coefficient. The coherence length of the probe beam is typically much longer than the distances between optical interfaces in the experiment. Unwanted reflections therefore interfere and generate characteristic patterns of stripes and Newton’s rings in the recorded image. These patterns make it difficult to distinguish the signal from the non-uniform background.
The standard solution is to employ a double-exposure scheme. The first exposure is performed with the atoms present, and a second, reference exposure is performed shortly afterwards without the atoms. The exposure without atoms can be done either by waiting for the atoms to move out of the frame or by optically pumping them into a dark state. The line-of-sight integrated optical density (OD) image is formed by subtracting the logarithms of the pixel counts in the two frames, with and without the atoms. However, acoustic noise and other dynamical processes usually make the noise patterns in the two images differ. This results in a residual structured noise pattern in the final image (Fig. 1a). The reduced signal-to-noise ratio caused by these fringes is particularly problematic in low-OD images. Linear approaches for background completion were recently suggested [2, 3], but, as we show, they are sensitive to small changes in the noise pattern that evolve over time.
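The double-exposure OD computation described above can be sketched as follows. This is a minimal NumPy illustration, assuming the raw atom frame, reference frame and dark frame are available as arrays. The function name `double_shot_od` and the clipping floor are hypothetical choices made for illustration; they are not part of the authors' code.

```python
import numpy as np


def double_shot_od(atoms, reference, dark, floor=1.0):
    """Line-of-sight OD from two exposures: OD = ln(I_ref) - ln(I_atoms).

    atoms, reference, dark: 2D arrays of raw camera counts.
    floor: assumed minimal count after dark subtraction, to avoid log(0).
    """
    i_atoms = np.clip(atoms.astype(float) - dark, floor, None)
    i_ref = np.clip(reference.astype(float) - dark, floor, None)
    # Subtracting the logarithms is equivalent to -ln(I_atoms / I_ref)
    return np.log(i_ref) - np.log(i_atoms)
```

Clipping the dark-subtracted counts to a small positive floor is one common way to avoid taking the logarithm of zero or negative counts in dark pixels. Other treatments, such as masking those pixels, are equally possible.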
In this work, we tackle the noisy-background problem using machine learning, a term describing a set of algorithms that perform specific tasks by relying on patterns and inference. Among these, deep learning refers to a class of models in which information propagates through multiple successive layers, translating a given input into a prediction. The use of deep learning has become widespread in recent years for problems where an analytic mapping does not exist or where numerical solutions are intractable [4–8]. Image completion is an excellent example of such an application, particularly when the image contains typical recurrent but varying patterns. Machine learning techniques have also been used to optimize the cooling sequences of ultracold atoms [9–12] and to execute related numerical calculations. They were also suggested and demonstrated to be useful for fluorescence detection of pinned atoms and ions.
Here we report on a new approach to absorption imaging that uses a deep neural network (DNN) to generate an ideal reference frame from a single image that includes the atomic absorption signal. The reference image is constructed by masking out the part of the image containing the atomic shadow and using the network to complete the background in that region. We demonstrate the method with data acquired with an ultracold Fermi gas. We show that images captured with the single-exposure technique feature lower noise levels and therefore allow a more accurate extraction of physical observables. In addition to improving the data quality, our single-shot approach simplifies the experimental sequence and eases the hardware requirements on the camera. The DNN model successfully adapts to both short- and long-term variations in the imaging conditions, and it therefore constitutes a robust solution.
FIG. 1. Completion of a background frame by the neural net – an example of network evaluation of a typical image without atoms from the validation set. (a) The input log image with its central part masked. The network task is to complete the image in the central cyan square. (b) The network prediction for the central square. (c) The original central part of the image (“ground truth”). (d) The difference between the network prediction and the ground truth, multiplied by 5 to enhance the contrast. The residual OD root mean squared error of this example is 0.061 for both the single-exposure and double-exposure techniques.
Experimental apparatus. The experiments are conducted with a quantum degenerate Fermi gas of atoms prepared in an equal mixture of the two lowest energy states in the F = 9/2 manifold at a magnetic field of 185 G. Our experimental system and cooling procedure are the same as described in Refs. [16, 17]. The frames without atoms were deliberately captured over a period of seven months to test the DNN under realistic conditions. We acquired data with atomic clouds at different conditions by modifying the evaporative cooling sequence. For training and validation of the DNN, we also acquired images without atoms. To this end, we set the initial position of the optical transfer trap about 2 cm away from the location of the atoms in the magnetic trap, so that no atoms are shuttled to the position where the images are recorded. In all cases, the first exposure was taken between after the optical dipole trap was turned off abruptly.
The images are taken with a laser tuned to the cycling transition in the manifold, at a wavelength of nm. The laser linewidth is about 100 kHz, much narrower than the natural linewidth of MHz. The illumination is pulsed for and recorded by a 14-bit CCD camera. The reference frame (for conventional absorption imaging) is recorded with a second pulse 50 ms later, by which time the atoms have moved out of the camera field of view. We also capture “dark frames” without any illumination, which serve as zero references. The dark frames do not have to be taken often, since they only account for residual light that does not originate from the probe beam and for electronic noise in the camera. Before analyzing the two images in the conventional absorption imaging technique, we correct for small differences that may exist between the illumination intensities of the two exposures. These differences are typically a few percent. The second exposure is taken only to compare our technique with the conventional method and is required neither for the application of the DNN nor for its training.
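The text does not specify how the intensity correction between the two exposures is performed. One plausible approach, sketched below purely as an assumption, rescales the reference frame so that its total counts in an atom-free region match those of the atom frame before forming the OD. That region is given by a hypothetical boolean mask `signal_free`.

```python
import numpy as np


def intensity_corrected_od(atoms, reference, dark, signal_free, floor=1.0):
    """Double-shot OD after rescaling the reference exposure (assumed procedure).

    signal_free: boolean mask of pixels assumed to contain no atoms,
    e.g. the periphery of the frame (hypothetical choice).
    """
    i_atoms = atoms.astype(float) - dark
    i_ref = reference.astype(float) - dark
    # Ratio of probe counts in the atom-free region of the two exposures
    scale = i_atoms[signal_free].sum() / i_ref[signal_free].sum()
    i_ref = i_ref * scale
    # Clip to a small positive floor before taking logarithms
    return np.log(np.clip(i_ref, floor, None)) - np.log(np.clip(i_atoms, floor, None))
```

A global scale factor only removes a uniform intensity mismatch. It cannot remove fringe patterns that changed between the exposures, which is the residual noise discussed in this work.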
Two physical observables that are commonly used in ultracold atomic experiments are the temperature and the number of atoms. In the presented results, the number of atoms in the cloud and its temperature are controlled by changing the final trap depth in the optical evaporation. We extract the observables from the momentum distribution, which is measured after 15 ms of ballistic expansion. To extract the observables, we fit the OD images with
$$\mathrm{OD}(x,y) = A\,\frac{\mathrm{Li}_2\!\left(-\zeta\, e^{-(x-x_0)^2/2\sigma_x^2-(y-y_0)^2/2\sigma_y^2}\right)}{\mathrm{Li}_2(-\zeta)} + B,$$
where $\mathrm{Li}_n$ denotes Jonquière’s polylogarithm function, $\zeta$ is the fugacity, $A$ and $\sigma_{x,y}$ are the amplitude and widths of the cloud centered at $(x_0, y_0)$, and $B$ accounts for any remaining constant background in the OD image. From the fugacity, we extract the relative temperature $T/T_F = [-6\,\mathrm{Li}_3(-\zeta)]^{-1/3}$. Here $T_F = \hbar\bar{\omega}(6N)^{1/3}/k_B$ is the Fermi temperature, and $\bar{\omega}$ is the geometrically averaged trapping frequency, which we measure and rescale according to the trapping laser power. The number of atoms, $N$, is obtained by integrating over the fitted momentum distribution.
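The following pure-Python sketch shows how the fitted fugacity maps onto the degeneracy. It evaluates $\mathrm{Li}_2(-z)$ and $\mathrm{Li}_3(-z)$ for $z \ge 0$, a 2D Fermi-Dirac OD model, and $T/T_F = [-6\,\mathrm{Li}_3(-\zeta)]^{-1/3}$ for an ideal Fermi gas in a harmonic trap. It assumes the normalized form $\mathrm{OD} = A\,\mathrm{Li}_2(-\zeta e^{-(x-x_0)^2/2\sigma_x^2-(y-y_0)^2/2\sigma_y^2})/\mathrm{Li}_2(-\zeta) + B$. The function names are hypothetical, and in practice the fit itself would be done with a nonlinear least-squares routine.

```python
import math


def neg_polylog(s, z, terms=2000):
    """Polylogarithm Li_s(-z) for z >= 0, implemented for s = 2 and s = 3.

    For z <= 1 the alternating power series is summed directly; for z > 1
    the inversion identities
        Li_2(-z) = -pi^2/6 - ln(z)^2/2 - Li_2(-1/z)
        Li_3(-z) =  Li_3(-1/z) - pi^2/6 * ln(z) - ln(z)^3/6
    map the argument back into the convergent range.
    """
    if s not in (2, 3):
        raise ValueError("only s = 2 and s = 3 are implemented")
    if z < 0:
        raise ValueError("z must be non-negative")
    if z <= 1.0:
        return sum((-z) ** k / k ** s for k in range(1, terms + 1))
    ln_z = math.log(z)
    inverse = neg_polylog(s, 1.0 / z, terms)
    if s == 2:
        return -math.pi ** 2 / 6 - ln_z ** 2 / 2 - inverse
    return inverse - math.pi ** 2 / 6 * ln_z - ln_z ** 3 / 6


def fermi_od_model(x, y, amp, zeta, sigma_x, sigma_y, background, x0=0.0, y0=0.0):
    """Column OD of a ballistically expanded ideal Fermi gas (assumed normalized form).

    `amp` is the peak OD above the constant `background`; requires zeta > 0.
    """
    gauss = math.exp(-(x - x0) ** 2 / (2 * sigma_x ** 2) - (y - y0) ** 2 / (2 * sigma_y ** 2))
    return amp * neg_polylog(2, zeta * gauss) / neg_polylog(2, zeta) + background


def t_over_tf(zeta):
    """T/T_F of an ideal Fermi gas in a harmonic trap from its fugacity."""
    return (-6.0 * neg_polylog(3, zeta)) ** (-1.0 / 3.0)
```

The series converges for z ≤ 1, and the inversion identities map larger fugacities (deeply degenerate clouds) back onto that range, so no special-function library is needed for these two orders. In the classical limit ζ → 0, `t_over_tf` approaches (6ζ)^(-1/3), as expected.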
A DNN is a computational pipeline in which the input (in our case, the information in the masked OD image) undergoes multiple convolutional transformations and changes of dimension. These transformations distill the features of the underlying spatial pattern, and their result is the prediction of the DNN. The network is trained to optimally recover the structure of the illumination in the region where the atomic signal appears. The training phase is performed using images captured without atoms, which therefore constitute the “ground truth” for the unsupervised reconstruction. At each optimization step, the prediction of the network is compared to the ground-truth values in the masked area. The weights of the model are then varied to minimize the loss, i.e., the mean squared error (L2 norm) between the ground truth and the prediction. At the end of the training, we obtain an optimized model ready for prediction (inference) on new images with atoms. The network produces an ideal reference regardless of whether atoms appeared in the original image, because the relevant region is masked out. Since the involved convolutions are relatively simple, evaluating the model on new inputs is fast. This makes it straightforward to integrate a trained network into other analysis software.
From the raw images we subtract the dark frames and then take the logarithm of the pixel values. The convolutional network is an autoencoder with a U-net architecture. The input to the network is the OD image cropped to pixels around the position of the atoms. From this image we mask out a central circle with a diameter of 190 pixels, which may include an absorption signal if atoms are present. This mask diameter is at least twice the size of a typical atomic cloud, which ensures that there is no absorption signal in the region used by the DNN to predict the background. For training, we use a generator that riffles through the stored TIFF images, applies the mask to the input, and feeds the DNN with batches of 8 frames. To evaluate the DNN on a frame with atoms, we store the model inference as a binary file and subtract the input frame from it to obtain the atomic OD. By minimizing the loss over the square circumscribing the masked region (dashed cyan square in Fig. 1a), we ensure continuity at the corners, where the background is unmasked. Effectively, a fraction 1 − π/4 ≈ 21% of the loss is dedicated to image duplication rather than completion. This eliminates any offset between the input and output frames, which might otherwise translate into an error in the number of atoms.
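The following NumPy sketch covers the masking and batching steps described above. It assumes the dark-subtracted log frames are already loaded into an array of shape `(n, size, size)`. The helper names, the zero fill value inside the mask, and the per-epoch shuffling are illustrative assumptions rather than the absDL implementation. The target is the full circumscribing square, so its corners (about 1 − π/4 of the square's pixels) are unmasked copies of the input.

```python
import numpy as np


def circular_mask(size, diameter):
    """Boolean (size, size) array that is True inside the central circle to be completed."""
    yy, xx = np.mgrid[:size, :size]
    center = (size - 1) / 2.0
    return (xx - center) ** 2 + (yy - center) ** 2 <= (diameter / 2.0) ** 2


def loss_square(size, diameter):
    """Row and column slices of the square that circumscribes the masked circle."""
    start = int(round((size - diameter) / 2.0))
    return slice(start, start + diameter), slice(start, start + diameter)


def masked_batches(log_frames, diameter=190, batch_size=8, fill=0.0, seed=0):
    """Endless generator of (masked_input, target) batches from atom-free log frames.

    log_frames: array of shape (n, size, size) with n >= batch_size.
    fill: hypothetical value written into the masked circle.
    """
    n, size, _ = log_frames.shape
    mask = circular_mask(size, diameter)
    rows, cols = loss_square(size, diameter)
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(n)  # reshuffle the stored frames every epoch
        for start in range(0, n - batch_size + 1, batch_size):
            batch = log_frames[order[start:start + batch_size]]
            masked = np.where(mask, fill, batch)
            # Target: the unmasked frame over the circumscribing square, so the
            # corners reward copying the input and the circle rewards completion
            yield masked[..., None], batch[:, rows, cols, None]


def single_shot_od(predicted_log_ref, log_atoms):
    """OD = ln(I_ref) - ln(I_atoms): the predicted log reference minus the input log image."""
    return predicted_log_ref - log_atoms
```

For example, with a hypothetical crop of 256 × 256 pixels, `next(masked_batches(frames))` yields inputs of shape `(8, 256, 256, 1)` and targets of shape `(8, 190, 190, 1)`. The trailing channel axis follows the usual convention of convolutional frameworks. When applying `single_shot_od`, the input log image must be cropped to the same square as the prediction.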
The feed-forward network consists of about parameters arranged in 27 layers. These parameters were optimized by running over frames captured without atoms, with additional images for loss validation. For each image, the network output is compared to the original central part and the mean-squared-error loss function is minimized. We used the Adam optimizer and Glorot initialization for the parameter optimization, and applied batch normalization with a momentum of 0.99. For this application, labeling of the input frames is unnecessary, as the network output is compared directly against its input before masking. The only prior knowledge is the absence of atoms in the peripheral region and, for the training set only, also in the central area. Notably, generative adversarial networks have proven very successful in natural-scene image completion tasks, but they might be detrimental in this case. Here the completion must reproduce a unique ground truth rather than merely a plausible background.
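The two standard ingredients named above can each be written in a few lines of NumPy. Glorot (Xavier) uniform initialization draws weights from U(−a, a) with a = √(6/(fan_in + fan_out)). Adam rescales a momentum-averaged gradient by a running estimate of its second moment. This is a textbook sketch with the usual default hyperparameters, not the authors' training code. In practice these rules are provided by the deep-learning framework.

```python
import numpy as np


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization: U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


class Adam:
    """Adam update rule (Kingma and Ba) for a single parameter array."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimate of the gradient
        self.v = None  # second-moment (uncentered variance) estimate
        self.t = 0

    def step(self, params, grad):
        """Return updated parameters given the current gradient."""
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias corrections
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

To minimize f(p) = Σ p², for example, create `opt = Adam(lr=0.1)` and repeatedly call `p = opt.step(p, 2 * p)`. The first step moves each coordinate by about `lr`, independently of the gradient scale, which is part of why Adam needs little tuning.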
FIG. 2. Minimization of the residual error along the DNN training. Optical-density root mean squared error between the model prediction and the ground truth as a function of the number of training iterations (epochs). Lower values mean better performance. The purple curve represents the residual loss of the training set, which is minimized in the optimization process. The black curve is the residual error on the validation set, which was not used for training. The dashed red line designates the mean residual noise in the standard double-shot scheme, multiplied by to correctly compare with the residual noise of the DNN prediction in the central circle (see Fig. 1d).
DNN performance on the validation set. First, we examine the residual noise in inferences on the validation set, which was not used for training and does not include any atomic signal. The convergence of the model is depicted in Fig. 2, where we present the decay of the residual loss during training for both the training (purple) and validation (black) datasets. The decay in both datasets is sub-power-law on a log-log scale. The loss drops below the reference level set by the average double-shot residual noise (dashed red line) after approximately 100 training epochs. At this stage the network mainly reproduces the overall bias reliably, while some noise features are still present. In principle, the training should continue as long as the validation loss decreases. In practice, the decay of the loss slows dramatically after a few hundred epochs, and we therefore stop the training after 1133 epochs. An example of image completion without atoms is displayed in Fig. 1, with the DNN input (1a) and the corresponding prediction of the network (1b), which closely resembles the original data (1c). Notably, there are no significant spatial correlations in the difference between the desired and the predicted frames (1d).
The lowest residual error is 0.0681 optical-density root mean squared error (ODRMSE) for the whole validation dataset, which was captured intermittently over seven months. As most of the residual error originates from the inner circle of the square output image (see Fig. 1d), a fair comparison
FIG. 3. Residual error distribution of the difference images in the validation set. The upper histogram shows the optical-density root mean squared error of the DNN single-exposure technique after 1133 training epochs. The middle histogram represents the residual error of the standard double-exposure technique, after correction for probe intensity fluctuations. The lower histogram depicts the residual error of PCA-based reference generation. The PCA basis consists of the 300 most significant components extracted from 600 random images taken from the DNN training set. Different colors distinguish the constituent frames of the validation set by date, counting from the first partial set.
for the loss is against of the averaged double-shot error, indicated by a dashed red line in Fig. 2. This reference value is 0.0745 ODRMSE, 9.4% higher than the minimal validation loss obtained during the first 1139 epochs.
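The ODRMSE figure of merit and the quoted 9.4% margin can be reproduced with a short calculation. Here `od_rmse` is a hypothetical helper that assumes the residual is an atom-free OD image, for example prediction minus ground truth.

```python
import numpy as np


def od_rmse(od_residual, region=None):
    """Root-mean-squared OD of an atom-free residual image, optionally within a boolean region."""
    values = od_residual if region is None else od_residual[region]
    return float(np.sqrt(np.mean(np.square(values))))


# Relative margin between the two values quoted in the text
dnn_min_validation = 0.0681     # minimal DNN validation ODRMSE
double_shot_reference = 0.0745  # double-shot reference value
margin = double_shot_reference / dnn_min_validation - 1.0
print(f"{100 * margin:.1f}%")   # prints 9.4%
```

Restricting `region` to the inner circle or to the full square changes the value. This is why the comparison with the double-shot error has to account for the fraction of pixels that actually carry completion error.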
In Fig. 3 we compare the histograms of the residual loss on the validation set for the DNN-based single-shot technique (upper panel), the conventional double-exposure technique (middle panel), and background completion using principal component analysis (PCA) (lower panel). The histogram for the DNN technique exhibits a single narrow peak, while those of the two other approaches are markedly wider and multistructured. To illustrate the source of this behavior, we color the histograms according to the time elapsed between the acquisition of the corresponding dataset and that of the first set. We find that the double-peak structure of the conventional double-shot technique is correlated with time variations, probably due to slow drifts in the probe light intensity. This variation is even more pronounced in the PCA results. We verified that it stems directly from variations in the set from which the PCA basis is taken. To do so, we repeated the PCA analysis with a basis taken from 600 frames, all acquired on the first day of image acquisition. In this case, the PCA approach yields excellent results for same-day frames, 0.08(2) ODRMSE. Nonetheless, its performance deteriorates dramatically with long-term drifts. We find distinct date-dependent peaks in the histogram (not shown in the figure), and for the datasets from other days the error distribution lies at 0.42(1) ODRMSE. We conclude that, for the PCA approach to maintain adequate performance, the basis dataset must be re-accumulated and re-analyzed recurrently, almost on a daily basis. In contrast, the DNN technique is robust and insensitive to these variations. It derives its robustness from the variance in its substantially broader training dataset, which remains tractable because the network is trained sequentially on batches.
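The following minimal NumPy sketch makes the comparison with linear methods concrete. It implements PCA-based reference generation in the spirit of Refs. [2, 3]. A basis is built from atom-free frames, component weights are fitted by least squares on the unmasked pixels only, and the fitted combination predicts the background under the mask. The function names and details are assumptions. This is not necessarily the exact algorithm of those references or of the comparison in Fig. 3, which used 300 components from 600 frames.

```python
import numpy as np


def pca_basis(reference_frames, n_components):
    """Mean and leading principal components of atom-free frames of shape (n, h, w)."""
    flat = reference_frames.reshape(len(reference_frames), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by singular value
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_complete(frame, mask, mean, basis):
    """Predict the background of `frame`; `mask` is True where atoms may be present."""
    flat = frame.ravel()
    outside = ~mask.ravel()
    # Least-squares fit of the component weights using only unmasked pixels
    coeffs, *_ = np.linalg.lstsq(basis[:, outside].T, flat[outside] - mean[outside], rcond=None)
    return (mean + coeffs @ basis).reshape(frame.shape)
```

Once computed, the basis is fixed, so any drift in the fringe pattern that the stored frames do not span cannot be represented. This is consistent with the date-dependent degradation of the PCA approach described above.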
The results on the validation set show that the DNN single-exposure approach achieves lower residual noise levels and deals better with variations in the imaging conditions when compared to the conventional double-shot scheme or linear algorithms. The residual noise of the DNN technique can, in principle, be further reduced by additional training. To assess the usefulness of the time invested in such prolonged training, one should take into account whether it has a measurable effect on physical observables, as we describe in the next section.
Single-shot imaging evaluation. In this section we present single-shot absorption images of a quantum degenerate fermionic potassium gas under different conditions. A typical analysis of a low-OD image following a ballistic expansion from a -deep trap is shown in Fig. 4. In panel (4a), we present the inner square part of the input log image. In this example, there are approximately atoms, so the atomic signal is hardly discernible from the background by eye. When this image is subtracted from the network prediction in (4b), a clean OD image is obtained (4c). For comparison, panel (4d) shows the conventional absorption image obtained from two exposures in the same experiment. Evidently, the single-shot approach eliminates the remaining fringe pattern and yields an overall better OD image. More examples for different trap depths are presented in the upper panel of Fig. 5 and show the same behavior regardless of the atomic conditions.
Effect on physical observables. In Fig. 6 we plot the number of atoms and the temperature for different trap depths, as extracted with the single-shot (purple diamonds) and the two-exposure (black circles) techniques. Importantly, the new technique does not introduce any systematic error in the extraction of these important observables. The error bars represent the shot-to-shot variation in the experimental conditions combined with the fit extraction error. Since both of these terms are of similar magnitude, the improvement of the single-exposure technique is hard to observe in the total error bars. To emphasize this improvement, we present in the insets only the relative fit extraction error, averaged over the 10 experimental realizations at each trap depth. We find that the extraction uncertainty of both observables is smaller with the single-exposure technique.
FIG. 4. Reconstruction of a single-shot image with atoms, exemplified with a low-OD atomic cloud. (a) The central square of a single-shot log image, including the masked area (dotted white circle). (b) The network prediction. (c) The difference between prediction and input, multiplied by 5, resulting in a fringe-free single-shot absorption image. (d) The result of the conventional two-exposure technique in the same experiment, where the second exposure is taken 50 ms after the first one (also multiplied by 5).
FIG. 5. Additional examples of inferences of the neural network on images with atoms (upper panel) for different conditions of the atomic cloud. The lower panel presents the corresponding results using the standard double-shot technique. From left to right, the clouds were released from 190, 117, 89, 76, and 57 nK-deep traps. All examples are displayed with the same color scale as in Fig. 4d.
We have demonstrated single-shot absorption imaging based on background completion by a deep convolutional network. We have shown that this approach can accurately reconstruct atomic density profiles and yields smaller errors in the extracted physical quantities than the standard double-exposure technique. Single-shot imaging removes the need for fast cameras and facilitates multi-frame acquisitions. This simplification directly enables simpler and cleaner designs for new cold-atom systems. We have also demonstrated the ability of the DNN to adapt to variations in the working conditions that develop over time.
Our network can be improved in several respects. First, the masked area can be enlarged to achieve even better robustness. Also, by training the network on random patches of the uncropped OD image, the position dependence of the result can be further reduced. Another interesting direction is the implementation of an online learning scheme, where images are routinely added to the dataset and the model is continuously updated between inferences.
FIG. 6. Characterization of the resulting images: number of atoms and temperature extracted by fitting a Fermi-Dirac distribution to the data. The conditions of the atomic clouds are controlled by the final trap depth in the optical evaporation. Black circles mark the results of the conventional double-exposure technique, while purple diamonds mark the results of the single-shot DNN approach. Error bars combine the extraction uncertainty with the shot-to-shot variation over 10 experimental realizations. The insets show the average fit extraction error. The single-exposure technique achieves better accuracy in both observables.
The trained network and its generating scripts are publicly available as an open-source Python software package to facilitate their deployment by other experimental groups. Using the provided repository, single-shot imaging can be realized on any imaging apparatus after training the network parameters locally.
This research was supported by the Israel Science Foundation (ISF) grant No. 1779/19, and by the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel, grant No. 2018264. The GeForce TITAN V used for the local network training was donated by the Nvidia Corporation. Remote training power was granted by the Google Cloud Platform research credits program. G.N. would like to thank Amit Oved for inspiring discussions.
 Bo Song, Chengdong He, Zejian Ren, Entong Zhao, Jeongwon Lee, and Gyu-Boong Jo, “Effective statistical fringe removal algorithm for high-sensitivity imaging of ultracold atoms,” arXiv preprint (2020), 2002.10053v1.
 Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature 521, 436–444 (2015).
 Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd, “Quantum machine learning,” Nature 549, 195–202 (2017).
 Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre G.R. Day, Clint Richardson, Charles K. Fisher, and David J. Schwab, “A high-bias, low-variance introduction
 Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová, “Machine learning and the physical sciences,” Reviews of Modern Physics 91 (2019), 10.1103/revmodphys.91.045002.
 Maithra Raghu and Eric Schmidt, “A survey of deep learning for scientific discovery,” arXiv preprint (2020), 2003.11755v1.
 P. B. Wigley, P. J. Everitt, A. van den Hengel, J. W. Bastian, M. A. Sooriyabandara, G. D. McDonald, K. S. Hardman, C. D. Quinlivan, P. Manju, C. C. N. Kuhn, I. R. Petersen, A. N. Luiten, J. J. Hope, N. P. Robins, and M. R. Hush, “Fast machine-learning online optimization of ultra-cold-atom experiments,” Scientific Reports 6 (2016), 10.1038/srep25890.
 A. D. Tranter, H. J. Slatyer, M. R. Hush, A. C. Leung, J. L. Everett, K. V. Paul, P. Vernaz-Gris, P. K. Lam,
 Ippei Nakamura, Atsunori Kanemura, Takumi Nakaso, Ryuta Yamamoto, and Takeshi Fukuhara, “Nonstandard trajectories found by machine learning for evaporative cooling of 87Rb atoms,” Optics Express 27, 20435 (2019).
 A J Barker, H Style, K Luksch, S Sunami, D Garrick, F Hill, C J Foot, and E Bentine, “Applying machine learning optimization methods to the production of a quantum gas,” Machine Learning: Science and Technology 1, 015007 (2020).
 Lewis R B Picard, Manfred J Mark, Francesca Ferlaino, and Rick van Bijnen, “Deep learning-assisted classification of site-resolved quantum gas microscope images,” Measurement Science and Technology 31, 025201 (2019).
 Zi-Han Ding, Jin-Ming Cui, Yun-Feng Huang, Chuan-Feng Li, Tao Tu, and Guang-Can Guo, “Fast high-fidelity readout of a single trapped-ion qubit via machine-learning methods,” Physical Review Applied 12 (2019), 10.1103/physrevapplied.12.014038.
 Constantine Shkedrov, Yanay Florshaim, Gal Ness, Andrey Gandman, and Yoav Sagi, “High-sensitivity rf spectroscopy of a strongly interacting Fermi gas,” Physical Review Letters 121 (2018), 10.1103/physrevlett.121.093402.
 Gal Ness, Constantine Shkedrov, Yanay Florshaim, and Yoav Sagi, “Realistic shortcuts to adiabaticity in optical transfer,” New Journal of Physics 20, 095002 (2018).
 PCO pixelfly usb (ICX285AL CCD).
 Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Lecture Notes in Computer Science (Springer International Publishing, 2015) pp. 234–241.
 The imaging system translates the mask diameter of 190 pixels into at the atoms plane.
 François Chollet (2015).
 Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint (2014), 1412.6980v9.
 Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the thirteenth international conference on artificial intelligence and statistics (2010) pp. 249–256.
 Sergey Ioffe and Christian Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint (2015), 1502.03167v3.
 Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Curran Associates, Inc., 2014) pp. 2672–2680.
 absDL repository is available at http://absDL.github.io.