Neural Architecture Search for Compressed Sensing Magnetic Resonance Image Reconstruction

2020·arXiv

Abstract

1. Introduction

Magnetic Resonance Imaging (MRI) can noninvasively provide various contrast images for assessing anatomical structures and physiological functions. However, the raw data of MRI are acquired in k-space (i.e. Fourier space), and the scanning speed is limited by physiological constraints [26]. The relatively slow imaging speed limits the use of MRI in applications that are motion-sensitive or time-constrained. Thus, MRI acceleration has been an active research area since MRI was proposed in the 1970s [48].

Among various MRI acceleration methods, Compressed Sensing Magnetic Resonance Imaging (CS-MRI) has gained considerable attention because it can significantly accelerate MRI without any additional hardware [32]. The key principle of compressed sensing [10] is that high-quality images can be reconstructed from sub-Nyquist sampled signals when two assumptions are satisfied: first, the image has a sparse representation in a specific transform domain; second, the sampling and the sparsity domain are incoherent. Based on this theory, MR images can be reconstructed from randomly sub-sampled k-space data by iterative optimization algorithms that suppress the aliasing artifacts, i.e. incoherent noise, caused by sub-Nyquist sampling. Thus, the major problems of CS-MRI become: 1) designing a better sparse representation [30] for de-aliasing; and 2) implementing the reconstruction efficiently enough for clinical translation. The whole framework of traditional CS-MRI is shown in Fig.1.
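The sub-sampling and zero-filled reconstruction steps of this framework can be illustrated with a minimal NumPy sketch. It assumes a centered, orthonormal FFT convention, Cartesian sub-sampling of whole k-space columns, and a random toy complex image in place of real MR data; the helper names (`fft2c`, `ifft2c`, `zero_filled_recon`) are hypothetical.

```python
import numpy as np


def fft2c(img):
    """Centered, orthonormal 2D FFT (image domain -> k-space)."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(img), norm="ortho"))


def ifft2c(kspace):
    """Centered, orthonormal 2D inverse FFT (k-space -> image domain)."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho"))


def zero_filled_recon(img, line_mask):
    """Sub-sample k-space along columns and return the zero-filled image.

    line_mask is a boolean vector over the phase-encoding (column) axis;
    unsampled k-space lines are simply set to zero before the inverse FFT.
    """
    kspace = fft2c(img)
    sub_kspace = kspace * line_mask[np.newaxis, :]
    return ifft2c(sub_kspace)


rng = np.random.default_rng(0)
# Toy complex "image"; a real experiment would use MR data instead.
image = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))

n_cols = image.shape[1]
mask = np.zeros(n_cols, dtype=bool)
mask[rng.choice(n_cols, size=n_cols // 4, replace=False)] = True  # keep 16 of 64 lines

x_zf = zero_filled_recon(image, mask)
acceleration = n_cols / mask.sum()
print("acceleration factor:", acceleration)  # 4.0
print("relative error of zero-filled image:",
      np.linalg.norm(x_zf - image) / np.linalg.norm(image))
```

With all lines kept, the zero-filled image equals the original; dropping lines produces the aliased image that the iterative (or learned) reconstruction must clean up.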

Today, Deep Learning (DL) [23] dominates many computer vision applications, including object detection [12], semantic segmentation [31], image de-noising [51], and image super-resolution [9]. Researchers have also validated the feasibility of MRI reconstruction based on deep neural networks. To our knowledge, the deep convolutional neural network (CNN) was first introduced to CS-MRI by Wang et al. [50], where a three-layer CNN was trained with an L2 loss between paired zero-filled and fully-sampled reconstruction MR images. Since then, significant progress has been made in MR image reconstruction, including data-driven methods [60, 33] and model-driven methods [43, 1]. A detailed review is given in the following section. Although researchers have successfully developed various DL-based frameworks for CS-MRI, limited attention has been paid to how network architectures affect the reconstruction process. In previous works, all the networks were hand-crafted, so their performance depends heavily on researchers' expertise and labor, which raises two concerns. On one hand, only a few common operations (e.g. standard convolution) have been tried in current works, and other operations (e.g. dilated convolution) and their possible combinations have not been sufficiently explored. On the other hand, it is hard to balance the performance and computation cost of CNNs through limited manual attempts.

Figure 1: An illustration of the traditional CS-MRI acceleration framework. First, k-space signal acquisition is accelerated by sub-sampling. Then, zero-filled reconstruction images are obtained by applying the Inverse Fast Fourier Transform (IFFT) to the sub-sampled k-space data. Finally, high-quality MR images are reconstructed by removing aliasing patterns from the zero-filled reconstruction images using iterative optimization algorithms. For example, if an 8-fold Cartesian sub-sampling mask is applied to k-space data, signal acquisition can theoretically be accelerated 8 times.

Recently, more and more researchers have focused on developing algorithmic solutions that automate the manual process of architecture design. Architectures automatically found by Neural Architecture Search (NAS) algorithms have achieved better performance with fewer computation resources in various vision tasks such as image classification [61, 5, 29] and semantic segmentation [27]. Inspired by this, we propose a novel and efficient MR image reconstruction network found by NAS. Our main contributions can be summarized as follows:

• We present a novel network designed for MR image reconstruction via NAS instead of manual attempts. To the best of our knowledge, we are the first to introduce NAS to solve the CS-MRI problem.

• Experimental results on a knee MR dataset demonstrate that the searched network can achieve better performance with times fewer computation resources than manually designed ones. The combination of NAS and CS-MRI is effective.

• We explore how hyper-parameters affect the reconstruction results and the searched structures, which may offer insights for designing neural networks for other medical image applications.

• Extensive experiments on a brain MR dataset show that the searched network can be directly generalized to different organs via re-training, and that a more task-specific architecture can be identified by searching again.

• The evaluation code will be available at https://github.com/yjump/NAS-for-CSMRI.

The organization of this paper is as follows. In Section 2, recent developments of DL-based CS-MRI frameworks and NAS algorithms are reviewed. In Section 3, details of how to search the network architecture are elaborated. In Section 4, experimental results show that our model is effective and efficient from both quantitative and qualitative perspectives; experiments on hyper-parameters and model generalizability are also presented. In Section 5, we discuss the results and draw conclusions.

2. Related work

2.1. Recent Developments in DL based CS-MRI frameworks

Current DL based CS-MRI frameworks can be roughly divided into two categories: data-driven and model-driven methods.

For data-driven methods, inspired by the initial work of Wang et al. [50], researchers proposed different frameworks to learn the relationship between sub-sampled and fully-sampled data. To build the mapping from sub-sampled k-space data to fully-sampled image data, Zhu et al. [60] proposed the AutoMap model with fully-connected and convolutional layers. To estimate the missing k-space data, RAKI [2] and LORAKI [20] used CNNs to perform interpolation-based reconstruction in the k-space domain. To perform reconstruction in the image domain, Lee et al. [25] reconstructed the magnitude and phase of MR images with two separate networks. Different generative adversarial networks (GANs) [13] were explored in [33, 52, 38] to reconstruct MR images with a discriminator-based loss for recovering more detailed textures. In these data-driven methods, deep networks can be regarded as "black boxes" trained end-to-end to map directly from the input domain to the output domain.

For model-driven methods, researchers used deep networks to learn image priors for MR image reconstruction and then integrated these networks into traditional algorithms by unrolling the iterative optimization process. Sun et al. [44] first used convolutional layers to unroll the Alternating Direction Method of Multipliers (ADMM) optimization for single-coil MR image reconstruction. A variational network was proposed in [14] to deal with the multi-coil problem. The data consistency module proposed in DCCNN [43], which performs iterative reconstruction in a cascading way, has had a great impact on subsequent works [17, 45, 55], and was generalized to common inverse problems and multi-channel MR data as Model-based Deep Learning (MoDL) [1].

Although various reconstruction frameworks now exist, the network architectures used in data-driven and model-driven methods are very similar. U-net [41] and its variants with residual learning [25], a cascading n-fold architecture [38], and channel-wise attention [17] were explored in independent works. U-net is famous for its success in medical image semantic segmentation, but it is not designed specifically for MR image reconstruction. In [50, 43, 1], plain fully convolutional networks were adopted. Following DCCNN [43], RDN [45] introduced dilated convolution [53] and recursive learning [19] to produce higher-quality reconstructions with fewer parameters. Huang et al. [17] and Zeng et al. [55] also focused on designing refined, novel structures instead of plain CNNs with data consistency modules to improve final results. These architectures are all manually designed and have limited performance due to the concerns mentioned above. In contrast to these works, we seek a better architecture with the help of NAS.

2.2. Recent Developments in NAS Algorithms

The network architecture plays an important role in DL research, and there are many well-known architectures, e.g. AlexNet [22], InceptionNet [47], and ResNet [15]. NAS aims to develop algorithms for automatic neural architecture design. At a high level, current methods usually fall into three categories: evolutionary algorithms (EA) [3], reinforcement learning (RL) [46], and differentiable search [29].

In EA-based NAS methods [40, 39], the best architecture is obtained by progressively mutating a population of candidate architectures. RL-based methods [61, 5, 28] offer an alternative to EA, training a recurrent neural network [34] meta-controller to generate final architectures from a search space encoded as predefined sequences. The major limitation of these EA- and RL-based methods is that they tend to require substantial computation resources.

Our work is most closely related to the third category, differentiable search methods. Based on a continuous relaxation of the architecture representation [29], the architecture of inner cells can be selected automatically via back-propagation [24]. Recent applications of differentiable search have focused on the classification and segmentation of natural images, notably DARTS [29] and Auto-Deeplab [27]. These automatically searched network architectures have outperformed previous handcrafted ones in these tasks.

3. Methodology

To clarify our method, we first introduce the DL-based MR image reconstruction framework, and then present our neural architecture search strategy. Because multi-channel MRI data require substantial computation resources, we perform all formulations and experiments in the single-coil MR image reconstruction setting.

3.1. DL based MR Image Reconstruction Framework

Following DCCNN [43] and MoDL [1], we unroll the alternating minimization algorithm with cascading CNN-derived constraints for the CS-MRI problem.

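The aim of CS-MRI is to reconstruct the fully-sampled, i.e. aliasing-free, MR image $x$ from the sub-sampled k-space measurement $y$, such that:

$$ y = A x, \tag{1} $$

where $A$ is the sub-sampling encoding matrix (e.g. Fourier encoding). Then $x$ can be obtained by solving the following unconstrained optimization problem:

$$ \hat{x} = \arg\min_{x} \; \mathcal{R}(x) + \lambda \lVert A x - y \rVert_2^2, \tag{2} $$

where $\mathcal{R}(x)$ is the regularization term in the image domain, and $\lambda \lVert A x - y \rVert_2^2$ can be regarded as the data consistency term between the image domain and the k-space domain, weighted by $\lambda$. For the traditional CS-MRI framework, the L1 or L2 norm of $x$ in a sparsifying transform domain is often used as the regularization term. For DL-based CS-MRI, the deep CNN is integrated into this formulation by:

$$ \hat{x} = \arg\min_{x} \; \lVert x - f_{cnn}(x \mid \theta) \rVert_2^2 + \lambda \lVert A x - y \rVert_2^2, \tag{3} $$

where $f_{cnn}(\cdot \mid \theta)$ represents the deep CNN with learnable weights $\theta$. This problem can be solved with the alternating minimization steps:

$$ z_k = f_{cnn}(x_k \mid \theta), \tag{4} $$

$$ x_{k+1} = \arg\min_{x} \; \lVert x - z_k \rVert_2^2 + \lambda \lVert A x - y \rVert_2^2. \tag{5} $$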

The sub-problem of Eq.4 can be regarded as a de-aliasing problem in the image domain. Given paired zero-filled MR reconstruction images $x_u$ and fully-sampled MR images $x_{gt}$, the CNN weights $\theta$ can be obtained by minimizing the objective function:

$$ \hat{\theta} = \arg\min_{\theta} \; \sum_{(x_u, x_{gt})} \mathcal{L}\big(f_{cnn}(x_u \mid \theta), x_{gt}\big), \tag{6} $$

where $\mathcal{L}$ is the loss function, e.g. L1 loss or L2 loss.

Figure 2: The backbone framework of MR image reconstruction. The K-space Fusion Module helps correct biases that accrue during CNN inference. This strategy is widely used in different works [43, 1, 17, 45, 55] with various reconstruction modules. In MoDL [1], all the reconstruction modules share the same weights to reduce the number of learnable parameters.

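The sub-problem of Eq.5 is the data consistency problem between k-space and the image domain. In the single-channel MR image acquisition scenario, i.e. $A = MF$, where $F$ applies the two-dimensional Fast Fourier Transform (FFT) and $M$ is the sub-sampling mask selecting lines in k-space, Eq.5 has a closed-form solution:

$$ x_{k+1} = F^{H} \big(I + \lambda M^{H} M\big)^{-1} \big(F z_k + \lambda M^{H} y\big). \tag{7} $$

Because $M^{H} M$ is a diagonal 0/1 matrix, the inverse acts element-wise in k-space: sampled entries become $(F z_k + \lambda y)/(1 + \lambda)$, while unsampled entries keep the CNN prediction $F z_k$.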

This solution was first introduced in DCCNN [43] as the data consistency process. MoDL [1] generalized Eq.5 to multi-channel acquisition. For example, for multi-coil MR images we have $A = MFS$, where the coil sensitivity maps $S$ must be taken into account; since $I + \lambda A^{H} A$ is then not analytically invertible, the conjugate gradient (CG) algorithm is used to solve this more complex problem. In this way, the data consistency process can be integrated with deep CNNs as a layer without trainable parameters. In other words, we fuse the accurate partial k-space data into the deep CNNs via Eq.7 to correct biases that accrue during inference.
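The single-coil data consistency step of Eq.7 can be sketched in NumPy as follows. This is a minimal illustration, assuming a centered orthonormal FFT, measured k-space stored on the full grid with zeros at unsampled positions, and a boolean sampling mask; `data_consistency` and its arguments are hypothetical names, and passing `lam=None` stands for the noiseless limit $\lambda \to \infty$ in which measured entries are copied in directly.

```python
import numpy as np


def fft2c(img):
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(img), norm="ortho"))


def ifft2c(kspace):
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho"))


def data_consistency(z, y, mask, lam=None):
    """Single-coil data consistency: argmin_x ||x - z||^2 + lam * ||M F x - y||^2.

    z:    current image estimate (e.g. CNN output), complex 2D array
    y:    measured k-space stored on the full grid, zero where not sampled
    mask: boolean 2D array, True where k-space was sampled
    lam:  weight of the data term; None stands for lam -> infinity (noiseless)
    """
    k = fft2c(z)
    if lam is None:
        k_dc = np.where(mask, y, k)  # sampled entries replaced by measurements
    else:
        k_dc = np.where(mask, (k + lam * y) / (1.0 + lam), k)  # weighted average
    return ifft2c(k_dc)


rng = np.random.default_rng(1)
shape = (32, 32)
x_true = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = np.zeros(shape, dtype=bool)
mask[:, ::4] = True  # every 4th k-space column (toy Cartesian mask)
y = np.where(mask, fft2c(x_true), 0)

z = x_true + 0.1 * rng.standard_normal(shape)  # stand-in for a CNN output
x_soft = data_consistency(z, y, mask, lam=2.0)
x_hard = data_consistency(z, y, mask)
```

A quick sanity check is that `x_soft` satisfies the normal equations $(I + \lambda A^{H}A)x = z + \lambda A^{H}y$, and that `x_hard` reproduces the measurements exactly at sampled locations. The layer has no weights of its own, which is why it can be inserted between CNN modules.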

Denoting the deep CNNs as reconstruction modules and the data consistency process as k-space data fusion modules, we can unroll Eq.4 and Eq.5 by cascading these modules. Given our limited computation resources, we iterate these modules three times to form the final backbone shown in Fig.2.
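The cascade of reconstruction and k-space fusion modules can be sketched as a loop of three iterations. In this minimal NumPy illustration the "reconstruction module" is a hypothetical stand-in (a neighbourhood average), not a trained CNN, and the fusion step uses the noiseless ($\lambda \to \infty$) form of the data consistency; all function names are assumptions.

```python
import numpy as np


def fft2c(img):
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(img), norm="ortho"))


def ifft2c(kspace):
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho"))


def hard_data_consistency(z, y, mask):
    """k-space fusion with noiseless measurements: keep measured entries, fill the rest from z."""
    return ifft2c(np.where(mask, y, fft2c(z)))


def toy_reconstruction_module(x):
    """Hypothetical stand-in for a trained CNN: average each pixel with its 4 neighbours.

    A real module would be a learned network such as the searched cells.
    """
    neighbours = (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)
                  + np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1))
    return (x + neighbours) / 5.0


def unrolled_reconstruction(x_zf, y, mask, module, n_iters=3):
    """Alternate image-domain de-aliasing (Eq.4) with k-space fusion (Eq.5)."""
    x = x_zf
    for _ in range(n_iters):
        z = module(x)                          # reconstruction module
        x = hard_data_consistency(z, y, mask)  # k-space fusion module
    return x


rng = np.random.default_rng(2)
shape = (32, 32)
x_true = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = np.zeros(shape, dtype=bool)
mask[:, ::4] = True
mask[:, 14:18] = True  # fully sampled centre lines
y = np.where(mask, fft2c(x_true), 0)
x_zf = ifft2c(y)  # zero-filled input
x_rec = unrolled_reconstruction(x_zf, y, mask, toy_reconstruction_module, n_iters=3)
```

Reusing the same `module` in every iteration corresponds to the weight-sharing (MoDL-style) variant; a DCCNN-style cascade would instead pass a different module for each iteration.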

Under this unified framework, some previous works [17, 45, 55] discussed how the design of reconstruction modules can improve the quality of MR image reconstruction. As mentioned above, their networks were all hand-crafted. DCCNN [43] and MoDL [1] both used a plain CNN with residual learning as the reconstruction module, as shown in Fig.3. In MoDL [1] in particular, all the reconstruction modules share the same weights to reduce the number of parameters. Sun et al. [45] designed a recursive dilated network named RDN, shown in Fig.4.

In this work, we used the NAS algorithm to automatically search for the internal structure of the cells that are stacked to form the reconstruction module drawn in Fig.5. The concept of a cell is also adopted in previous NAS works [29, 27, 37] and is introduced in the following section. The first and the last standard convolutional layers are used to adjust the channel dimensions of the input and the output data, similar to previous works [43, 45].

Figure 3: The reconstruction module of DCCNN [43] and MoDL [1]. The original design, a plain convolutional neural network with residual learning, is drawn in (a). In our reimplementation, drawn in (c), we use residual blocks (b) instead of a plain CNN to avoid vanishing gradients.

In this manuscript, a cell maps the output tensors of the previous two cells to its own output, i.e.:

$$ c_k = \mathrm{Cell}(c_{k-1}, c_{k-2}; \alpha), $$

where $c_k$ is the output of cell $k$, and $\alpha$ is a parameter representing the continuous relaxation of the discrete inner cell architectures, as formulated below.

Figure 4: The reconstruction module of RDN [45]. Dilated convolutions are used to expand the perception field and improve performance. All the yellow dilated blocks in the reconstruction module share the same weights via recursive learning [19] to reduce parameters.

Figure 5: The reconstruction module used in our network. The first and the last common convolutional layers are used to refine the channels of the input and the output data. The searched cells are stacked to form the whole architecture.

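Define the connection from an input tensor $X_j$ to the internal node $H_i^k$ of cell $k$ as a selection $O_{j \to i}$ from the candidate layer operation set $\mathcal{O}$, and the input of $H_i^k$ as a selection from the tensor set $I_i^k$. $\mathcal{O}$ contains different CNN layer types, such as convolutional layers. A later node can take in the inputs of this cell and the outputs of all previous nodes, so we have:

$$ I_i^k = \{c_{k-1}, c_{k-2}, H_1^k, \dots, H_{i-1}^k\}, \qquad H_i^k = \sum_{X_j \in I_i^k} O_{j \to i}(X_j). $$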

Before searching the internal cell architecture, these nodes are densely connected by all possible layer types in $\mathcal{O}$, as shown in Fig.6 (a). The parameter $\alpha$ is integrated into each cell node in the following two steps.

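Second, the parameter $\alpha$ is added as the probability associated with each operation:

$$ \bar{O}_{j \to i}(X_j) = \sum_{O^r \in \mathcal{O}} \alpha_{j \to i}^{r} \, O^r(X_j), $$

where $\alpha_{j \to i}^{r}$ is constrained by:

$$ \alpha_{j \to i}^{r} \ge 0, \qquad \sum_{r=1}^{|\mathcal{O}|} \alpha_{j \to i}^{r} = 1. $$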

With the introduction of $\alpha$, the cell architecture search problem can be integrated into a differentiable computation graph. After $\alpha$ is optimized via gradient descent, the layer operations with the top-2 $\alpha$ values are preserved to form the final structure shown in Fig.6 (b).
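The continuous relaxation and the top-2 selection can be illustrated with a small NumPy sketch. Assumptions: $\alpha$ is normalized with a softmax (as in DARTS [29]), which satisfies the non-negativity and sum-to-one constraint; the five operations are toy 1D stand-ins with fixed kernels rather than learned separable or dilated 2D convolutions; and the discretization follows the DARTS convention of taking the best non-"none" operation on each incoming edge and keeping the two strongest edges. All names are hypothetical.

```python
import numpy as np

# Toy 1D stand-ins for the candidate operations (assumption: the real ones are
# separable / dilated 2D conv layers with learned weights, not fixed kernels).
OPS = {
    "sep_conv_3x3": lambda x: np.convolve(x, [0.25, 0.5, 0.25], mode="same"),
    "dil_2_conv_3x3": lambda x: np.convolve(x, [0.25, 0, 0.5, 0, 0.25], mode="same"),
    "dil_3_conv_3x3": lambda x: np.convolve(x, [0.25, 0, 0, 0.5, 0, 0, 0.25], mode="same"),
    "skip_connect": lambda x: x,
    "none": lambda x: np.zeros_like(x),
}
OP_NAMES = list(OPS)


def softmax(a):
    a = np.asarray(a, dtype=float)
    e = np.exp(a - a.max())
    return e / e.sum()


def mixed_op(x, alpha):
    """Continuous relaxation of one edge: softmax(alpha)-weighted sum of all ops."""
    weights = softmax(alpha)
    return sum(w * OPS[name](x) for w, name in zip(weights, OP_NAMES))


def discretize_node(edge_alphas, k=2):
    """Keep the k incoming edges whose best non-'none' operation has the largest weight."""
    none_idx = OP_NAMES.index("none")
    scored = []
    for src, alpha in edge_alphas.items():
        weights = softmax(alpha)
        candidates = weights.copy()
        candidates[none_idx] = -np.inf  # 'none' can never be selected
        best = int(np.argmax(candidates))
        scored.append((weights[best], src, OP_NAMES[best]))
    scored.sort(reverse=True)
    return [(src, op) for _, src, op in scored[:k]]


x = np.linspace(0.0, 1.0, 16)
alphas = {  # hypothetical optimized alpha values for one node's incoming edges
    "c_k-1": [0.0, 0.0, 3.0, 0.0, 0.0],  # prefers dil_3_conv_3x3
    "c_k-2": [0.0, 0.0, 0.0, 2.0, 0.0],  # prefers skip_connect
    "H_1": [0.0, 0.0, 0.0, 0.0, 5.0],    # prefers none, so the edge is dropped
}
print(discretize_node(alphas))
```

During the search every edge runs all operations through `mixed_op`, so gradients reach $\alpha$; only after the search is the node reduced to two edges, each with a single operation.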

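To describe the final searched cell structure, we define a 4-tuple $B_i^k = (I_1, I_2, O_1, O_2)$, where $I_1, I_2 \in I_i^k$ are the selected input tensors and $O_1, O_2 \in \mathcal{O}$ are the selected layer operations based on the optimized $\alpha$ values. Thus we have:

$$ H_i^k = O_1(I_1) + O_2(I_2). $$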

Finally, the structure of the cell is defined and can then be used as a common CNN module.
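Once each node is reduced to a 4-tuple, evaluating the cell is a short loop. The NumPy sketch below reuses toy 1D stand-ins for the operations; the genotype is hypothetical and is not the searched NasN_Knee cell, and concatenating the internal node outputs to form the cell output is an assumption borrowed from DARTS.

```python
import numpy as np

# Toy 1D stand-ins for the candidate operations (hypothetical, not learned layers).
OPS = {
    "sep_conv_3x3": lambda x: np.convolve(x, [0.25, 0.5, 0.25], mode="same"),
    "dil_2_conv_3x3": lambda x: np.convolve(x, [0.25, 0, 0.5, 0, 0.25], mode="same"),
    "dil_3_conv_3x3": lambda x: np.convolve(x, [0.25, 0, 0, 0.5, 0, 0, 0.25], mode="same"),
    "skip_connect": lambda x: x,
    "none": lambda x: np.zeros_like(x),
}


def cell_forward(c_prev_prev, c_prev, genotype):
    """Evaluate a discrete cell.

    genotype holds one 4-tuple (I1, I2, O1, O2) per internal node, where I1 and I2
    index into the list of available tensors [c_{k-2}, c_{k-1}, H_1, ..., H_{i-1}].
    """
    states = [c_prev_prev, c_prev]
    for i1, i2, o1, o2 in genotype:
        if i1 >= len(states) or i2 >= len(states):
            raise ValueError("a node may only use the cell inputs or earlier nodes")
        states.append(OPS[o1](states[i1]) + OPS[o2](states[i2]))  # H_i = O1(I1) + O2(I2)
    # Assumption (as in DARTS): the cell output concatenates its internal node outputs.
    return np.concatenate(states[2:])


# Hypothetical genotype for illustration only; it is not the searched NasN_Knee cell.
genotype = [
    (0, 1, "skip_connect", "sep_conv_3x3"),
    (1, 2, "dil_2_conv_3x3", "skip_connect"),
    (2, 3, "dil_3_conv_3x3", "skip_connect"),
]
rng = np.random.default_rng(3)
c_km2, c_km1 = rng.standard_normal(16), rng.standard_normal(16)
out = cell_forward(c_km2, c_km1, genotype)
```

In a 2D network the concatenation would run along the channel axis, and a convolution would typically reduce the stacked channels again.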

3.2.2. Design of the Operation Search Space

We defined the candidate layer operation set $\mathcal{O}$ as follows (shown in Fig.7):

• Sep_Conv_3x3 : The separable convolutional layer is formed by cascading a depthwise convolutional layer and a pointwise (1×1) convolutional layer.

• Dil_2_Conv_3x3 : The separable convolutional layer with a dilation rate of 2 in the depthwise layer.

• Dil_3_Conv_3x3 : The separable convolutional layer with a dilation rate of 3 in the depthwise layer.

• Skip connect

• None connect

Figure 6: The definition of the cell and the parameter $\alpha$. A cell takes in the output tensors of the previous two cells to construct its own output. The internal nodes and their connections form the cell, which is defined as a directed acyclic computation graph. Before searching for the final architecture, these nodes are densely connected by all possible layer types, as shown in (a). After optimization, the best two connections with their most suitable layer types are preserved according to $\alpha$ and form the final architecture in (b).

Figure 7: The candidate layer operation set $\mathcal{O}$.

We chose these operations based on the following observation and summary of previous works:

• The separable convolution was proposed by [7] and is widely used in Mobilenet [16], Shufflenet [56], and other efficient networks. Compared with a standard convolutional layer, separable convolution requires fewer computations and parameters.

• The dilated convolution [53] is widely adopted in low-level image analysis tasks because it can expand the perception field without adding parameters. RDN [45] shows that dilated layers with various dilation rates can benefit the performance of MR image reconstruction.
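Both observations can be checked with simple arithmetic. The sketch below counts weights (biases ignored) for a 3×3 layer with 32 input and 32 output channels, the channel width used in the experiments, and computes the spatial extent of a dilated 3×3 kernel; the function names are hypothetical. Dilation changes only the extent, not the weight count.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k=3):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 filters that mix channels
    return depthwise + pointwise


def dilated_kernel_extent(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return dilation * (k - 1) + 1


channels = 32  # feature channels used in the experiments
print(standard_conv_params(channels, channels))           # 9216
print(separable_conv_params(channels, channels))          # 1312
print([dilated_kernel_extent(3, d) for d in (1, 2, 3)])   # [3, 5, 7]
```

So Sep_Conv_3x3, Dil_2_Conv_3x3, and Dil_3_Conv_3x3 all use about 7 times fewer weights than a standard 3×3 convolution at this width, while covering 3×3, 5×5, and 7×7 regions respectively.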

3.2.3. Optimization Strategy

We followed the optimization strategy in [29] to search the inner structure of cells. The training data was divided […] integrating to Eq.6 as:

According to [57], L1 loss is beneficial for training machine learning models on computer vision tasks, even when evaluation is performed with L2-norm-related metrics, e.g. PSNR. Inspired by this, we defined the loss function as the L1 loss between the network output for the zero-filled MR image $x_u$ and the paired fully-sampled MR image $x_{gt}$. The search is stopped once the cell structure becomes stable, following the early stopping strategy commonly used in NAS works [11, 4].
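The loss and the stopping rule can be sketched as two small helpers. The L1 loss is the mean absolute error; the stopping rule is one possible reading of "stop when the cell structure becomes stable", namely that the discrete cell derived from $\alpha$ after each epoch has not changed for a number of epochs. The `patience` value and function names are hypothetical.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error, e.g. between the network output and the fully-sampled image."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))


def search_should_stop(genotype_history, patience=3):
    """Stop once the derived cell structure has not changed for `patience` epochs.

    genotype_history holds the discrete cell (e.g. a list of 4-tuples) derived
    from alpha after each search epoch. `patience` is a hypothetical setting.
    """
    if len(genotype_history) < patience + 1:
        return False
    window = genotype_history[-(patience + 1):]
    return all(g == window[-1] for g in window)


print(l1_loss([1.0, -2.0], [0.0, 0.0]))               # 1.5
print(search_should_stop(["a", "b", "b", "b", "b"]))  # True
print(search_should_stop(["a", "b", "b", "b"]))       # False
```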

After the structure of the whole network is determined, the final network needs to be re-trained on the whole training set to maximize its final reconstruction performance.

We compared our framework with the following methods: the conventional Total Variation (TV) [42] minimization-based iterative algorithm, the U-net baseline model used in [54] as a typical data-driven method, DCCNN [43], MoDL [1], and RDN [45], which can be regarded as a representative improved approach following [43, 1].

4.1. Dataset and Pre-processing

We conducted all the experiments on the fastMRI dataset [54], including knee and brain subsets acquired on 3 and 1.5 Tesla magnets. Due to our limited computation resources, we randomly selected 80 scans with 2829 slices as the training set and 40 scans with 1457 slices as the testing set from the knee subset; this mini-fastMRI knee dataset contains more slices than those used in the experiments of DCCNN [43] and RDN [45]. To evaluate the generalizability of the searched architecture across organs, extensive experiments were carried out on a mini-fastMRI brain dataset including 100 scans with 1600 slices as the training set and 50 scans with 800 slices as the testing set from the T2-weighted brain subset.

The mini-fastMRI knee dataset provides raw complex-valued fully-sampled single-coil k-space data of different sizes. The following steps were performed to make paired zero-filled and fully-sampled reconstructions. First, 2D-IFFT was applied to the original k-space data to get MR images, which were then cropped centrally to generate 320 × 320 complex images. After that, 2D-FFT was performed on each cropped image to obtain the corresponding fully-sampled k-space data. The mini-fastMRI brain dataset, however, does not offer complex-valued single-coil k-space data but provides real-valued brain MR images of the same size, 320 × 320. Following [45, 55], we treated the real-valued brain MR images as complex-valued with zero phase, so that 2D-FFT could be applied to obtain simulated complex-valued k-space data.
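The pre-processing steps can be sketched in NumPy as below. This is an illustration under assumptions: a centered orthonormal FFT convention, a random toy array standing in for raw knee k-space (its size is arbitrary), and hypothetical function names.

```python
import numpy as np


def fft2c(img):
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(img), norm="ortho"))


def ifft2c(kspace):
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho"))


def center_crop(img, size=320):
    """Crop the central size x size region of a 2D array."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def knee_target_pair(raw_kspace, size=320):
    """Raw k-space -> centrally cropped complex image and its fully-sampled k-space."""
    image = center_crop(ifft2c(raw_kspace), size)
    return image, fft2c(image)


def brain_kspace_from_real(real_image):
    """Treat a real-valued image as complex with zero phase and simulate its k-space."""
    return fft2c(real_image.astype(np.complex128))


rng = np.random.default_rng(4)
raw = rng.standard_normal((400, 368)) + 1j * rng.standard_normal((400, 368))  # toy raw k-space
img_crop, k_full = knee_target_pair(raw)
brain_k = brain_kspace_from_real(rng.random((320, 320)))
```

The cropped image and its k-space form the fully-sampled target; the zero-filled input is then produced by masking `k_full` (or `brain_k`) and applying the inverse FFT.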

To simulate the accelerated signal acquisition process, we used 4-fold and 8-fold Cartesian sub-sampling masks following the settings in [54]. For 4-fold sub-sampling, the central 8% of k-space lines are fully sampled and the outer k-space is under-sampled uniformly. For 8-fold sub-sampling, the central 4% of k-space lines are fully sampled and the outer k-space is under-sampled uniformly.
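A mask with these properties can be sketched as follows. Assumptions: "under-sampled uniformly" is read as choosing the outer lines uniformly at random, with the total number of lines fixed to the image width divided by the acceleration factor; this may differ in detail from the exact fastMRI mask functions, and the names are hypothetical.

```python
import numpy as np


def cartesian_line_mask(n_cols, acceleration, center_fraction, rng):
    """1D Cartesian mask: fully sampled centre plus uniformly random outer lines."""
    num_low = int(round(n_cols * center_fraction))
    num_total = int(round(n_cols / acceleration))
    mask = np.zeros(n_cols, dtype=bool)
    pad = (n_cols - num_low + 1) // 2
    mask[pad:pad + num_low] = True  # central low-frequency lines
    outer = np.flatnonzero(~mask)
    chosen = rng.choice(outer, size=num_total - num_low, replace=False)
    mask[chosen] = True
    return mask


# Settings from the text: 8% centre for 4-fold, 4% centre for 8-fold.
CENTER_FRACTIONS = {4: 0.08, 8: 0.04}

rng = np.random.default_rng(5)
mask4 = cartesian_line_mask(320, 4, CENTER_FRACTIONS[4], rng)
mask8 = cartesian_line_mask(320, 8, CENTER_FRACTIONS[8], rng)
print(mask4.sum(), mask8.sum())  # 80 40
```

For a 320-column image this keeps 26 central lines plus 54 random outer lines at 4-fold, and 13 central lines plus 27 random outer lines at 8-fold.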

All the MR images were normalized by magnitude to [-6,6] with their phases unchanged. After normalization, each complex-valued image was separated into its real and imaginary parts, which were concatenated as the 2-channel input of the reconstruction network.
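One way to implement this normalization is sketched below. It assumes that "normalized by magnitude to [-6,6]" means scaling each complex image by a positive real factor so that its peak magnitude is 6, which keeps the phase unchanged and bounds the real and imaginary channels in [-6, 6]; the exact rule used by the authors is not specified here, and the function names are hypothetical.

```python
import numpy as np


def to_two_channels(img, bound=6.0):
    """Scale a complex image so its peak magnitude equals `bound`, then split into real/imag.

    Multiplying by a positive real factor rescales magnitude and leaves phase unchanged.
    Returns the (2, H, W) array and the scale needed to undo the normalization.
    """
    scale = bound / np.abs(img).max()
    scaled = img * scale
    return np.stack([scaled.real, scaled.imag], axis=0), scale


def from_two_channels(x, scale):
    """Inverse of to_two_channels: recombine channels and undo the scaling."""
    return (x[0] + 1j * x[1]) / scale


rng = np.random.default_rng(6)
img = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
x2, s = to_two_channels(img)
```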

The BART toolkit [49] was used to perform the TV-minimization iterative reconstruction. We set the total variation regularization weight to 0.01 and ran 200 iterations on each slice.

All the deep CNNs in this manuscript were implemented with PyTorch [35]. The input and output of all the CNNs have 2 channels, and all other layers have 32 output channels unless otherwise noted (e.g. U-net). Following DCCNN [43], no normalization operation, e.g. batch normalization [18], was used in these networks.

We searched for our reconstruction module, named NasN_Knee, on the knee dataset with 3 cascading cells, each including 3 internal computation nodes. The final searched cell structure is drawn in Fig.8. The outputs of and are processed by convolutional layers in to reduce the channel number from 96 to 32.

Figure 8: The final searched cell structure with 3 internal nodes, named NasN_Knee. Two input connections with their independent layer type are preserved to form the whole architecture.

To make the networks comparable, we configured them to have relatively similar numbers of learnable parameters and floating-point operations (FLOPs). As a result, the reconstruction module of DCCNN [43] contains 3 residual blocks, and the reconstruction module of RDN [45] contains 3 recursive dilated blocks. Because we are dealing with single-coil reconstruction, we re-implemented MoDL [1] by re-training DCCNN with weight-shared reconstruction modules. For U-net, the input data are down-sampled 4 times, with the channels doubled each time starting from 32 channels.

According to DCCNN [43] and RDN [45], more blocks in their reconstruction modules produce better results. Thus, we also added more blocks to these networks to evaluate whether they can use more computation resources to achieve performance similar to our searched network.

Table 1 Quantitative knee MR reconstruction results of different methods. The results are calculated on the testing set including 40 scans.

4.2.2. Training Strategy

All deep networks were trained on one TITAN X Pascal GPU with 12GB memory using the Adam optimizer [21] with L1 loss only and a batch size of 2. The learning rate was set to 0.001 for the first 40 epochs and 0.0001 for the remaining 40 epochs. During training, either a 4-fold or an 8-fold Cartesian sub-sampling mask was generated randomly for every slice with equal probability. This can also be viewed as data augmentation to avoid over-fitting [6].
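The schedule and the per-slice mask choice can be written down in a few lines of plain Python. This is a sketch of the stated settings, with hypothetical function names; the epoch index is assumed to start at 0.

```python
import random


def learning_rate(epoch):
    """Step schedule from the text: 1e-3 for epochs 0-39, then 1e-4 for epochs 40-79."""
    return 1e-3 if epoch < 40 else 1e-4


def sample_acceleration(rng):
    """Pick a 4-fold or 8-fold mask for a slice with equal probability."""
    return rng.choice((4, 8))


rng = random.Random(0)
per_slice = [sample_acceleration(rng) for _ in range(10000)]
print(learning_rate(0), learning_rate(40))  # 0.001 0.0001
```

In PyTorch, the same step schedule could be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40], gamma=0.1)` stepped once per epoch; whether the authors used this scheduler is not stated.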

4.2.3. Evaluation Strategy

For all deep methods, the FLOPs per inference were calculated by counting the multiplication and addition operations in all convolutional layers inside the reconstruction modules when feeding a 2-channel MR image. FLOPs are a widely used metric for evaluating computation cost in previous NAS works [29, 27]. The numbers of learnable parameters are also provided, because previous works [1, 45] claimed to use fewer parameters to achieve better performance.
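A per-layer count of this kind can be sketched as follows, for a stride-1 convolution that keeps the spatial size. The helper counts multiply-accumulate operations (MACs), one multiplication plus one addition each; some works report FLOPs as 2 × MACs, and which convention underlies the reported numbers is not stated in this section. The function name is hypothetical.

```python
def conv2d_macs(height, width, c_in, c_out, k, groups=1):
    """Multiply-accumulate operations of a stride-1 'same' 2D convolution (bias ignored)."""
    return height * width * c_out * (c_in // groups) * k * k


H = W = 320  # image size used in the experiments
C = 32       # feature channels
standard = conv2d_macs(H, W, C, C, 3)
separable = conv2d_macs(H, W, C, C, 3, groups=C) + conv2d_macs(H, W, C, C, 1)
print(standard, separable)  # 943718400 134348800
```

Summing such per-layer counts over every convolution in the reconstruction modules gives a per-inference total for one 320 × 320 image.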

We evaluated MSE, Normalized MSE (NMSE), PSNR, and SSIM [59] between the reconstructed results and the target fully-sampled MR images on magnitude images, with settings similar to the baseline works [1, 43, 45]. In evaluation, half of the cases were sub-sampled with 4-fold Cartesian sub-sampling masks and the other half with 8-fold ones. All quantitative evaluation results were calculated on images reconstructed from the same corresponding sub-sampling patterns.
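The first three metrics can be sketched in NumPy on magnitude images. Assumptions: NMSE is the squared error normalized by the energy of the ground truth, and PSNR uses the maximum of the ground-truth image as the data range; conventions differ between works, so these helpers are illustrative rather than the authors' evaluation code.

```python
import numpy as np


def mse(gt, pred):
    return float(np.mean((gt - pred) ** 2))


def nmse(gt, pred):
    """Squared error normalized by the energy of the ground truth."""
    return float(np.linalg.norm(gt - pred) ** 2 / np.linalg.norm(gt) ** 2)


def psnr(gt, pred, data_range=None):
    """PSNR in dB; data_range defaults to gt.max() (an assumption, conventions vary)."""
    if data_range is None:
        data_range = gt.max()
    return float(10 * np.log10(data_range ** 2 / mse(gt, pred)))


# Metrics are computed on magnitude images.
gt = np.abs(np.full((4, 4), 2.0 + 0j))
pred = np.abs(np.full((4, 4), 1.0 + 0j))
print(mse(gt, pred), nmse(gt, pred), round(psnr(gt, pred), 2))  # 1.0 0.25 6.02
```

SSIM is omitted here; a library implementation such as `skimage.metrics.structural_similarity` could be used, with `data_range` set consistently with PSNR.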

4.3. Knee MR Reconstruction Comparison

The quantitative evaluation results of knee MR reconstruction are shown in Tab.1, demonstrating that the searched NasN_Knee network outperforms previous state-of-the-art frameworks. Among all the deep models, U-net differs the most, because its feature maps shrink after each down-sampling while the channels double. As a result, U-net has many more learnable parameters than the others but requires fewer operations; despite having more parameters than our proposed network, it fails to provide better results. DCCNN, MoDL, RDN, and our NasN_Knee all adopt the k-space fusion strategy. Compared with DCCNN, MoDL uses weight-shared reconstruction modules to reduce parameters, which leads to worse reconstruction results. Because the dilated blocks in RDN share the same weights, its learnable parameters do not increase with more blocks. Although RDN and MoDL contain fewer learnable parameters, "there is no free lunch": recursive learning does not reduce FLOPs, so the inference speed is still limited. Our searched NasN_Knee architecture uses fewer FLOPs than RDN with 8 recursive dilated blocks while reconstructing better results.

The qualitative knee MR reconstruction results are shown in Fig.9 and Fig.10. Two slices reconstructed with different sub-sampling ratios are presented, where NasN_Knee reduces aliasing artifacts more effectively than the other methods. As the sub-sampling ratio increases, our model performs much better than the other methods, with less noise and more accurate structural details.

4.4. Analysis of Hyper-Parameters

Although the architecture of cells is searched automatically, there still exist some hyper-parameters set manually in the whole framework. In this part, we conduct experiments to explore how these hyper-parameters may affect the reconstruction performance and the searched architecture.

4.4.1. Number of Cascading Modules and Cells

Given the searched cell structure (shown in Fig.8), we can cascade more modules (shown in Fig.2) or more cells in each module (shown in Fig.5) to build the whole reconstruction framework.

The reconstruction performance and FLOPs as functions of the number of cascading modules are shown in Fig.11, and as functions of the number of cascading cells in each module in Fig.12.

Figure 9: The qualitative knee MR reconstruction results of different methods with 4-fold sub-sampling. The red boxes highlight regions where our reconstruction results have less noise.

Figure 10: The qualitative knee MR reconstruction results of different methods with 8-fold sub-sampling. The red boxes highlight regions where our reconstruction results have less noise.

We find that more cascades lead to better reconstruction results but with a heavier computation load. Compared with the increase in FLOPs, the performance improvement may not be efficient enough.

4.4.2. Number of Internal Nodes in Each Cell

To explore the impact of the number of internal nodes in each cell, we searched on the knee MR dataset for architectures whose cells contain different numbers of internal nodes. The searched results with 2 and 4 internal nodes are drawn in Fig.13. The reconstruction performance of these architectures on the knee dataset is listed in Tab.2.

Table 2 Quantitative evaluation results of searched architectures with different numbers of internal nodes. The results are calculated on the testing set including 40 scans.

Figure 11: The reconstruction performance and FLOPs as functions of cascading modules.

Figure 12: The reconstruction performance and FLOPs as functions of cascading cells in each module.

The results show that increasing the number of internal nodes brings a mild but consistent benefit to the final performance. With 2 internal nodes and 6 times fewer FLOPs, the searched architecture still outperforms RDN [45]. The found structures shown in Fig.8 and Fig.13 demonstrate that the algorithm does learn some principles for forming the reconstruction architecture automatically. These principles may offer insights to other researchers designing networks for other MR image tasks, e.g. semantic segmentation [8] or super-resolution [58]:

• Deeper feature maps require larger perception fields. In our searched results, layers with dilation rate 3 are always placed at the bottom of the cell, after layers with dilation rate 2.

• It is important to fuse feature maps from different levels properly. In our searched cell structure, feature

4.4.3. The Operation Search Space

To assess the role of the operation search space, experiments were performed to explore the relationship between the operation search space and network performance.

Denoting the original operation search space by A, we define a new space B as follows:

• Sep_Conv_3x3

• Dil_2_Conv_3x3

• Conv_9x1_1x9 : two cascading depthwise separable convolutional layers with kernel sizes of 9×1 and 1×9.

• Skip Connect

• None Connect

The design of Conv_9x1_1x9, shown in Fig.14, is inspired by [36], where large convolutional kernels lead to better performance in semantic segmentation. Two cascading 9×1 and 1×9 convolutional layers enable dense connections within a large region when producing the feature map.
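The benefit of the factorized kernel can be quantified per channel of the depthwise layers. The sketch below compares a 9×1 followed by a 1×9 kernel with a single 9×9 kernel, assuming stride-1 convolutions; the function names are hypothetical.

```python
def depthwise_weights_per_channel(kernel_shapes):
    """Depthwise weights per channel for a sequence of (height, width) kernels."""
    return sum(kh * kw for kh, kw in kernel_shapes)


def stacked_receptive_field(kernel_shapes):
    """Receptive field (height, width) of stride-1 convolutions applied in sequence."""
    height = 1 + sum(kh - 1 for kh, _ in kernel_shapes)
    width = 1 + sum(kw - 1 for _, kw in kernel_shapes)
    return height, width


factorized = [(9, 1), (1, 9)]
full = [(9, 9)]
print(depthwise_weights_per_channel(factorized), stacked_receptive_field(factorized))  # 18 (9, 9)
print(depthwise_weights_per_channel(full), stacked_receptive_field(full))              # 81 (9, 9)
```

The factorized pair covers the same 9 × 9 region with 18 instead of 81 depthwise weights per channel, at the cost of a constrained (rank-1 per channel) kernel.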

The search result is drawn in Fig.15, and the reconstruction results of these architectures are listed in Tab.3. The results show that competitive reconstruction performance is achieved with fewer computation resources. With a different search space, the principle that layers with larger perception fields are placed at the bottom still holds. Moreover, this experiment shows that the operation search space allows us to explore more possibilities if novel and efficient layer types are proposed in the future.

4.5. Generalizability of the Searched Architecture

We evaluated the generalizability of the searched architecture on the mini-fastMRI brain dataset from two aspects. First, the architecture searched on the original knee dataset, i.e. NasN_Knee, was directly generalized to the brain dataset via re-training. Second, we searched for another architecture specific to brain MR reconstruction, named NasN_Brain, with settings similar to NasN_Knee but with an extended search space including all six operation types mentioned in this manuscript. The searched architecture is shown in Fig.16.

The quantitative brain MR reconstruction results are shown in Tab.4. NasN_Knee re-trained on the brain dataset still exceeds previous baseline models, demonstrating that the searched architectures can be generalized to different tasks via re-training. Moreover, NasN_Brain outperforms NasN_Knee with fewer FLOPs and parameters, indicating that the architecture has a non-negligible impact on MR reconstruction. The performance degradation of RDN with more blocks and of MoDL implies that reducing learnable parameters by sharing weights does not always work across different tasks. The qualitative brain MR reconstruction results of all DL-based methods with 8-fold sub-sampling are shown in Fig.17.

Figure 13: The searched structures of the cell with 2 and 4 internal nodes. Convolutional layers with larger perception fields are always placed at the bottom, and feature maps from different levels are fused properly. From the selected connections and layer types, some insights into network design can be drawn.

Table 3 Quantitative evaluation results of searched architectures with different operation search spaces. The results are calculated on the testing set including 40 cases.

Figure 14: The structure of Conv_9x1_1x9. Two cascading convolutional layers with kernel sizes of 9×1 and 1×9 are adopted to achieve a 9×9 perception field.

We also re-trained NasN_Brain on the knee dataset. As shown in Tab.5, it produces slightly worse results than NasN_Knee but still outperforms the manually designed baseline deep models. We can conclude that the searched architecture is task-specific for MR reconstruction of different organs.

5. Discussion and Conclusion

MRI acceleration is in high demand in clinical practice and has been an active research area for years. The introduction of compressed sensing made a significant breakthrough in reducing MRI scan time, and deep learning now brings new opportunities to solve this problem better. Although many works have been proposed, a gap still exists between research and clinical practice due to complex network architecture design and heavy computation cost. In this work, we evaluated the applicability of NAS for remarkably improving the performance of DL based CS-MRI. The key insight of our work is that we can search for a specific and novel network architecture for CS-MRI in a differentiable way, based on continuous relaxation of the cell. Experiments demonstrate that our optimized network can reconstruct MR images better than previous handcrafted networks even with 6 times fewer computation resources. These results imply that automatically searched neural architectures can better balance performance and efficiency, and thus may be more friendly to clinical translation in the future.

The analysis of how hyper-parameters affect the network architectures found by NAS shows that the searched structures follow some basic principles: deeper feature maps within a single block require larger perception fields and should be properly refined with skip connections from shallower layers. These principles may offer insights for networks used in other medical image tasks. In addition, configuring the layer types in the operation search space makes this an open workflow to which other advanced structures can be added in the future. Extensive experiments on the generalizability of the searched architectures show that the searched network can be directly generalized to MR reconstruction of different organs via re-training, and that re-searching the architecture for the exact application makes the network more task-specific.

The following problems will be explored in the near future. First, due to our limited computation resources, all experiments were performed on single-coil MR reconstruction. Our method can nevertheless be generalized to multi-coil reconstruction by using 3D convolutional layers instead of 2D ones in the operation search space, together with the conjugate gradient solution instead of the closed-form one, as in MoDL [1] discussed in Sec.3.1. Second, a simple loss function (L1 loss) and a 1D Cartesian sub-sampling pattern were adopted in this manuscript for demonstration purposes; other loss functions, e.g. a GAN [33] loss, and other sub-sampling patterns will be explored with our workflow to further improve the quality of the final reconstruction.

To conclude, we present a novel reconstruction network for the CS-MRI task obtained by automatically searching its inner structure. Experiments show that the searched network reconstructs MR images better and more efficiently than previous works. The searched architectures can be directly generalized to MR reconstruction tasks for different organs through re-training, and become more task-specific when the architecture is re-searched. Given its good performance and the general applicability of neural architecture search, we expect the proposed workflow to become a promising research direction for MRI acceleration, with great potential impact on other medical image applications.

Table 4 Quantitative brain MR reconstruction results of different DL based methods. The results are calculated on the testing set including 50 scans.

Table 5 Quantitative reconstruction performance of the searched architectures on different datasets.

Figure 15: The searched cell structure when the operation search space is redefined.

Figure 16: The searched cell structure on the brain MR dataset, named NasN_Brain.

Figure 17: The qualitative brain MR reconstruction results of different DL based methods with 8-fold sub-sampling. The red boxes highlight regions where our reconstruction results have less noise.
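The "continuous relaxation of the cell" that makes the search differentiable can be illustrated with a minimal sketch in the spirit of DARTS [29], assuming the relaxation used here follows that formulation. Each edge outputs a softmax-weighted mixture of all candidate operations, the architecture weights `alpha` are learned (in DARTS, by gradient descent on a validation loss, which is omitted here), and after the search each edge keeps its strongest non-zero operation. The scalar toy operations and their names are hypothetical stand-ins for the convolutional layers of the actual search space.

```python
import math

# Sketch of a DARTS-style relaxed edge. The operations below are hypothetical
# scalar toys standing in for real layers (e.g. convolutions, skip connections).
CANDIDATE_OPS = {
    "skip_connect": lambda x: x,
    "zero": lambda x: 0.0,
    "toy_conv_a": lambda x: 0.5 * x + 1.0,
    "toy_conv_b": lambda x: 2.0 * x,
}

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alpha):
    """Relaxed edge: softmax(alpha)-weighted sum of every candidate operation."""
    weights = softmax([alpha[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def discretize(alpha, exclude=("zero",)):
    """After the search, keep the single strongest non-zero operation on this edge."""
    return max((n for n in alpha if n not in exclude), key=lambda n: alpha[n])

# Hypothetical architecture weights for one edge (normally learned).
alpha = {"skip_connect": 0.1, "zero": 0.0, "toy_conv_a": 0.3, "toy_conv_b": 1.2}
print(round(mixed_op(1.0, alpha), 4))
print(discretize(alpha))  # toy_conv_b
```

Because the mixture is a smooth function of `alpha`, gradients can flow into the architecture weights during search; the discrete cell is only recovered at the end by the argmax step.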

CRediT authorship contribution statement

Jiangpeng Yan: Conceptualization of this study, Methodology, Software, Writing - Original draft preparation. Shou Chen: Data curation, Validation, Writing - Review and editing. Yongbing Zhang: Methodology, Writing - Review and editing. Xiu Li: Project administration, Supervision, Writing - Review and editing, Funding acquisition.

Funding

This research was partly supported by the National Natural Science Foundation of China (Grant No. 41876098 and Grant No. 81901734) and the Overseas Cooperation Research Fund of Tsinghua Shenzhen International Graduate School (Grant No. HW201808).

Acknowledgment

The authors would like to thank Dr. Zechen Zhou, Philips Research North America, Cambridge, Massachusetts, and Dr. Rui Li, Center for Biomedical Imaging Research, Department of Biomedical Engineering, School of Medicine, Tsinghua University for their valuable discussions that significantly improved the quality of the manuscript. They also thank the authors of fastMRI [54] for making their code and dataset accessible online. Special thanks are also given to the reviewers for their constructive comments.

References

[1] Aggarwal, H.K., Mani, M.P., Jacob, M., 2018. Modl: Model-based deep learning architecture for inverse problems. IEEE transactions on medical imaging 38, 394–405.

[2] Akçakaya, M., Moeller, S., Weingärtner, S., Uğurbil, K., 2019. Scan-specific robust artificial-neural-networks for k-space interpolation (raki) reconstruction: Database-free deep learning for fast imaging. Magnetic resonance in medicine 81, 439–453.

[3] Angeline, P.J., Saunders, G.M., Pollack, J.B., 1994. An evolutionary algorithm that constructs recurrent neural networks. IEEE transactions on Neural Networks 5, 54–65.

[4] Baker, B., Gupta, O., Raskar, R., Naik, N., 2017. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823.

[5] Cai, H., Chen, T., Zhang, W., Yu, Y., Wang, J., 2018. Efficient architecture search by network transformation, in: Thirty-Second AAAI Conference on Artificial Intelligence.

[6] Caruana, R., Lawrence, S., Giles, C.L., 2000. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping, in: NIPS.

[7] Chollet, F., 2017. Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258.

[8] Danelakis, A., Theoharis, T., Verganelakis, D.A., 2018. Survey of automated multiple sclerosis lesion segmentation techniques on magnetic resonance imaging. Computerized Medical Imaging and Graphics 70, 83–100.

[9] Dong, C., Loy, C.C., He, K., Tang, X., 2014. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 295–307.

[10] Donoho, D.L., 2006. Compressed sensing. IEEE Transactions on Information Theory 52, 1289–1306.

[11] Fiszelew, A., Britos, P., Ochoa, A., Merlino, H., Fernández, E., García-Martínez, R., 2007. Finding optimal neural network architecture using genetic algorithms. Advances in computer science and engineering research in computing science 27, 15–24.

[12] Girshick, R., Donahue, J., Darrell, T., Malik, J., 2014. Rich feature hierarchies for accurate object detection and semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition.

[13] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets, in: International Conference on Neural Information Processing Systems.

[14] Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., Knoll, F., 2018. Learning a variational network for reconstruction of accelerated mri data. Magnetic resonance in medicine 79, 3055–3071.

[15] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition.

[16] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H., 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

[17] Huang, Q., Yang, D., Wu, P., Qu, H., Yi, J., Metaxas, D.N., 2018. Mri reconstruction via cascaded channel-wise attention network. IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), 1622–1626.

[18] Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv abs/1502.03167.

[19] Kim, J., Lee, J.K., Lee, K.M., 2015. Deeply-recursive convolutional network for image super-resolution. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1637–1645.

[20] Kim, T.H., Garg, P., Haldar, J.P., 2019. Loraki: Autocalibrated recurrent neural networks for autoregressive mri reconstruction in k-space. arXiv preprint arXiv:1904.09390.

[21] Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[22] Krizhevsky, A., Hinton, G., 2010. Convolutional deep belief networks on cifar-10. Unpublished manuscript 40, 1–9.

[23] LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. nature 521, 436–444.

[24] LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.E., Jackel, L.D., 1989. Handwritten digit recognition with a back-propagation network, in: NIPS.

[25] Lee, D., Yoo, J., Tak, S., Ye, J.C., 2018. Deep residual learning for accelerated mri using magnitude and phase networks. IEEE Transactions on Biomedical Engineering 65, 1985–1995.

[26] Liang, Z.P., Lauterbur, P.C., 2000. Principles of magnetic resonance imaging: a signal processing perspective. SPIE Optical Engineering Press.

[27] Liu, C., Chen, L.C., Schroff, F., Adam, H., Hua, W., Yuille, A.L., Fei-Fei, L., 2019. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 82–92.

[28] Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K., 2018a. Progressive neural architecture search, in: Proceedings of the European Conference on Computer Vision (ECCV), pp. 19–34.

[29] Liu, H., Simonyan, K., Yang, Y., 2018b. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055.

[30] Liu, Q., Wang, S., Liang, D., 2017. Sparse and dense hybrid representation via subspace modeling for dynamic mri. Computerized Medical Imaging and Graphics 56, 24–37.

[31] Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation, in: CVPR.

[32] Lustig, M., Donoho, D.L., Santos, J.M., Pauly, J.M., 2008. Compressed sensing mri. IEEE signal processing magazine 25, 72.

[33] Mardani, M., Gong, E., Cheng, J.Y., Vasanawala, S.S., Zaharchuk, G., Xing, L., Pauly, J.M., 2018. Deep generative adversarial neural networks for compressive sensing mri. IEEE transactions on medical imaging 38, 167–179.

[34] Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S., 2010. Recurrent neural network based language model, in: Eleventh annual conference of the international speech communication association.

[35] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al., 2019. Pytorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems, pp. 8024–8035.

[36] Peng, C., Zhang, X., Yu, G., Luo, G., Sun, J., 2017. Large kernel matters–improve semantic segmentation by global convolutional network, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4353–4361.

[37] Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J., 2018. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268.

[38] Quan, T.M., Nguyen-Duc, T., Jeong, W.K., 2018. Compressed sensing mri reconstruction using a generative adversarial network with a cyclic loss. IEEE Transactions on Medical Imaging 37, 1488–1497.

[39] Real, E., Aggarwal, A., Huang, Y., Le, Q.V., 2019. Regularized evolution for image classifier architecture search, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4780–4789.

[40] Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y.L., Tan, J., Le, Q.V., Kurakin, A., 2017. Large-scale evolution of image classifiers, in: Proceedings of the 34th International Conference on Machine Learning-Volume 70, JMLR.org, pp. 2902–2911.

[41] Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer Assisted Intervention.

[42] Rudin, L.I., Osher, S., Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60, 259–268.

[43] Schlemper, J., Caballero, J., Hajnal, J.V., Price, A.N., Rueckert, D., 2017. A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE Transactions on Medical Imaging 37, 491–503.

[44] Sun, J., Li, H., Xu, Z., et al., 2016. Deep admm-net for compressive sensing mri, in: Advances in neural information processing systems, pp. 10–18.

[45] Sun, L., Fan, Z., Huang, Y., Ding, X., Paisley, J.W., 2018. Compressed sensing mri using a recursive dilated network, in: AAAI.

[46] Sutton, R.S., Barto, A.G., et al., 1998. Introduction to reinforcement learning. volume 2. MIT press Cambridge.

[47] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A., 2017. Inception-v4, inception-resnet and the impact of residual connections on learning, in: Thirty-First AAAI Conference on Artificial Intelligence.

[48] Tsao, J., Kozerke, S., 2012. Mri temporal acceleration techniques. Journal of Magnetic Resonance Imaging 36, 543–560.

[49] Uecker, M., Ong, F., Tamir, J.I., Bahri, D., Virtue, P., Cheng, J.Y., Zhang, T., Lustig, M., 2015. Berkeley advanced reconstruction toolbox, in: Proc. Intl. Soc. Mag. Reson. Med.

[50] Wang, S., Su, Z., Ying, L., Xi, P., Zhu, S., Feng, L., Feng, D., Dong, L., 2016. Accelerating magnetic resonance imaging via deep learning, in: IEEE International Symposium on Biomedical Imaging.

[51] Xie, J., Xu, L., Chen, E., 2012. Image denoising and inpainting with deep neural networks, in: International Conference on Neural Information Processing Systems.

[52] Yang, G., Yu, S., Dong, H., Slabaugh, G., Dragotti, P.L., Ye, X., Liu, F., Arridge, S., Keegan, J., Guo, Y., 2017. Dagan: Deep de-aliasing generative adversarial networks for fast compressed sensing mri reconstruction. IEEE Transactions on Medical Imaging 37, 1310–1321.

[53] Yu, F., Koltun, V., 2015. Multi-scale context aggregation by dilated convolutions. ArXiv abs/1511.07122.

[54] Zbontar, J., Knoll, F., Sriram, A., Muckley, M.J., Bruno, M., Defazio, A., Parente, M., Geras, K., Katsnelson, J., Chandarana, H., Zhang, Z., Drozdzal, M., Romero, A., Rabbat, M.G., Vincent, P., Pinkerton, J., Wang, D., Yakubova, N., Owens, E., Zitnick, C.L., Recht, M.P., Sodickson, D.K., Lui, Y.W., 2018. fastmri: An open dataset and benchmarks for accelerated mri. ArXiv abs/1811.08839.

[55] Zeng, K., Yang, Y., Xiao, G., Chen, Z., 2019. A very deep densely connected network for compressed sensing mri. IEEE Access 7, 85430–85439.

[56] Zhang, X., Zhou, X., Lin, M., Sun, J., 2018. Shufflenet: An extremely efficient convolutional neural network for mobile devices, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856.

[57] Zhao, H., Gallo, O., Frosio, I., Kautz, J., 2017. Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging 3, 47–57.

[58] Zhao, M., Liu, X., Liu, H., Wong, K.K., 2020. Super-resolution of cardiac magnetic resonance images using laplacian pyramid based on generative adversarial networks. Computerized Medical Imaging and Graphics 80, 101698.

[59] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., 2004. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13, 600–612.

[60] Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S., 2018. Image reconstruction by domain-transform manifold learning. Nature 555, 487.

[61] Zoph, B., Le, Q.V., 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.
