b

DiscoverSearch
About
My stuff
Time-Dependent Deep Image Prior for Dynamic MRI
2019·arXiv
Abstract
Abstract

We propose a novel unsupervised deep-learning-based algorithm to solve the inverse problem found in dynamic magnetic resonance imaging (MRI). Our method needs neither prior training nor additional data; in particular, it does not require either electrocardiogram or spokes-reordering in the context of cardiac images. It generalizes to sequences of images the recently introduced deep-image-prior approach. The essence of the proposed algorithm is to proceed in two steps to fit k-space synthetic measurements to sparsely acquired dynamic MRI data. In the first step, we deploy a convolutional neural network (CNN) driven by a sequence of low-dimensional latent variables to generate a dynamic series of MRI images. In the second step, we submit the generated images to a nonuniform fast Fourier transform that represents the forward model of the MRI system. By manipulating the weights of the CNN, we fit our synthetic measurements to the acquired MRI data. The corresponding images from the CNN then provide the output of our system; their evolution through time is driven by controlling the sequence of latent variables whose interpolation gives access to the subframe—or even continuous—temporal control of reconstructed dynamic images. We perform experiments on simulated and real cardiac images of a fetus acquired through 5-spoke-based golden-angle measurements. Our results show improvement over the current state-of-the-art.

Index Terms—dynamic MRI, accelerated MRI, unsupervised learning, deep image prior, Golden-angle trajectory.

There are currently three main approaches to accelerate the magnetic resonance imaging (MRI) of a static image. All three methods rely on a partial sampling of the k-space to reduce the acquisition time. The resulting partial loss of data must then be compensated to maintain the quality of the image. Once compensation is achieved, the accelerated methods capture accurate motions of fast moving organs such as the heart.

1) In parallel MRI (pMRI), the simultaneous use of several hardware coils results in spatial redundancy that enables algorithms to reconstruct clean images [1], [2].

2) In compressed sensing (CS) MRI, the data are assumed to be sparse in certain transform domains [3], [4]. This ultimately leads to regularized computational methods that compensate for the partial sampling. Their success suggests that, in particular, a Fourier-based forward model matches well the assumption of sparsity.

K.H. Jin, H. Gupta, and M. Unser are with Biomedical Imaging Group, ´Ecole polytechnique f´ed´erale de Lausanne, Switzerland. J. Yerly and M. Stuber are with Department of Radiology, University Hospital (CHUV) and University of Lausanne (UNIL), and Center for Biomedical Imaging (CIBM), Lausanne, Switzerland. This work has been funded by European Research Council (ERC) Grant 692726 (H2020-ERC Project GlobalBioIm). (K.H. Jin and H. Gupta are co-first authors).

3) In the context of trainable deep artificial neural networks, learning approaches have already achieved fast and accurate reconstructions of partially sampled MRI data [5], [6]. Similarly to CS MRI, dynamic accelerated reconstructions have also been proposed in the literature [7][9], possibly in combination with pMRI in the learning loop [10]. These approaches depend on training datasets [11][16].

In the context of dynamic MRI, the approach that consists in the acquisition of a sequence of frames is suboptimal. Instead, it is more efficient to take advantage of the time dependencies between frames to gain additional improvements in terms of temporal resolution [17], [18]. For instance, [19], [20] design non-overlapping sampling masks at each frame to restore a dynamic volume—a method that demands far fewer k-space samples than would a sequence of static MRI. Indeed, the CS theory explains the possibility of perfect reconstruction despite a sub-Nyquist sampling [21]. The capture of temporal redundancies has also been handled through low-dimensional manifolds [22], [23]. In the specific case of cardiac applications, the overall motion of the heart is expected to be approximately cyclic. This periodicity can be exploited to bring additional gains in terms of temporal resolution, but the length and phase of the cycles must be determined first. This is usually achieved either through electrocardiograms (ECG) or self-gating [24], [25]. Under the restrictive assumption of ideal periodicity, these methods allow one to prefix the cardiac phases and to reorder temporally the acquired frames, effectively producing a stroboscope-inspired analysis of the motion. Motion irregularities are not captured by those methods.

A. Contribution

In this paper, we propose an unsupervised learning framework in which a generative network reconstructs fast dynamic MRI from golden-angle radial lines in k-space, also called spokes. To reconstruct a single image, we feed one realization of low-dimensional latent variables to a generative artificial neural network. A nonuniform fast Fourier transform1 (NuFFT) is then applied to the output of the generative network and simulates the MRI measurement process [26]. Inspired by deep image priors [27], we fit the simulated measurements to the real measurements. The fit is controlled by adjusting the weights of the generative network until the Euclidean loss between the simulated and real measurements is minimized; this fitting process is referred to as the learning stage.

In the context of dynamic MRI, we extend the fitting process in such a way that the weights of the network are learned by taking simultaneously into consideration the joint collection of all acquisitions, which yields time-independent weights. Time dependence is recovered by controlling the latent variables. Given some temporal interval, we synthesize two independent random realizations of low-dimensional latent variables and associate one to the initial and one to the final bound of the interval. Timestamped contributions to learning are obtained by taking as ground-truth the real measurements acquired over a quasi-instantaneous observation period (say, five spokes), while we let the activation of the generative network be the intermediate realization of the latent variables obtained by linear interpolation of the two latent endpoints. This approach allows us to impose and exploit temporal dependencies in the latent space.

In short, the action of our generative model is to map the manifold of latent variables onto a manifold of dynamic images. Importantly, our approach is purely unsupervised; moreover, priors are imposed only indirectly, arising from the mere structure of a convolutional network. We demonstrate the performance of our generative network by comparing its reconstructions to those obtained from CS algorithms [19], [20].

B. Related Works

Deep image priors have been introduced in [27]. They capitalize on the structure of convolutional artificial neural networks to adjust priors to the data, thus avoiding the limitations and pitfalls of hand-crafted priors. They have been deployed in [28] to build an unsupervised learning scheme for accelerated MRI; but, contrarily to ours, the task addressed therein is static. Other researchers have used deep image priors to reconstruct positron emission tomography images, albeit again in a non-dynamic fashion [29].

Let R be the Radon transform of the complex-valued continuously defined image  x : R2 → C, so that

image

where r and  ϑare the spatial and angular Radon arguments, respectively, and where  uϑis a unit vector in the  ϑdirection. Moreover, let F denote the one-dimensional continuous Fourier transform that follows the convention

image

at spatial pulsation  ω, otherwise known as a coordinate in k-space. Then, our conceptual model of non-Cartesian MRI is the concatenated linear transform

image

which maps a two-dimensional static image onto its measurements at continuously defined direction  ϑand pulsation ω. Mathematically,  Hϑ{x}(ω)is invertible because so are F and R. Consequently, provided we know  Hϑ{x}(ω)for every direction  ϑand every pulsation  ω, we can in principle recover the value of  x(ξ)at every spatial argument  ξ.

A. Static Discretization

Unfortunately,  Hϑ{x}(ω)can be known in practice only at finitely many discrete directions and pulsations. This discretization, which amounts to modeling MRI by a NuFFT, makes the discrete transform an unfaithful surrogate of our conceptual model, in particular because of aliasing concerns, and also because the discrete version is no more invertible.

Formally, let  x ∈ CNbe a vectorized version of the samples of x seen as an image of finite size  (N1 × N2), with N = N1 N2. Likewise, let  y ∈ CMbe a vectorized version of the samples of the sinogram  Hϑ{x}(ω), with measurements of finite size  (Mϑ × Mω)taken over  Mϑorientations and  Mωelements of the k-space, with  M = Mϑ Mω. Then, by linearity of the transformation we write that

image

where G is an M-rows by N-columns matrix that combines discrete Fourier and discrete Radon transforms.

B. Spoke Sharing

In dynamic MRI, it is acknowledged that the image x changes continuously through time. We assume however that the measurements of a spoke (a radial line in k-space, as suggested in Figure 1) are instantaneous and indexed at discrete times  t0 ∈ Z ∆t, taken regularly at temporal interval ∆t. The spoke orientations follow the golden-angle strategy

image

where  ϑtgives the orientation of a spoke at continuous time  t ∈ R, with  ω0its angular velocity. The golden-angle specificity is the irrationality condition  (ω0 ∆t/π) /∈ Q, which is approximated by setting  (ω0 ∆t) ≈ 111.25◦[19]. We finally denote an image frame at time t as  xtand its spatially discretized version of length N as  xt. Its associated sinogram is  st. As it turns out, however, it is only natural to set  Mϑ = 1for the discretization of  stbecause of our assumption of ideal temporal sampling. Then,  stis a direct representation of a spoke and has length  M = Mω. Its dependence on  xtis encoded in the time-dependent  Mω-rows by N-columns system matrix  Gt, with  st = Gt xt.

In accelerated dynamic MRI, one acquires  st0for  t0 ∈ Z ∆tand wants to reconstruct  xt0or, possibly,  xtfor  t ∈ R. Clearly, however, a single orientation does not provide sufficient data for the the recovery of the two-dimensional  xt0. To overcome this issue, we assume next that the changes are slow over some small n, so that  xt ≈ xt0for all t in the half-open interval  Tt0 = [t0 −n ∆t/2, t0 +n ∆t/2). In practice, the odd n ∈ 2 N + 1corresponds to the number of radial lines used for reconstruction and is related to the temporal resolution of dynamic imaging. We then collect n neighboring spokes2 and

image

Fig. 1. Nonuniform fast Fourier transform with golden-angle scheme and spoke sharing.

concatenate them in the vector  yt0 = (sm ∆t)m∈Z∩(Tt0/∆t)of length  (n Mω). Through this mechanism, there are spokes  st0that are shared between, say,  yt0and  yt0+∆t.

The dependence of  yt0on  xt0is encoded now in the time-dependent  (n Mω)-rows by N-columns matrix  Ht0 =(Gm ∆t)m∈Z∩(Tt0/∆t), so that (4) becomes time-dependent by writing that

image

Because of the irrationality condition of the golden-angle approach, no direction will ever be measured twice and  Ht0acquires effective time dependence.

C. Regularization

Even with n > 1, it is observed that  (n Mω) ≪ N, which makes severely ill-posed the recovery of  xt0given  yt0. To truly resolve this issue, practitioners often choose to regularize the problem over some extended temporal range. From a notational perspective, K vectors  yt0are concatenated over a large duration  (K ∆t)to build  Y = (yk ∆t)k∈[0...K−1]. Likewise, we write that  X = (xk ∆t)k∈[0...K−1]and H = [Hk ∆t]k∈[0...K−1]. The length of Y and X are  (K n Mω)and (K N), respectively. The size of H ensues.

In the context of CS dynamic imaging, the traditional regularization of the forward model (6) is established as a search for the solution

image

where D is a sparsifying transform along the temporal domain. Typically, this transform is a finite-difference operator, used as surrogate of the conceptual first-order derivative D. The corresponding regularization term encourages temporal dependency and counterbalances the ill-posedness of (6).

By contrast, there exists no explicit regularizer in the context of dynamic MRI reconstruction by traditional neural networks [7], [30]. In return, image priors are data-driven, imposed by the supervised learning of ground-truth pairs (Y, X). Letting  fφ : CM → CNrepresent the function that the network implements, where  φgives network parameters such as weights and biases, learning will return the solution

image

There,  HHis the Hermitian transpose of H. With some abuse of notation, the learning process is represented by the expectation operator E. The set {(Y, X)} provides the learning data. After learning has converged to some  φ∗, reconstructed images are obtained as  fφ∗(HH Y).

D. Deep Image Prior with Interpolated Latent Variables

Supervised learning cannot be performed in the absence of ground truth; unsupervised learning methods must be used instead. To that effect, generative networks associated with deep image priors have been proposed in [31], while a cost function that is appropriate to unsupervised learning has been developed in [27]. In this paper, we propose an extension whereby we address the needs of dynamic MRI by performing temporal interpolation of the latent variables. The sketch of our approach is provided in Figure 2.

1) Inpainting and Denoising: As introduction, we address first the static applications of inpainting and denoising. There, the purpose of deep-image-prior algorithms is to reconstruct a clean signal  x∗ = fθ∗(z)given the perturbed measurement x. The generative network with optimal parameter

image

minimizes a data-fidelity term characterized by the forward model A, which is assumed to be known. The Z-dimensional latent variable  z ∈ RZis maintained at a fixed value during the whole optimization process. Often, the minimizer  θ∗is calculated using some stochastic gradient-descent method with random initial parameter [32]. In summary, deep-image-prior methods achieve strong priors from the structure of a generator network and capture advanced image statistics in a purely unsupervised fashion.

2) Accelerated Dynamic MRI: Our main contribution in this paper is to use interpolated latent variables as inputs

image

Fig. 2. Flowchart of our framework. Conv.: convolutional layers, BN: batch normalization. The details are described in Table I.

image

Fig. 3. Interpolated latent variables.

of a generative neural network. We start by obtaining two realizations of random discrete images; these realizations are kept for the entire duration of the procedure. The first realization  z0takes the role of the latent variable associated to the beginning of the dynamic MRI sequence. The second realization is denoted by  z(K−1) ∆tand is associated to the end of the sequence. Intermediate realizations are built like shown in Figure 3 and are also used as latent variables.

Choosing the dimension Z of the latent variables to be small, we conjecture that the linearly interpolated latent variables span a low-dimensional, smooth manifold. In other words, we interpret our proposal as a way to impose data-driven temporal dependency in the latent space. A generative convolutional neural network will then transfer this low-dimensional manifold to some corresponding low-dimensional manifold in the MRI space, the central tenet of this work being that the frames from dynamic imaging do span a low-dimensional manifold, too.

For a single coil, our deep prior minimizes an Euclidean loss and results in the solution

image

where  gφ : CZ → CNrepresents the deep generative network

image

of parameter  φ. For C coils in pMRI, we establish the solution

image

where  Ccgives the sensitivity map of the cth coil,  ⊙is a pixel-wise multiplication operator in the spatial domain which relates true magnetization image to coil sensitivities, and  yc,tconcatenates n instantaneous acquisitions of spokes for the cth coil. Once an optimal  φ∗has been found in either (10) or (11), we can produce the final estimate  x∗tfor all values of t, including for  t /∈ Z ∆tif desired, as

image

E. Architectures, Datasets, and Training

1) Architectures: We design our generative network as shown in Table I. The generative network consists of convolutional layers, batch normalization layers, ReLU, and nearestneighbor interpolations. We apply zero-padding before convolution to let the size of the output mirror that of the input. At the last layer, ReLU is not used. The input of the network is a small-size random variable generated from the uniform distribution  U ∼ (0, 0.1), as explained in Section II-D2. The output has two channels because MRI images take complex values.

TABLE I ARCHITECTURE OF OUR GENERATIVE CONVOLUTIONAL NETWORK. CONV.: CONVOLUTION; BN: BATCH NORMALIZATION; NN INTERP.: NEAREST-NEIGHBOR INTERPOLATION.

image

2) Datasets: All experimental datasets are breath-hold. We use golden-angle radial sparse parallel (GRASP) MRI as a common baseline [19]. Spoke-sharing is not applied for GRASP. We assume two-fold upsampling of measurements for every dataset. Therefore, the size of the reconstructed fields of view is half that of the first dimension of the measurements.

Retrospective Simulation A cardiac cine data set was acquired using a 3T whole-body MRI scanner (Siemens; Tim Trio) equipped with a 32-element cardiac coil array. The acquisition sequence was bSSFP and prospective cardiac gating was used. The imaging parame- ters were as follows: FOV=300 × 300mm2, acquisition matrix size=128 × 128, TE/TR=1.37/2.7ms, receiver bandwidth=1184 Hz/pixel, and flip angle=40◦. The number of cardiac phases was 23 and the temporal resolution was 43.2 ms. We then used NuFFT to implement the forward model in a golden-angle context, resulting in fully sampled Cartesian trajectories. Then, sinograms were obtained as shown in Figure 1. The number of spokes per phase is 13. The dimension of sinograms is (K×Mω×C) = 23·13×256×32. In this simulation, the cardiac motion is discrete; thus, no spoke-sharing strategy is applied.

Golden-angle Reconstruction of Fetal Cardiac Motion Fetal cardiac MRI data were acquired on a 1.5 T clinical MR scanner (MAGNETOM Aera, Siemens AG, Healthcare Sector, Erlangen, Germany) with an 18-channel body array coil and a 32-channel spine coil for signal reception. Images were acquired with an untriggered continuous 2D bSSFP sequence that was modified to acquire radial readouts with a golden-angle trajectory [25]. The acquisition parameters were: FOV = 260 ×260 mm2, acquisition matrix size = 256 × 256 pixels, slice thickness = 4.0 mm,TE/TR = 1.99/4.1 ms, RF excitation angle = 70deg, radial readouts = 1600, acquisition time = 6.7 s and bandwidth

image

3) Reconstruction Experiments: We use an Intel i7-7820X (3.60GHz) CPU and an Nvidia Titan X (Pascal) GPU. Pytorch 1.0.0 on Python 3.6 is used to implement our generative model3. All experiments are performed in single-batch mode. The input size is  (8 × 8). The cost function used to train our generative network is (11). The learning rate is  10−3, with [32] as optimizer.

Retrospective Simulation The number of iterations is 10,000. The scheduling for the learning rate is 0.5 multiplier per 2,000 iterations. We set  U ∼ (0, 0.1).

Golden-angle Reconstruction of Fetal Cardiac Motion The number of iterations is 20,000. No scheduling for the learning rate is applied. We used 14 cycles of latent variables. We set  U ∼ (0, 10)to obtain the results shown in Figure 6. We use the regressed signal-to-noise ratio (RSNR) as a quantitative metric in our retrospective simulations. With x the oracle and  x∗the reconstructed image, RSNR is given by

image

where a higher RSNR corresponds to a better reconstruction. For real datasets, a CS approach with self-gated signals takes the role of a state-of-the-art baseline [24], [25].

A. Retrospective Simulation

In this experiment, the acquisition process is simulated, which allows us to build the ground truth from a fully sampled k-space. We use n = 13 spokes for the reconstruction and present the results in Figure 4. There, the bandpass method

image

Fig. 4. Retrospective reconstructions of cine dynamic MRI, with n = 13. Top rows (A), from left to right: ground truth; field of view of reconstructed frames from BP, GRASP, and ours. Bottom row (B), from left to right: ground-truth with a white line indicating the (y-t) location of cross sections; cross sections from BP, GRASP, and ours.

(BP) corresponds to a zero-filled DFT while GRASP is the baseline against which we compare the performance of our method. We see in Figure 4 (A) that GRASP leads to blurring artifacts, while the residual map discloses the occurrence of errors around the wall of the heart. By contrast, our proposed method gives better results. This is confirmed in Figure 4 (B), where the cardiac motions are captured better by our model than by GRASP. The systolic phase of our reconstruction is well described and very close to the ground-truth, whereas the systolic phase captured by GRASP is too flat.

B. Golden-Angle Reconstruction of Fetal Cardiac Motion

In this experiment, the acquisition process is real, so that no ground truth is available. We use n = 5 spokes for most reconstructions (n = 15 for GRASP) and present the results in Figure 5. The synthesis of the sequence of latent variables is explained in Section IV-A. The overlapped method (OV) corresponds to frames generated from all spokes, while the reordered method (RD) is able to reconstruct multiple cardiac phases by reordering k-t sparse SENSE by self-gated signals [24], [25]. Self-gating, driven by correlation coefficients between approximately reconstructed images, is necessary because it is impractical to capture the electrocardiogram signals of a fetal heart. This ultimately prevents the reordering of golden angles in accordance with their phase.

In the absence of ground truth, we shall take OV to be the reference image for navigation purposes. However, because OV considers all spokes simultaneously, it is a static image that is of high quality only in regions that are not moving. We see in Figure 5 (A) that BP and GRASP are noisier than RD and our method. Comparing now RD to our approach, we see that ours produces better-resolved features, particularly for the hyper-intense dot-like structures.

In Figure 5 (B), it becomes apparent that BP fails altogether to capture the fetal cardiac beats, while GRASP is less noisy but still mostly fails to reconstruct motions. RD fares better; unfortunately, its reordering process may lead it to superpose in the same frame spokes that may belong indeed to different cardiac cycles. By contrast, our method4 reconstructs each frame with data from just a few neighboring spokes, thus avoiding the mingling of different cycles. Consequently, we expect our reconstructed systolic phase to capture well the true motion of the heart. The cross section from our method is similar to that of RD but the motion is smoother in our case, which is the expected behavior of a beating heart.

We provide in Figure 6 (A) our complete reconstructed sequence of cardiac cycles, in the form of a (y-t) cross section. The quasi-periodicity of the cardiac motion is clearly visible along the temporal axis, while motion variations can be discerned from cycle to cycle. In Figure 6 (B), we also explore over the image and latent domains the structure of the manifold of the cardiac motion. The visualization proceeds through a (t-SNE) embedding [33]. We observe that the continuous trajectory for the image manifold is well aligned with the temporal index, while the input latent variables lie on a smooth manifold in the latent space. The latent variables that correspond to Figure 6 (B) have been cut in fourteen chunks; the reason for this is explained in Section IV-A. The importance of smoothness in the latent variables is described in Section IV-B.

image

Fig. 5. Dynamic reconstructions of a fetal heart for one beating cycle. Top rows (A), from left to right: field of view from OV, BP, GRASP, RD, and ours. Bottom row (B), from left to right: OV with a white line indicating the (y-t) location of cross sections; cross sections from BP, GRASP, RD, and ours.

A. Latent Encoding for Acyclic Data

Letting K in Section II-C be such that truly all data in Figure 6 (A) were taken jointly, and interpolating the latent variables between the only two endpoint realizations  z0and z(K−1) ∆t, we observed that the reconstruction  x∗tof the fetal cardiac motion took a constant, time-independent value. We surmise that this failure is due to the overly strong presence of non-periodic components in the data. To cope with them, we adapted our scheme slightly and proceeded by piecewise interpolation, the pieces being made of temporal chunks in the latent space. More precisely, we generated fourteen realizations  {z(τ)}τ∈[0...13]of the latent variables, equi-spaced in time; then, instead of building  ztas a linear combination of  z0 = z(0)and  z(K−1) ∆t = z(13), we built  ztas a linear combination of  z(τ)and  z(τ+1), with an appropriate  τthat depends on t. Note that, while the latent variables evolve now chunk-wise, the network is still time-independent and trained over all data jointly. The chunk boundaries are made visible in Figure 6 (B).

B. Smoothness in the Manifold of Latent Variables

To explore the importance of smoothness in the manifold of latent variables, we have trained separately two generative networks, each with a different configuration for the latent variables. On one hand, we have considered independent realizations of random latent variables at each frame, which resulted in an uncorrelated, non-smooth latent spatio-temporal manifold. On the other hand, we have followed the interpolation procedure of Figure 3. After examination of the outcome shown in Figure 7, we conclude that the non-smooth manifold fails to reconstruct dynamic images, while our proposed approach succeeds.

C. Size of Latent Variables

Until now, we have fixed the size of the latent variables as (8 × 8), which corresponds to Z = 64. In this section instead, we let Z vary and report the resulting RSNR in Table II, where we observe an acceptable quality of reconstruction for latent variables of size ranging between  (2 × 2)and  (16 × 16). For larger sizes, the reconstruction are corrupted by significant artifacts. We expected the smallest,  (1 × 1)size to reflect the fact that time is a one-dimensional variable. Yet, the learning process failed to converge to a desirable solution in this case, the corresponding sequence of reconstructed images taking the constant, time-independent value shown in the last row of Figure 8. In conclusion, this experiment suggests that the (8 × 8)size provides a good tradeoff between the dimension of the manifold spanned by the latent variables and the convergence issues inherent with any optimization procedure.

D. Variations on Latent Variables

In this section, we explore various scenarii beyond linear interpolation, which we summarize in Figure 8.

1) Temporal Interpolation In the first scenario, we consider  t /∈ Z ∆t, which corresponds to a fine interpolation of the intermediate latent variables. This gives us access to temporally interpolated images at the output of our generative network.

2) Temporal Extrapolation In the second scenario, we extrapolate the latent variables outside of the temporal range that was available for learning. This gives distorted images.

3) New Latent Variables In the third scenario, we make use of random realizations of latent variables that differ from those used while learning. This gives severely perturbed images.

image

Fig. 6. Top (A): Series of (y-t) cross sections of our reconstruction from the region of interest in Figure 5 (B). Bottom (B): t-SNE embedding from image frames (left) and latent variables (right). The temporal index is color-coded.

TABLE II REGRESSED SNR IN TERMS OF THE SIZE OF THE LATENT VARIABLES.

image

Fig. 7. Importance of smoothness in a latent manifold.

image

Fig. 8. Latent-variable scenarii. Here,  T = (k − 1)∆t.

4) Perturbations In the fourth scenario, we perturb the latent variables by adding uniform noise whose energy amounts to 10 %. This setting outputs clean images.

5) Scalar Latent Variables In the fifth and last scenario, we deploy scalar latent variables. This results in a time-independent, non-moving sequence of images.

E. Memory Savings

In a CS context, the gradient updates of the iterative optimization process would need one to allocate enough memory to hold a whole target reconstruction volume. For example, the reconstruction of 5,000 frames with spatial size  (256 × 256)would need one to handle data of size  (256 × 256 × 5000), which demands for over a gigabyte of memory.

By contrast, our approach is much less memory-hungry. It optimizes the generative network using batches, which requires the simultaneous handling of only those frames that correspond to the batch size. In short, the fact that our proposed approach handles few 2D images whereas CS handles a 2D+t extended sequence leads to substantial savings, particularly for golden-angle dynamic MRI with many frames. In our approach, we only save a generative model; for example, its memory demands for the spatial size  (256 × 256)are about half-a-dozen megabytes. This cost is negligible compared to that of the CS approach.

F. Benefits of Our Approach

Several competing methods aim at synthesizing a single cardiac cycle out of spokes collected over several cycles at various cardiac phases. There, a synchronized ECG could in principle allow one to associate a phase to each spoke; however, the deployment of this ancillary ECG would make the practical apparatus more involved, which is often undesirable. Furthermore, there are applications such as fetal cardiology where no ECG can be deployed at all. In traditional ECG-free approaches instead, one deploys self-gating methods for the purpose of phase assignment. They proceed on the basis of either human inspection or heuristic decisions, which makes them arduous, non-reproducible, and prone to errors. (Sometimes, the assignment is no more advanced than a simple manual sorting.) One specific additional difficulty that self-gating methods must deal with originates with the necessary Cartesian-to-polar conversion inherent in radial sampling trajectories, which ultimately results in streaking artifacts that tend to confound phase assignments, particularly those based on visual assessments in the spatial domain.

By contrast, in our proposed approach we relax the hypothesis of ideal periodicity. As presented in Section IV-A, this allows us to take into consideration cycle-to-cycle variations, thus providing access to clinical insights that are not available with traditional accelerated methods. Moreover, our ECGfree approach has the major benefit that no phase assignment is needed, thus providing access to a continuously defined time variable. Being fully automated, our approach is also reproducible. Another advantage is that the streaking artifacts associated to radial trajectories play no role since the reference data used for training in (10) or (11) are the spokes themselves, as opposed to reconstructed images. Finally, when compared to CS, our approach achieves better reconstruction, with a gain of 0.69 dB, as seen in Figure 4. It also leads to a simpler optimization task with fewer hyper-parameters. For instance, k-t SENSE requires three interdependent hyper-parameters whose optimal value is found only after some substantial gridsearch effort, while the two hyper-parameters of our approach are easier to interpret since they trivially consist of just an initial learning rate, along with a number of iterations.

G. Limitations and Future Work

One major limitation of our proposed approach is its computational complexity. At times, no fewer than 10,000 iterations are required before convergence is observed. This imposes a large computational burden since every iteration involves both forward and adjoint operations. From a technological perspective, we implemented the forward model in Python 3.6, with Pytorch (v1.0.0) as main library. Unfortunately, NuFFT is currently optimized neither for Python nor for GPU usage, which lead to a marked slowdown of our implementation. For instance, our method spent a whole day letting one GPU (TITAN X) process the cardiac dataset presented in Section II-E2, whereas GRASP terminated within ten minutes. This bottleneck will be improved by a better integration of the NuFFT library.

In our future work, we want to explore the dynamics of the latent variables. In this paper indeed, we simply took advantage of linear interpolation to build trivial intermediate states of the latent variables; however, refined approaches may better capture the temporal correlation between frames.

We propose to combine unsupervised learning with a generative network model. There, the latent variables are interpolated through time before being input to our generative network. This framework fits well the learning of spatio-temporal manifolds that are smooth temporally; it is purely unsupervised. It is also particularly appropriate in the context of inverse problems where no ground-truth is available. In practice, it results in significant memory savings when compared to compressed-sensing (CS) approaches. Our study shows that our proposed method has the potential to reconstruct dynamic magnetic resonance images (MRI) in the absence of an electrocardiogram signal. Quality-wise, it typically outperforms state-of-the-art CS approaches. To the best of our knowledge, ours is the first approach of unsupervised learning in accelerated dynamic MRI.

The authors would like to thank Prof. Jong Chul Ye at KAIST for providing the bSSFP cardiac MRI k-space data set (retrospective dataset).

[1] M. Griswold, P. Jakob, R. Heidemann, M. Nittka, V. Jellus, J. Wang, B. Kiefer, and A. Haase, “Generalized autocalibrating partially parallel acquisitions (GRAPPA),” Magn. Res. in Med., vol. 47, no. 6, pp. 1202– 1210, June 2002.

[2] K. Pruessmann, M. Weiger, M. Scheidegger, and P. Boesigner, “SENSE: Sensitivity encoding for fast MRI,” Magn. Res. in Med., vol. 42, no. 5, pp. 952–962, November 1999.

[3] M. Lustig, D. Donoho, and J. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magn. Res. in Med., vol. 58, no. 6, pp. 1182–1195, December 2007.

[4] J. Fessler, “Model-based image reconstruction for MRI,” IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 81–89, July 2010.

[5] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 27 2015.

[6] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, June 2017.

[7] J. Schlemper, J. Caballero, J. Hajnal, A. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Trans. on Med. Imag., vol. 37, no. 2, pp. 491–503, February 2018.

[8] A. Hauptmann, S. Arridge, F. Lucka, V. Muthurangu, and J. Steeden, “Real-time cardiovascular MR with spatio-temporal artifact suppression using deep learning—Proof of concept in congenital heart disease,” Magn. Res. in Med., vol. 81, no. 2, pp. 1143–1156, February 2019.

[9] S. Biswas, H. Aggarwal, and M. Jacob, “Dynamic MRI using model- based deep learning and SToRM priors: MoDL-SToRM,” Magn. Res. in Med., vol. 82, no. 1, pp. 485–494, July 2019.

[10] K. Hammernik, T. Klatzer, E. Kobler, M. Recht, D. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magn. Res. in Med., vol. 79, no. 6, pp. 3055– 3071, June 2018.

[11] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compres- sive sensing MRI,” in Proceedings of the Conference on Advances in Neural Information Processing Systems 29 (NeurIPS), Barcelona, Spain, December 5-10 2016, pp. 10–18.

[12] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang, “Accelerating magnetic resonance imaging via deep learning,” in Proceedings of the Thirteenth IEEE International Symp. on Biomed. Imag.: From Nano to Macro (ISBI, Prague, Czech Republic, April 13-16 2016, pp. 514–517.

[13] Y. Han, J. Yoo, H. Kim, H. Shin, K. Sung, and J. Ye, “Deep learning with domain adaptation for accelerated projection-reconstruction MR,” Magn. Res. in Med., vol. 80, no. 3, pp. 1189–1205, September 2018.

[14] M. Mardani, E. Gong, J. Cheng, S. Vasanawala, G. Zaharchuk, L. Xing, and J. Pauly, “Deep generative adversarial neural networks for compressive sensing MRI,” IEEE Trans. on Med. Imag., vol. 38, no. 1, pp. 167–179, January 2019.

[15] K. Jin, M. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. on Imag. Proc., vol. 26, no. 9, pp. 4509–4522, September 2017.

[16] K. C. Tezcan, C. F. Baumgartner, R. Luechinger, K. P. Pruessmann, and E. Konukoglu, “MR image reconstruction using deep density priors,” IEEE Trans. on Med. Imag., vol. 38, no. 7, pp. 1633–1642, July 2019.

[17] H. Jung, K. Sung, K. Nayak, E. Kim, and J. Ye, “k-t FOCUSS: A general compressed sensing framework for high resolution dynamic MRI,” Magn. Res. in Med., vol. 61, no. 1, pp. 103–116, January 2009.

[18] S. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: k-t SLR,” IEEE Trans. on Med. Imag., vol. 30, no. 5, pp. 1042–1054, 2011.

[19] L. Feng, R. Grimm, K. Block, H. Chandarana, S. Kim, J. Xu, L. Axel, D. Sodickson, and R. Otazo, “Golden-angle radial sparse parallel MRI: Combination of compressed sensing, parallel imaging, and golden-angle radial sampling for fast and flexible dynamic volumetric MRI,” Magn. Res. in Med., vol. 72, no. 3, pp. 707–717, September 2014.

[20] L. Feng, L. Axel, H. Chandarana, K. Block, D. Sodickson, and R. Otazo, “XD-GRASP: Golden-angle radial MRI with reconstruction of extra motion-state dimensions using compressed sensing,” Magn. Res. in Med., vol. 75, no. 2, pp. 775–788, February 2016.

[21] E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. on Info. Theo., vol. 52, no. 2, pp. 489–509, February 2006.

[22] S. Poddar and M. Jacob, “Dynamic MRI using smoothness regulariza- tion on manifolds (SToRM),” IEEE Transactions on Medical Imaging, vol. 35, no. 4, pp. 1106–1115, April 2015.

[23] U. Nakarmi, W. Y., J. Lyu, D. Liang, and L. Ying, “A kernel-based low- rank (KLR) model for low-dimensional manifold recovery in highly accelerated dynamic MRI,” IEEE Trans. on Med. Imag., vol. 36, no. 11, pp. 2297–2307, November 2017.

[24] J. Yerly, G. Ginami, G. Nordio, A. Coristine, S. Coppo, P. Monney, and M. Stuber, “Coronary endothelial function assessment using self-gated

cardiac cine MRI and k-t sparse SENSE,” Magn. Res. in Med., vol. 76, no. 5, pp. 1443–1454, November 2016.

[25] J. Chaptinel, J. Yerly, Y. Mivelaz, M. Prsa, L. Alamo, Y. Vial, G. Berchier, C. Rohner, F. Gudinchet, and M. Stuber, “Fetal cardiac cine magnetic resonance imaging in utero,” Scientific Reports, vol. 7, no. 15540, pp. 1–10, November 14 2017.

[26] J. Fessler and B. Sutton, “Nonuniform fast Fourier transforms using min-max interpolation,” IEEE Trans. on Sig. Proc., vol. 51, no. 2, pp. 560–574, February 2003.

[27] V. Lempitsky, A. Vedaldi, and D. Ulyanov, “Deep image prior,” in Proceedings of the IEEE Computer Society Conference on Comp. Vis. and Patt. Rec. (CVPR), Salt Lake City UT, USA, July 18-23 2018, pp. 9446–9454.

[28] A. Yazdanpanah, O. Afacan, and S. Warfield, “Non-learning based deep parallel MRI reconstruction (NLDpMRI),” in Proceedings of the SPIE Conference on Medical Imaging: Image Processing, vol. 10949. San Diego CA, USA: International Society for Optics and Photonics, February 16-21 2019, pp. 1 094 904–1 094 910.

[29] K. Gong, C. Catana, J. Qi, and Q. Li, “PET image reconstruction using deep image prior,” IEEE Trans. on Med. Imag., vol. 38, no. 7, pp. 1655– 1665, July 2019.

[30] C. Qin, J. Schlemper, J. Caballero, A. Price, J. Hajnal, and D. Rueckert, “Convolutional recurrent neural networks for dynamic MR image reconstruction,” IEEE Trans. on Med. Imag., vol. 38, no. 1, pp. 280–290, January 2019.

[31] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” in International Conference on Learning Representations (ICLR), San Diego CA, USA, May 7-9 2015.

[32] D. Kingma and J. Ba, “ADAM: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), San Diego CA, USA, May 7-9 2015.

[33] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, November 2008.


Designed for Accessibility and to further Open Science