b

DiscoverSearch
About
My stuff
Deformable Groupwise Image Registration using Low-Rank and Sparse Decomposition
2020·arXiv
Abstract
Abstract

Low-rank and sparse decompositions and robust PCA (RPCA) are highly successful techniques in image processing and have recently found use in groupwise image registration. In this paper, we investigate the drawbacks of the most common RPCA-dissimilarity metric in image registration and derive an improved version. In particular, this new metric models low-rank requirements through explicit constraints instead of penalties and thus avoids the pitfalls of the established metric. Equipped with total variation regularization, we present a theoretically justified multilevel scheme based on first-order primal-dual optimization to solve the resulting non-parametric registration problem. As confirmed by numerical experiments, our metric especially lends itself to data involving recurring changes in object appearance and potential sparse perturbations. We numerically compare its peformance to a number of related approaches.

image

1.1 Groupwise Image Registration

The problem of aligning one image with another image of the same object is a well-studied problem in image processing and variational methods have proven successful for the task [23, 32]. However, many application scenarios involve data comprised of more than two images, as in the case of image data gathered over time, which necessitates groupwise methods. Naive pairwise techniques, that select one image from the group as a fixed reference and register all other images to the reference have been shown to be inconsistent with respect to registration accuracy (depending on the choice of the reference) and are generally deemed inferior to groupwise methods [22, 18]. These allow all images of the group to be deformed simultaneously and therefore operate on an implicit reference.

A crucial step in solving any image registration problem is the selection of a suitable dissimilarity metric on pairs or groups of images. In the past, both generalizations of established dissimilarity metrics for the classic two image problem and new concepts have been proposed to measure the distance between a group of N > 2 images. Examples for the former case include the variance-measure found in [2, 22] that extends the well-known sum of squared distances, different generalizations of the mutual information from [29, 18] and a multi-image version of the normalized gradient fields-measure in [6].

One example of a newly developed metric that is also related to the metric proposed in this work is  DPCA2from [18]. Given N images  T1, . . . , TN ∈ Rm×n, this measure operates on the so-called Casorati matrix

image

where vec(·) denotes a column-major vectorization. In  DPCA2, one proceeds to penalize a weighted sum of the (nonnegative) eigenvalues  λiof the correlation matrix

image

¯M is the repeated columnwise mean of  MT1,...,TNand Σ is diagonal with diagonal elements given by the standard deviations of the columns of  MT1,...,TN. To be exact, the metric is given by

image

As the number of nonzero eigenvalues of K is equal to the rank of  MT1,...,TN, minimizing  DPCA2promotes low-rankness of  MT1,...,TNand similarity between images is modeled as linear dependency. Note that apart from the Σ−1-weighting in (2), the eigenvalues  λicorrespond to variances along the principal components of  MT1,...,TN, which emphasizes the relation to the eponymous PCA. DPCA2will serve as a comparison method for our proposed metric in the experiments of section 6.

1.2 Robust PCA

As the classic PCA is known for its sensitivity towards sparsely distributed outliers, such methods are prone to fail for datasets involving partially unreliable data or strong changes in image intensity over time. To overcome this issue, different versions of a Robust PCA (RPCA) were proposed in the literature – see [14] for an extensive comparison. The most widely-used RPCA-variant is arguably the Principal Component Pursuit (PCP) from [9, 7]. PCP is derived as a convex relaxation of the combinatorial optimization problem

image

for given data  M ∈ Rp×q. The term  ||E||0denotes the number of non-zero entries of E. Replacing both summands of (4) with their convex hulls yields

image

which is convex in L and thus poses a more tractable optimization problem.  ||L||∗is the so-called nuclear norm, defined as the sum of all singular values of L (see [12]) and  ||M − L||1= �pi=1�qj=1 |Mi,j − Li,j|is a  ℓ1-type norm. Especially recall the relationship between the singular values  σiand the rank of a matrix: rank(A) = #{σi(A) > 0}(see again [12]). The decomposition of M generated by (5) is usually referred to as a low-rank and sparse decomposition, in which L is of low rank and  E = M − Lis sparse.

PCP has previously been used in the context of groupwise image registration by [27, 16, 15, 20]. Primarily tackled therein were datasets for which low-dimensional approximations using PCAbased techniques were not applicable due to occlusions, local changes in image intensity (for the case of DCE-MRI data) and irregular pathologies in medical image data. In all these publications, the data matrix M for (5) was constructed as a Casorati matrix (1). The authors of [15, 20] however only used low-rank and sparse decompositions as preprocessing steps and performed subsequent registrations on the generated low-rank components L with different algorithms. Contrary to that, [27, 16] both used the optimal value of (5) as a metric for the similarity of a set of given images  T1, . . . , TN.

Section 2 of this paper will present a deeper analysis of PCP as a distance measure. We argue that PCP has some inherent drawbacks: Perfect alignments of all  Tioften constitute local minimizers of PCP in only very narrow neighborhoods. At the same time, degenerated deformations result in comparatively lower energies. To overcome these issues, we present in this work a modification of PCP that is still convex and therefore easy to optimize.

1.3 Proposed Approach

Precisely, we propose to use the following groupwise dissimilarity measure:

image

Here  MT1,...,TNis again the Casorati matrix (1), ¯L is the repeated columnwise mean of L and ν ≥0 is a suitable threshold for the nuclear norm. The intuition behind (6) is to jointly measure the  ℓ1-distance between the input images and their optimal approximations in a low-dimensional linear subspace. Details are given in section 2. Our main contributions in this work involve the following:

A novel technique for low-rank and sparse decompositions that results in a more suitable distance metric for groupwise registration tasks than previous approaches.

A less restrictive uniqueness constraint than the one commonly employed in the literature.

A multi-level strategy with theoretically justified scaling that solves the registration model in an iterative process and that uses first-order primal-dual optimization techniques to solve the subproblems.

1.4 Other Related Work

Major differences between our approach and related methods for variational groupwise registration are as follows:

Besides the fact that the  DPCA2-measure from [18] is based on the classic PCA (whereas ours is based on RPCA), the authors suggest a parametric deformation model based on B-Splines. Instead, we employ a non-parametric model that is fully deformable and that is explicitly (and flexibly) regularized through a total variation penalty (see section 3). Regularization in [18] is handled implicitly through grid point spacing, and the same is true for all of [3, 2, 22, 18, 13, 29], as they use B-Spline deformations in the same manner. Concerning the two PCP-based registration approaches [27, 16], the former is even further restricted to affine deformations, while the latter operates on light-field data, for which a geometric relationship between input images is known a priori and is exploited in the registration process.

Another non-parametric approach is presented in [6]. While also based on rank minimization, the authors use normalized image gradients as feature vectors and define alignments locally (instead of image intensities as features and global alignments as in this article). A continuation of [6] is found in [5], which generalizes the former approach to different kinds of feature vectors and formulates alignments globally.

1.5 Outline and Contributions

The remainder of this article is organized as follows: In section 2, we analyze the established PCP-metric and derive our proposed approach as a replacement. In section 3, the total variation is discussed as a regularizer for our model and a new uniqueness constraint for groupwise image registration algorithms is introduced. In section 4, an in-depth account of the optimization strategy and its implementation is given, including a multilevel scheme with theoretically justified scaling. In the subsequent sections 5 and 6, we introduce the benchmark data and present a numerical comparison to related approaches. Section 7 gives concluding remarks.

image

Figure 1: Depicted are five subsequent frames  T1, . . . , T5from a MRI sequence that serve as input for the experiments in Fig. 2: In order to analyze the behavior of a given dissimilarity measure, its energy is determined while one image is kept fixed and the remaining images are warped uniformly in a prescribed manner. Due to their short temporal offset, the input images  T1, . . . , T5can be regarded as aligned

2.1 Classical Approach

The classical PCP image distance from [27, 16] is given by

image

with M as a Casorati matrix1. The parameter  µ >0 controls the weighting between the requirement on L to be of low rank and the requirement on E :=  M − Lto be sparse.

In order to assess the general applicability of (7) in the context of non-parametric groupwise registration, we conducted a number of experiments to examine the behavior of  DPCPunder certain predefined deformations of the images  T1, . . . , TN.

To this end, we define an image  T ∈ Rm×n as a function of a deformation u ∈ Rm×n×2(given over the same grid) through linear interpolation – see, e.g., [24, Chapter 3.3]. Using this convention, the experiments were performed by evaluating  DPCP(T1, T2(uj), . . . , TN(uj)) for the four cases of the deformation sequence (uj)j=−k,...,kdescribing a translation, a rotation, a scaling and a shearing.  T1therefore acted as a fixed reference, while  T2, . . . , TNwere warped uniformly by  uj.

A schematic depiction of each transformation sequence is given in the first row of Fig. 2. Test data was comprised of the N = 5 frames from a cardiac MRI sequence, that are displayed in Fig. 1. As suggested by [7, Theorem 1.1], the weighting parameter for (7) was chosen as  µ= (mn)−1/2.

The energy plots of  DPCPfor all four experiments are shown in the second row of Fig. 2. Additionally, their respective decompositions into the two summands  ||L∗||∗and  µ||M − L∗||1are displayed, where  L∗denotes the minimizer of (7) over the variable L.

The results show that  DPCPhas two major shortcomings that are unfavorable for the purposes of image registration. Firstly, the point  u0= 0 at which the N images are most appropriately aligned only marks a local minimizer of  DPCPin a very narrow neighborhood of  u0in the translation experiment. In the scaling experiment,  u0even constitutes a global maximizer. Secondly, both the left or the right endpoints of each energy plot, i.e., the most degenerated of all evaluated deformations  uj, represent global minimizers in every experiment except for the rotation. This is especially problematic in case of the translation, since constant translations are also not penalized by any regularizer, that is based on derivatives of deformations. As a result, we consider  DPCPunsuitable as a distance metric in the general case.

2.2 Proposed Modified Measure

Based on these observations, we propose a new distance metric that modifies  DPCPin two aspects. As a first step, the  ||·||∗-penalty term in (7) is replaced by a hard constraint to the set  {||·||∗ ≤ ν}for some suitable threshold  ν ≥0. As a second step, we propose not to constrain the nuclear norm of L itself, but to constrain that of the centered variable  L −¯L instead. To this end, let

image

Figure 2: Experiments on the distance measures  DPCP and Dδ-RPCA. The classical  DPCP energies (second row) exhibitonly very narrow minima at the position  u0 = 0 of perfect alignment. Even worse so, global minimizers for all experiments except the rotation are given by degenerated transformations. The proposed modification  Dδ-RPCA (third row) resolvesthese problems:  u0 = 0 constitutes a global minimizer across all experiments. Furthermore, degenerated deformations generally result in high energies and are therefore not favored by this metric

¯L := (�Ni=1liN)·11×N ∈ Rmn×Ndenote the matrix, in which every column is given by the average of the columns  liof L. The proposed dissimilarity measure, which we term  δ-RPCA, is then given by

image

for all q > 1. In terms of the registration model, this means that a smaller nuclear norm for the uncentered variable L can be achieved by shifting all deformable images out of the image domain – thereby replacing them with the boundary value of zero – than by aligning them inside the images domain.

The continuation of the above example shows that this situation, which is highly undesirable for the purpose of image registration, is reversed when dealing with centered variables. These are given by  A1 −¯A1 = a ·(1  − q−1, −q−1, . . . , −q−1) and  A2 −¯A2= 0 respectively and in fact, the

equivalent of relation (9) now reads

image

As a consequence, (8) does not favor shifting the deformable images out of the image domain and is therefore more suited for image registration.

Crucially,  Dδ-RPCAis still convex in the variable L, since  {|| · ||∗ ≤ ν}constitutes the level set of a convex function and is therefore convex [30, Theorem 4.6].

Repeating the above experiments for  Dδ-RPCAwith the choice of  ν= 0.9||M −¯M||∗for every set of deformed images in M, one obtains the energy plots in the third row of Fig. 2. In contrast to  DPCP, the modified metric  Dδ-RPCAshows global minimizers at the point  u0= 0 across all cases. Additionally, the global maximizer of each curve is found towards its left or right endpoint and consequently results from a degenerated deformation. In conclusion, the two main issues of DPCPas a distance function for registration tasks are hence resolved by  Dδ-RPCA.

Apart from the interpretation of  Dδ-RPCAas a modified version of  DPCP, it can also be interpreted in the sense described as follows. First consider the case  ν= 0. This implies L = ¯L, which in turn implies constant columns  l1 = . . . = lNof L. Using this, one can solve (8) analytically for L by recalling, that  ℓ1-distance minimization problems of the type

image

are solved by the median of (y1, . . . , yK) [4, p. 433]. As a consequence, the constant columns of L in the problem above are given by the pointwise median of  T1, . . . , TNand  Dδ-RPCArepresents the remaining  ℓ1-distance between the input images and that median.

In the case of  ν >0,  Dδ-RPCAcan now more generally be interpreted as the joint  ℓ1-distance between the images  T1, . . . , TNand their individual (optimal) approximations  l1, . . . , lNwith deviations from the mean ¯l := �Ni=1liNrestricted to a low-dimensional linear subspace.

Consequently, we deem (8) especially suited for image groups with inherent low-dimensional structure such as image sequences with strong or pronounced temporal repetition.

3.1 Total Variation Regularization

Total variation (TV) is a popular choice for regularizing motion fields in applications of both optical flow estimation and image registration due to its distinguishing feature of allowing discontinuities in the solution. TV therefore sets itself apart from other common regularizers such as diffusive, elastic or curvature energies that favor smooth transformations. Exemplary early applications of TV regularization for optical flow estimation can be found in [25, 35] and for image registration in [28, 34]. In the context of medical image processing, TV regularization is particularly interesting when modeling non-smooth sliding motions, since it eliminates the necessity to explicitly mask all sliding interfaces beforehand [10].

image

can be defined as

image

where  Dυdenotes the distributional (measure-valued) derivative of  υwith values in  Rd×d. In case of  υ ∈ C1(Ω, Rd), (12) is equivalent to

image

For all further details, we refer to [1].

In our registration model, we assume rectangular domains Ω  ⊂ R2and employ a standard discretization scheme with cell-centered grids of resolution  m × nand grid spacings of (h1, h2) ∈R2>0in the two coordinate directions. Optimization is performed over discrete displacement fields uk ∈ Rm×n×2, for which we use finite forward differences and Neumann boundary conditions to discretize (13). Following [34], we use the notation

image

Therein,  G ∈ R4mn×2mndenotes the finite difference operator with the aforementioned characteristics.

3.2 Uniqueness Constraint

As our model does not make use of an explicit reference image that all other images are aligned to, we need to employ an additional constraint on the displacements  u1, . . . , uNin order to ensure the uniqueness of a solution.

This can be seen from the simple example, in which  T1, . . . , TNdisplay uniform objects, e.g., white rectangles, before a black background. Consider the case of a perfect alignment T1(u1) =  . . . = TN(uN) of these rectangles inside the image domain Ω. If all deformations  ukare simultaneously offset by  t ∈ R2, such that the new deformations ˆukstill align  T1, . . . , TNinside the common domain2, then (ˆuk)k=1,...,Nconstitute a solution equal to (uk)k=1,...,Nboth in terms of  Dδ-RPCAand TVh.

For TVh, this is explained by (15) solely penalizing derivatives of deformation fields which are always invariant to translations.

The invariance for  Dδ-RPCAis due to the equivalence of an offset by t and a simple reordering of the pixels between  Tk(uk) and  Tk(ˆuk) (due to the zero boundary condition). Clearly, the  ℓ1-term in (8) is invariant to any reordering and the same is true for the nuclear norm constraint, since a consistent reordering of all  Tk(uk) results in a row permutation of the Casorati matrix M = [vec(T1(u1))| . . . |vec(TN(uN))]. As a short derivation shows, a row permutation does not affect the singular values of a matrix:

Let  A ∈ Rp×qbe an arbitrary matrix and let  P ∈ {0, 1}p×pbe a permutation. If a singular value decomposition (SVD) of A is given by  A = UΣV ⊤, then PA = (PU)ΣV ⊤constitutes a valid SVD of PA due to PU still being orthogonal, i.e.,

image

Thus, the singular values on the diagonal of Σ stay unaffected and so does the nuclear norm ||PA||∗ = ||A||∗.

In order to eliminate this remaining degree of freedom from the model, we impose an additional constraint on the deformations  u1, . . . , uN, enforcing the mean (or equivalently the sum) over all deformations and grid points to be zero in each coordinate direction:

image

Note that [22, 18, 13, 29] constrain their deformations in a related manner by demanding the mean of all deformations to be zero at every grid point as first introduced by [3]. The difference however is, that (17) only imposes one constraint per dimension instead of one constraint per grid point and dimension. As a result, (17) restricts the space of feasible solutions much less severely while still ensuring uniqueness.

In this section, we present an optimization scheme for our groupwise registration model that is strongly related to the work in [16]. First we combine all components derived in the previous sections into the complete registration model

image

in which  µ >0 controls the regularization strength.

4.1 Linearized Subproblems

In order to be able to apply convex optimization methods to (18), one needs to deal with the nonlinearity of the expressions  Tk(uk) that leads to a non-convexity of the model. As in [28, 35, 27, 16], an iterative linearization of the deformed images is used to overcome this issue. Note that while a one-time linear approximation would also be possible in theory, the strong locality of such an approximation becomes prohibiting when larger deformations are required to align the images.

In the following, we assume all variables to be in vector format (including the values of all Tk) and for brevity’s sake omit the explicit notation of reshaping operations like vec(·). The linearization of  Tkcan then be expressed as

image

for a suitable point ˜uk. This enables one to approximate the first term in (18) by

image

Using vectorized variables further allows one to rewrite the centering of L as a linear operation KL with

image

Since solving (18) through iterative (re-)linearization amounts to solving a series of subproblems, we propose to treat is as a process, in which the threshold  νis successively decreased to the threshold value for which the original problem (18) is meant to be solved.

Assuming a predefined number  niterof linearization steps and denoting the final threshold by ν, we therefore employ a series of thresholds  ν1 > ν2 > . . . > νniter = νfor the iterative solution of the separate subproblems. As a strategy to select these parameters, we propose to choose  νrelative to the nuclear norm of the centered input images and to progressively decrease  νtto that value by multiplication with a constant factor  α ∈(0, 1).

More specifically, let M denote the Casorati matrix of the input images and let ¯M denote the columnwise repetition of their mean. If one now wants to meet a final threshold of  ν = β||M −¯M||∗for some  β ∈(0, 1) and one employs a predefined number of  niterlinearization steps, the proposed strategy amounts to choosing

image

where  α = β(1/niter).

image

4.2 Solving the Convex Subproblem

Denoting all entries of  ukcorresponding to the c-th coordinate axis by  uk,c(uk•,•,cin the non- vectorized notation of (18)), the t-th subproblem now reads

image

We solve these subproblems using the primal-dual optimization algorithm 1 from [8] that is designed for finding saddle-points of problems of the type

image

H : Rp → (R ∪ {∞}=: ¯R), F ∗:  Rq →¯R are proper, lower-semicontinuous, convex functions and F ∗denotes the conjugate of another proper, lower-semicontinuous, convex function  F : Rq →¯R. A ∈ Rq×pfurther denotes a linear operator. As is well-known, (24) is equivalent to the primal

minimization problem

image

For all details, we refer to [31, Chapter 11].

In our case, we bring (23) into the form (25) by assigning the first three terms of (23) to F and the remaining uniqueness term to H. The primal variables for this problem are given by the union of all variables over which (23) is minimized,

image

and we define the linear operator A to be

image

where K and G are as in (21) and (15) respectively.

This allows for a separable definition of the function F by F(z) :=  F1(z1) +  F2(z2) +  F3(z3) and

image

where  z3,k:= (z34(k−1)mn+1, . . . , z34kmn)⊤and where the vector b = (b⊤1 , . . . , b⊤N)⊤gathers the constants  bk:=  Tk(˜uk) − ∇Tk(˜uk)⊤˜ukfor k = 1, . . . , N. As the remaining uniqueness term does not depend on  l1, . . . , lN, we define H(x) := ˜H((u1, . . . , uN)) and

image

In order to apply Alg. 1 to the problem, the proximal operators (id +η∂F ∗)−1and (id +τ∂H)−1

are required to compute the updates (26). We point out that the separable nature of F implies a decomposability of (id +η∂F ∗)−1into three terms corresponding to the three summands of F (or equivalently of  F ∗). These terms as well as the proximal operator of H are given by standard expressions, for which we refer to [26]. We however emphasize the point that the proximal step corresponding to the nuclear norm constraint

image

requires both an SVD y = U diag(σ)V ⊤, σ ∈ RN, of the input y (assumed to be  mn × N-shaped) and a projection Π{||·||1≤1}onto the  ℓ1-unit ball. While the latter step cannot be solved in a decoupled manner, there exist exact algorithms with time complexity O(N) to compute such projections – in our implementation we employ the approach from [11].

Finally, we shortly address the problem of determining the spectral norm  ||A||σthat the primal and dual step sizes  τ, ηfor Alg. 1 are based on. Since the linear operator A given by (28) contains the image gradients  ∇Tk(˜uk) and is therefore dependent on empirical data, an analytical solution for  ||A||σis unattainable. Instead, we apply a simple power iteration scheme to estimate this quantity [12, Section 7.3.1].

4.3 Multilevel Scheme and Parameter Scaling

image

Moreover, a consistent scaling of the thresholds  νtis required since the low-rank components change in resolution as well in between stages. To this end, it is useful to derive by what factor the nuclear norm of a matrix  M ∈ Rp×qscales when it is prolongated to the next higher resolution. Interpreting the columns of M as vectorized images and assuming p > q, we define the prolongation

image

of M as the separate prolongation of these image columns (as in Fig. 3). The operation can therefore be expressed as

image

where  P ∈ R4p×4pis a suitable permutation. As the singular values of a matrix are invariant under row-permutations (see (16)), one can however restrict the analysis of the singular values for (35) to the case P = I. Let an economic SVD of M now be given by  M = UΣV ⊤with  U ∈ Rp×qand Σ  ∈ Rp×p[12, Chapter 2.5]. Then it holds

image

Figure 3: Prolongation scheme. Variable values for the index (i, j) in the low-resolution coordinate system (left) are propagated to the variables indexed by (2i − 1, 2j − 1), (2i − 1, 2j), (2i, 2j − 1), (2i, 2j) in the high-resolution system

image

(36) however does not constitute a valid (economic) SVD of ˆM, since the columns of ˆU are no longer normalized: One has ( ˆU ⊤ ˆU)i,j= 4 for i = j and ( ˆU ⊤ ˆU)i,j= 0 for  i ̸= j. In order to regain a valid SVD, a factor of 2 has to be redistributed from ˆU to Σ, i.e.,

image

This implies || ˆM||∗= 2||M||∗, which in turn implies that the sought factor is given by 2. Also note that this result can easily be generalized to the case of d-dimensional images, where that factor is given 2(d/2).

The overall solution scheme is summarized by Alg. 2, where an image domain of Ω = [0, m] ×[0, n] is assumed for the input images  T1, . . . , TN ∈ Rm×n. Further assumed are a predefined number  nlevof resolution stages, predefined numbers  njiterof linearization steps per stage (for j = 1, . . . , nlev), as well as a final relative threshold parameter of  β= exp(ln(α) �nlevj=1 njiter) (see subsection 4.1).

5.1 Synthetic Dataset: Textured Ellipse

The purpose of the first synthetic dataset is to illustrate the capacity of our model to correct the motion of objects with recurring changes in texture, exposing the inherent low-dimensional structure of the dataset.

The image sequence is comprised of ten frames displaying a textured ellipse moving in a semicircular manner before a black background that further features a fixed white rectangle and a fixed white frame. The texture of the ellipse alternates between vertical stripes for all oddly indexed frames and horizontal stripes for all evenly indexed frames.

To quantify the accuracy of registration on this dataset, we equipped each frame with 17 landmarks at the same corresponding (analytically determined) positions. All frames were generated at a resolution of 200  ×200 pixels. As an example, four out of the ten input frames are displayed along with their landmarks in Fig. 4.

5.2 Real-world Dataset I: Cardiac MRI

Besides the challenge of motion correction in the presence of recurring changes in object appearance that the first synthetic dataset posed, the first real-world dataset comes with the additional difficulty of irregular disturbances to object appearance.

The sequence consists of cardiac MRI data in the so-called two-chamber view, where the left atrium and ventricle are on display. Seven repetitions of the heart cycle with blood flow in and out of the two chambers as well as breathing-induced motions of several structures like the thorax, the diaphragm, and the heart are shown.

For this dataset, changes in object appearance relate to different phases of the heart cycle as we selected one frame from each systole, one frame from each diastolic relaxation and one frame from each diastolic filling (making for a total of 21 input frames). Due to the turbulent nature of the

image

Figure 4: Exemplary frames from the textured ellipse-dataset with their respective landmarks. A perfect motion correction is expected to unify the ellipse positions, while keeping the white frames and rectangles stationary

image

Figure 5: Exemplary frames from the cardiac MRI dataset with their respective landmarks. While clear visual congruences between different images of the same phase exist, they are obscured by the irregularity of the blood flow and pose a particular challenge to distance measures that model similarity as linear dependence

image

Figure 6: Reference frames for selected sequences from [21]. The “Blinking Arrow”-sequence (left) is deemed challenging because of intensity changes resulting from a blinking traffic sign, the challenge in the “Flying Snow”-sequence (center) consists of heavy snowfall obstructing the view, and the “Shadow on Truck”-sequence (right) features rapidly changing shadow patterns that do not describe physical motion

blood flow, the visual appearances of these phases are somewhat irregular and pose an interesting test case for the low-rank/sparse decomposition generated by our model.

As with the textured ellipse-dataset, we equipped this sequence with 23 handselected landmarks per frame. Each individual image was resolved with 220×220 pixels. The respective input frames for the first and last heart cycle are displayed together with their respective landmarks in Fig. 5.

5.3 Real-world Dataset II: Challenging Data for Stereo and Optical Flow

We also evaluated our model on a variety of test sequences from the “Challenging Data for Stereo and Optical Flow”-dataset (CDSOF) [21]. This dataset features eleven sequences captured in real-world traffic situations that are deemed challenging for motion estimation algorithms due to diverse phenomena such as illumination changes from blinking signs, occlusions from snowflakes, and blurs from water spray.

We selected subsequences of 10 frames from the datasets entitled “Blinking Arrow”, “Flying Snow” and “Shadow on Truck” as test cases as they all feature different distortions that pose interesting challenges to the robustness of our model. In order to restrict the required computational effort, we downsampled all used frames to a resolution of 271  ×328 pixels.

Contrary to the other two datasets, all sequences from [21] come with a predefined reference frame, which is why we drop the uniqueness constraint from section 3.2 for these inputs. Instead, we enforce alignments with the reference through a constraint of the form  δ{0}(u(ref)), where u(ref)is the displacement field for the respective reference. The reference images for all selected sequences are shown in Fig. 6.

For the former two datasets from section 5, we compare our registration approach to the following two methods:

1. An approach based on the simple variance dissimilarity measure given by

image

which has previously been used by [2, 22]. We combine (38) with the same TV-regularization and the same uniqueness constraint as in our model (18).

2. A publicly available implementation of the  DPCA2-metric from [18] in the elastix software package [19]. This method uses a cubic B-spline transformation model and implicit regularization.

For each individual landmark, accuracy is measured in terms of mean Euclidean distance to the mean landmark position

image

yki ∈ R2therein denotes the position of the i-th landmark in the k-th image.For the CDSOF-datasets we compared our approach to the publicly available implementation of the “nonlocal” optical flow estimation method from [33], that was suggested as a referemce by the authors of [21]. [33] extends the classical Horn-Schunck model for optical flow estimation [17] by a number of techniques, e.g., an additional nonlocal term derived from median filtering. To give a meaningful comparison, we altered our algorithm to include the median filtering of flow fields in between linearization steps as well. Note that the reference method only operates in a pairwise fashion.

All experiments were performed on an Intel Core i7-8700 (6×3.20 GHz) system with 64 GB of memory, running Matlab R2019a under Ubuntu 18.04 (64-Bit). Our implementation is publicly available at

image

Computation times for the textured ellipse- and cardiac MRI-datasets are given in Tab. 1 and for the CDSOF-dataset in Tab. 2.

6.1 Textured Ellipse

Using the parameters  α= 0.9, µ= 0.2 as well as  nlev= 3 with  n1iter= 16 and  njiter= 2 for  j ≥2 in Alg. 2, we achieved the results displayed in Figures 7 and 9 for the textured ellipse-dataset. We shortly note that we observed values of  njiter ≥2 for  j ≥2 to be mandatory in order for Alg. 2 to generate useful linearization points on higher resolution levels. Furthermore, a small enough  α

image

Table 1: Computation times of competing methods

was crucial in finding low-rank components L that accurately describe the structural changes in texture – higher values of  αon the other hand allowed for unwanted motion artifacts in L.

For the variance method based on (38), we employed an adapted version of Alg. 2 with the regularization strength set to  µ= 0.1. The number of resolution levels as well as the iterations per level were kept the same as for our method.

In the elastix-based implementation of  DPCA2, we increased the settings recommended by the authors of [18] to three resolution stages (instead of the recommended two), 2.000 iterations per stage (recommended: 1.000) and 25.000 random coordinates per stage (recommended: 2.048) in order to ensure sufficient computational capacity for the method to produce accurate solutions.

Fig. 7 visualizes both the deformations calculated by our model and the warped images for all four exemplary frames from Fig. 4. Fig. 9 further analyzes the generated low-rank components in terms of singular values and (left) singular vectors of  L−¯L, i.e., the matrix whose nuclear norm is constrained by our model (18). As the dataset is of synthetic nature and therefore without intensity distortions, the sparse outlier components  E = M − Lare negligible and are not displayed.

A quantitative comparison in terms of landmark accuracy between our approach and the two competing methods is presented in Fig. 8. The comparison shows that our method significantly outperforms the other approaches on this dataset: While it corrects the ellipse positions most accurately out of the three methods, it is also the only one that does not introduce notable motion to the white rectangle and frame (which were already stationary in the input sequence).

In terms qualitative results, Fig. 9 shows that our method was moreover able to find a nearperfect embedding of the motion-compensated images in a low-dimensional subspace: The negligible magnitudes of the singular values  σ2, . . . , σ10of the centered low-rank components  L −¯L indicate that a two-dimensional basis consisting of the mean low-rank component ¯l and the singular vector  s1is largely sufficient to approximate the output images.

6.2 Cardiac MRI

For the cardiac MRI dataset, we only adjusted the regularization strength and threshold-scaling parameters of our method to  µ= 0.125 and  α= 0.95. In the variance registration method, we kept all parameters fixed except for  µ= 0.065 and since the  DPCA2model does not feature explicit regularization parameters, we left all the above settings unchanged for this method.

Results of our approach are presented in Figures 10, 12 and 13. Figures 10 and 13 show deformations, warped images and sparsity components for all example frames from Fig. 5. In Fig. 12, singular values and singular vectors of  L −¯L are analyzed in a presentation similar to Fig. 9. The quantitative evaluation in terms of landmark accuracy is visualized in Fig. 11.

Although results are generally more balanced than was the case for the synthetic data, our approach still outperforms the two competing methods on this real-world dataset in terms of landmark accuracy. We refer to Fig. 11, where our method not only achieves the highest accuracy for the majority of the landmarks, but where it is the only out of the three methods that did not introduce additional motion to any landmarks in the registration process: both competing approaches generate several landmarks with worse accuracy than in the unregistered input sequence.

In terms of the low-rank/sparse decomposition, we remark that our model was able to generate meaningful low-rank components alongside the actual motion-compensation: As seen from Fig. 12, the centered low-rank components matrix  L−¯L is dominated by three singular value/vector pairs in which the singular vectors  s1 − s3exhibit clear visual congruences with the three considered phases of the heart cycle. Congruences thereby consist of highlighted anatomical structures and physiological features such as the mitral valve and the direction of blood flow.

Furthermore, granting the method the ability to define sparse outlier components aided the generation of a meaningful low-rank approximation by filtering out highly irregular image feature such as the turbulences of blood flow (see Fig. 13).

image

Figure 7: Results of proposed approach for the selected frames from Fig. 4. Our model allows to correct the motion of the ellipse through piecewise constant deformations (top row), while automatically detecting and discarding repetitive structural noise (the horizontal and vertical bars) in the registration process. The motion-corrected images (bottom row) exhibit a good visual correspondence between matching landmarks. Quantitative results are given in Fig. 8, in which landmarks are indexed in the same order as in  T1

image

Figure 8: Comparison of landmark accuracy for the textured ellipse-dataset as measured by (39) (lower is better). Landmarks are ordered as in Fig. 7 with landmarks 5  −9 attributed to the moving ellipse and the remaining landmarks positioned around the stationary white rectangle and frame. Note that the latter do not appear in the input curve due to zero error and logarithmic axis scaling. Our method clearly outperforms the two competing approaches as it is able to correct the positions of the “ellipse-landmarks” most accurately without introducing artificial motion to the remaining landmarks

image

Figure 9: Singular values and vectors of centered low-rank components  L − ¯L. The singular value progression (left) over the inner iterations of Alg. 2 (scaling adjusted, see subsection 4.3) shows that the norm  ||L − ¯L||∗– the quantity constrained by our model – is dominated by the singular value  σ1towards the end of the iteration. This indicates that the columns of L can be reconstructed from their mean ¯l (second from left) and the dominating singular vector  s1of the centered low-rank components  L − ¯L (second from right)with only minor error. Intuitively,  s1is added to ¯l when reconstructing horizontal bars and subtracted from ¯l when reconstructing vertical bars. In comparison, the second singular vector  s2 (right)only describes negligible remainders of motion with little influence on  L− ¯L(as indicated by the final magnitude of  σ2in the left curve). To summarize, our method was able to automatically detect the low-dimensional texture variations present in the dataset

image

Figure 10: Results of proposed approach on the cardiac MRI-dataset. The difference images per phase between the first and seventh heart cycle before and after registration (left two columns) show a successful motion compensation: Motion in the thorax area as well as the front-facing area of the diaphragm and the heart apex is greatly reduced across all phases. The main intensity differences after registration are located inside the heart itself and are due to irregular turbulences of the blood flow. The two right-hand columns show the actual motion-compensated images including their deformed landmarks – in the upper-left image, an additonal ordering of the landmarks is given that is referred to again in Fig. 11

image

Figure 11: Comparison of landmark accuracy for the cardiac MRI-dataset as measured by (39) (lower is better). Landmarks are ordered as in Fig. 10. While our method still produces the most accurate deformations (performing best for 13 out of 23 landmarks), results are much more balanced than in the textured ellipse-experiment (see Fig. 8) with  DPCA2performing remarkably well despite being based on pure PCA. However, one notable drawback which both competing methods exhibit is that landmarks occasionally feature more motion in the registered images than was actually present in the input sequence – this is indicated by crossings of their respective curves with the gray input curve (see for example landmarks 7 and 16 from the thorax area)

image

Figure 12: Singular values and vectors of the centered low-rank components  L − ¯Lfor the cardiac MRI-dataset. The development of the singular values (left) shows, that the nuclear norm  ||L − ¯L||∗(which is constrained by our model) is dominated by the three largest singular values  σ1 − σ3. Consequently, the low-rank components of the warped images primarily consist of a linear combination of ¯l (second from left) and the three corresponding singular vectors  s1, s2, s3(third, second, first from right). Especially note the visual congruences between these three singular vectors and the characteristics of the three considered heart phases: While  s1marks a blood flow into the left ventricle (see the diastolic filling in Fig. 10),  s2highlights the mitral valve, which is clearly visibly closed during the diastolic relaxation phase (see again Fig. 10). Moreover,  s3exhibits a high contrast between atrium and ventricel as present during the systole in Fig. 10

image

Figure 13: Sparse components  ei = Ti(ui) − lifor all frames used in Fig. 10. Nonzero entries are sparsely distributed across all images and primarily serve to correct the irregularities of the blood flow that were not captured by the low-rank components  li, i.e., that were not representable in a low-dimensional linear subspace. Thus, a meaningful low-rank approximation of the motion-corrected images is only enabled by allowing for sparsely distributed outliers

6.3 CDSOF

A comparison of the motion estimation capabilities of our method with the pairwise operating reference [33] for all sequences from Fig. 6 is given in Fig. 14.

In our model, we used the parameters  nlev= 4,  n1iter= 16,  njiterfor  j ≥2,  α= 0.91 across all three examples as well as  µ= 0.075 for the “Blinking Arrow”,  µ= 0.045 for the “Flying Snow” and  µ= 0.125 for the “Shadow on Truck” sequences. In the reference method implementation, we kept all parameters at standard values except for the regularization strengths, which were set to  λ= 10 for the “Blinking Arrow” and to  λ= 20 for both the “Flying Snow” and the “Shadow on Truck” sequence.

Upon inspection of the results in Fig. 14 and Tab. 2, it is clear that our approach was mostly outperformed by the reference method in terms of accuracy and computational efficiency. This is especially true for the sequence entitled “Shadow on Truck”, where our method failed to produce a meaningful motion correction. For the other two sequences, our model was able to generate motion fields that successfully aligned all deformable (non-reference) images in the presence of disturbances such as snow flakes and blinking signs.

We however point to the observation, that this alignment was not constructed with respect to the explicitly given reference but to another implicit reference generated by our algorithm. In

image

Table 2: Computation times of proposed approach and reference method for selected “CDSOF”-sequences

light of this implicit reference, the explicit one hence appears as an outlier. This phenomenon also explains the large discrepancies observed in Fig. 14 between the motion fields generated by our model and those generated by the reference method.

We draw two conclusions from the experiments:

1. The proposed method exhibits a notable sensitivity towards the degree to which input data meets the model assumption of decomposability into structural low-rank components and sparse outlier components. If variations in object appearance are too irregular or if distortions are too large in scale (as in the “Shadow on Truck”-sequence), the approach might fail to produce meaningful solutions.

2. Imposing an explicit reference on a groupwise operating method cannot be expected to produce deformations that align all deformable images to that reference. On the contrary, deformable images might rather be aligned to a more suitable implicit reference. We primarily attribute this phenomenon to the small relative weight of one fixed reference when compared to the remaining group of  N −1 deformable images.

We emphasize that the goal of this experiment is not to compete with a specialized method for a different domain (pairwise registration), but rather to give an indication of its usefulness on challenging non-medical real-world sequences.

In this work, we have investigated a novel dissimilarity metric for groupwise image registration tasks based on low-rank and sparse decompositions. The proposed metric corrects the major drawbacks that the established RPCA-image distance from [27, 16] exhibited in the experiments of Sec. 2. It is primarily suited for registering image data, that features objects with recurring changes in appearance and that can be represented in a low-dimensional linear subspace with potential sparse outliers. We especially emphasize the advantage in interpretability, when dealing with threshold constraints instead of weighted penalties.

We further developed a first-order primal-dual optimization framework for solving non-para-metric registration tasks using our metric in conjunction with TV regularization, which can easily be replaced by other regularization techniques suited for the individual application.

Experimentally, we were able to show the superiority of our method when compared to two commonly used groupwise registration models. The experiments included both synthetic and real-world image data, that met the assumptions made by our model well.

We further investigated the robustness of our model on a number of test sequences from the optical flow community, where we found that albeit it was outperformed by a highly optimized reference method, our model was able to correct motion in presence of distortions like snow flakes obstructing the view and illumination changes from blinking signs. As an interesting phenomenon, providing a reference to a groupwise method did not result in deformable images being aligned to the reference, but rather in the reference being treated as an outlier by our model. It remains to be investigated, to what extent this behavior is a trait of our particular model or of the general groupwise approach.

Another avenue for future research is the application of our proposed decomposition model to tasks other than image registration.

The authors thank Allen D. Elster (MRIQuestions.com) for kindly providing the cardiac cine study used in this article. The authors further acknowledge support through DFG grant LE 4064/1-1 “Functional Lifting 2.0: Efficient Convexifications for Imaging and Vision” and NVIDIA Corporation.

image

[1] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, 2000.

[2] K. K. Bhatia, J. Hajnal, A. Hammers, and D. Rueckert. Similarity metrics for groupwise non-rigid registration. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007, pages 544–552. Springer Berlin Heidelberg, 2007.

[3] K. K. Bhatia, J. V. Hajnal, B. K. Puri, A. D. Edwards, and D. Rueckert. Consistent groupwise non-rigid registration for atlas construction. In 2004 2nd IEEE International Symposium on Biomedical Imaging: Nano to Macro, volume 1, pages 908–911, 2004.

[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[5] K. Brehmer, H. O. Aggrawal, S. Heldmann, and J. Modersitzki. Variational registration of multiple images with the SVD based SqN distance measure. In Scale Space and Variational Methods in Computer Vision – 7th International Conference, SSVM 2019, Hofgeismar, Germany, June 30 – July 4, 2019, Proceedings, pages 251–262, 2019.

[6] K. Brehmer, B. Wacker, and J. Modersitzki. A novel similarity measure for image sequences. In Biomedical Image Registration, pages 47–56. Springer International Publishing, 2018.

[7] E. J. Cand`es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):1–37, 2011.

[8] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120–145, 2011.

[9] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.

[10] A. Derksen, S. Heldmann, T. Polzin, and B. Berkels. Image registration with sliding motion constraints for 4D CT motion correction. In Bildverarbeitung fr die Medizin 2015, pages 335–340. Springer Berlin Heidelberg, 2015.

[11] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the  ℓ1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.

[12] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, 3rd edition, 1996.

[13] J.-M. Guyader, W. Huizinga, D. H. J. Poot, M. van Kranenburg, A. Uitterdijk, W. J. Niessen, and S. Klein. Groupwise image registration based on a total correlation dissimilarity measure for quantitative MRI and dynamic imaging data. Scientific Reports, 8(1), 2018.

[14] C. Guyon, T. Bouwmans, and E.-h. Zahzah. Robust principal component analysis for background subtraction: Systematic evaluation and comparative analysis. In Principal Component Analysis, chapter 12. IntechOpen, 2012.

[15] V. Hamy, N. Dikaios, S. Punwani, A. Melbourne, A. Latifoltojar, J. Makanyanga, M. Chouhan, E. Helbren, A. Menys, S. Taylor, and D. Atkinson. Respiratory motion correction in dynamic MRI using robust data decomposition registration – application to DCE-MRI. Medical Image Analysis, 18(2):301–313, 2014.

[16] S. Heber and T. Pock. Shape from light field meets robust PCA. In Computer Vision – ECCV 2014, pages 751–767. Springer International Publishing, 2014.

[17] B. K. Horn and B. G. Schunck. Determining optical flow. Artificial intelligence, 17(1-3):185–203, 1981.

[18] W. Huizinga, D. H. J. Poot, J.-M. Guyader, R. Klaassen, B. F. Coolen, M. van Kranenburg, R. J. van Geuns, A. Uitterdijk, M. Polfliet, J. Vandemeulebroucke, A. Leemans, W. J. Niessen, and S. Klein. PCA-based groupwise image registration for quantitative MRI. Medical Image Analysis, 29:65–78, 2016.

[19] S. Klein, M. Staring, K. Murphy, M. A. Viergever, and J. P. Pluim. elastix: a toolbox for intensity-based medical image registration. IEEE Transactions on Medical Imaging, 29(1):196–205, 2010.

[20] X. Liu, M. Niethammer, R. Kwitt, M. McCormick, and S. Aylward. Low-rank to the rescue – atlas-based analyses in the presence of pathologies. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, pages 97–104. Springer International Publishing, 2014.

[21] S. Meister, B. J¨ahne, and D. Kondermann. Outdoor stereo camera system for the generation of real-world benchmark data sets. Optical Engineering, 51(2):1–7, 2012.

[22] C. T. Metz, S. Klein, M. Schaap, T. van Walsum, and W. J. Niessen. Nonrigid registration of dynamic medical imaging data using nD+t B-splines and a groupwise optimization approach. Medical Image Analysis, 15(2):238–249, 2011.

[23] J. Modersitzki. Numerical Methods for Image Registration. Oxford University Press, 2003.

[24] J. Modersitzki. FAIR: Flexible Algorithms for Image Registration. Society for Industrial and Applied Mathematics, 2009.

[25] N. Papenberg, A. Bruhn, T. Brox, S. Didas, and J. Weickert. Highly accurate optic flow computation with theoretically justified warping. International Journal of Computer Vision, 67(2):141–158, 2006.

[26] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1:127–239, 2014.

[27] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 763–770, 2010.

[28] T. Pock, M. Urschler, C. Zach, R. Beichel, and H. Bischof. A duality based algorithm for TV-L1-optical-flow image registration. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007, pages 511–518. Springer Berlin Heidelberg, 2007.

[29] M. Polfliet, S. Klein, W. Huizinga, M. M. Paulides, W. J. Niessen, and J. Vandemeulebroucke. Intrasubject multimodal groupwise registration with the conditional template entropy. Medical Image Analysis, 46:15–25, 2018.

[30] R. T. Rockafellar. Convex Analysis. Princeton Mathematical Series. Princeton University Press, Princeton, N. J., 1970.

[31] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag Berlin Heidelberg, 1998.

[32] A. Sotiras, C. Davatzikos, and N. Paragios. Deformable medical image registration: A survey. IEEE Transactions on Medical Imaging, 32(7):1153–1190, 2013.

[33] D. Sun, S. Roth, and M. J. Black. Secrets of optical flow estimation and their principles. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2432–2439. IEEE, 2010.

[34] V. Vishnevskiy, T. Gass, S. Szekely, C. Tanner, and O. Goksel. Isotropic total variation regularization of displacements in parametric image registration. IEEE Transactions on Medical Imaging, 36(2):385–395, 2017.

[35] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In Pattern Recognition, pages 214–223. Springer Berlin Heidelberg, 2007.


Designed for Accessibility and to further Open Science