1.1 Groupwise Image Registration
The problem of aligning one image with another image of the same object is a well-studied problem in image processing and variational methods have proven successful for the task [23, 32]. However, many application scenarios involve data comprised of more than two images, as in the case of image data gathered over time, which necessitates groupwise methods. Naive pairwise techniques, that select one image from the group as a fixed reference and register all other images to the reference have been shown to be inconsistent with respect to registration accuracy (depending on the choice of the reference) and are generally deemed inferior to groupwise methods [22, 18]. These allow all images of the group to be deformed simultaneously and therefore operate on an implicit reference.
A crucial step in solving any image registration problem is the selection of a suitable dissimilarity metric on pairs or groups of images. In the past, both generalizations of established dissimilarity metrics for the classic two image problem and new concepts have been proposed to measure the distance between a group of N > 2 images. Examples for the former case include the variance-measure found in [2, 22] that extends the well-known sum of squared distances, different generalizations of the mutual information from [29, 18] and a multi-image version of the normalized gradient fields-measure in [6].
One example of a newly developed metric that is also related to the metric proposed in this work is from [18]. Given N images
, this measure operates on the so-called Casorati matrix
where vec() denotes a column-major vectorization. In
, one proceeds to penalize a weighted sum of the (nonnegative) eigenvalues
of the correlation matrix
¯M is the repeated columnwise mean of and Σ is diagonal with diagonal elements given by the standard deviations of the columns of
. To be exact, the metric is given by
As the number of nonzero eigenvalues of K is equal to the rank of , minimizing
promotes low-rankness of
and similarity between images is modeled as linear dependency. Note that apart from the Σ
-weighting in (2), the eigenvalues
correspond to variances along the principal components of
, which emphasizes the relation to the eponymous PCA.
will serve as a comparison method for our proposed metric in the experiments of section 6.
1.2 Robust PCA
As the classic PCA is known for its sensitivity towards sparsely distributed outliers, such methods are prone to fail for datasets involving partially unreliable data or strong changes in image intensity over time. To overcome this issue, different versions of a Robust PCA (RPCA) were proposed in the literature – see [14] for an extensive comparison. The most widely-used RPCA-variant is arguably the Principal Component Pursuit (PCP) from [9, 7]. PCP is derived as a convex relaxation of the combinatorial optimization problem
for given data . The term
denotes the number of non-zero entries of E. Replacing both summands of (4) with their convex hulls yields
which is convex in L and thus poses a more tractable optimization problem. is the so-called nuclear norm, defined as the sum of all singular values of L (see [12]) and
=
is a
-type norm. Especially recall the relationship between the singular values
and the rank of a matrix: rank(A) = #
(see again [12]). The decomposition of M generated by (5) is usually referred to as a low-rank and sparse decomposition, in which L is of low rank and
is sparse.
PCP has previously been used in the context of groupwise image registration by [27, 16, 15, 20]. Primarily tackled therein were datasets for which low-dimensional approximations using PCAbased techniques were not applicable due to occlusions, local changes in image intensity (for the case of DCE-MRI data) and irregular pathologies in medical image data. In all these publications, the data matrix M for (5) was constructed as a Casorati matrix (1). The authors of [15, 20] however only used low-rank and sparse decompositions as preprocessing steps and performed subsequent registrations on the generated low-rank components L with different algorithms. Contrary to that, [27, 16] both used the optimal value of (5) as a metric for the similarity of a set of given images .
Section 2 of this paper will present a deeper analysis of PCP as a distance measure. We argue that PCP has some inherent drawbacks: Perfect alignments of all often constitute local minimizers of PCP in only very narrow neighborhoods. At the same time, degenerated deformations result in comparatively lower energies. To overcome these issues, we present in this work a modification of PCP that is still convex and therefore easy to optimize.
1.3 Proposed Approach
Precisely, we propose to use the following groupwise dissimilarity measure:
Here is again the Casorati matrix (1), ¯L is the repeated columnwise mean of L and
0 is a suitable threshold for the nuclear norm. The intuition behind (6) is to jointly measure the
-distance between the input images and their optimal approximations in a low-dimensional linear subspace. Details are given in section 2. Our main contributions in this work involve the following:
• A novel technique for low-rank and sparse decompositions that results in a more suitable distance metric for groupwise registration tasks than previous approaches.
• A less restrictive uniqueness constraint than the one commonly employed in the literature.
• A multi-level strategy with theoretically justified scaling that solves the registration model in an iterative process and that uses first-order primal-dual optimization techniques to solve the subproblems.
1.4 Other Related Work
Major differences between our approach and related methods for variational groupwise registration are as follows:
Besides the fact that the -measure from [18] is based on the classic PCA (whereas ours is based on RPCA), the authors suggest a parametric deformation model based on B-Splines. Instead, we employ a non-parametric model that is fully deformable and that is explicitly (and flexibly) regularized through a total variation penalty (see section 3). Regularization in [18] is handled implicitly through grid point spacing, and the same is true for all of [3, 2, 22, 18, 13, 29], as they use B-Spline deformations in the same manner. Concerning the two PCP-based registration approaches [27, 16], the former is even further restricted to affine deformations, while the latter operates on light-field data, for which a geometric relationship between input images is known a priori and is exploited in the registration process.
Another non-parametric approach is presented in [6]. While also based on rank minimization, the authors use normalized image gradients as feature vectors and define alignments locally (instead of image intensities as features and global alignments as in this article). A continuation of [6] is found in [5], which generalizes the former approach to different kinds of feature vectors and formulates alignments globally.
1.5 Outline and Contributions
The remainder of this article is organized as follows: In section 2, we analyze the established PCP-metric and derive our proposed approach as a replacement. In section 3, the total variation is discussed as a regularizer for our model and a new uniqueness constraint for groupwise image registration algorithms is introduced. In section 4, an in-depth account of the optimization strategy and its implementation is given, including a multilevel scheme with theoretically justified scaling. In the subsequent sections 5 and 6, we introduce the benchmark data and present a numerical comparison to related approaches. Section 7 gives concluding remarks.
Figure 1: Depicted are five subsequent frames from a MRI sequence that serve as input for the experiments in Fig. 2: In order to analyze the behavior of a given dissimilarity measure, its energy is determined while one image is kept fixed and the remaining images are warped uniformly in a prescribed manner. Due to their short temporal offset, the input images
can be regarded as aligned
2.1 Classical Approach
The classical PCP image distance from [27, 16] is given by
with M as a Casorati matrix. The parameter
0 controls the weighting between the requirement on L to be of low rank and the requirement on E :=
to be sparse.
In order to assess the general applicability of (7) in the context of non-parametric groupwise registration, we conducted a number of experiments to examine the behavior of under certain predefined deformations of the images
.
To this end, we define an image (given over the same grid) through linear interpolation – see, e.g., [24, Chapter 3.3]. Using this convention, the experiments were performed by evaluating
)) for the four cases of the deformation sequence (
describing a translation, a rotation, a scaling and a shearing.
therefore acted as a fixed reference, while
were warped uniformly by
.
A schematic depiction of each transformation sequence is given in the first row of Fig. 2. Test data was comprised of the N = 5 frames from a cardiac MRI sequence, that are displayed in Fig. 1. As suggested by [7, Theorem 1.1], the weighting parameter for (7) was chosen as = (
.
The energy plots of for all four experiments are shown in the second row of Fig. 2. Additionally, their respective decompositions into the two summands
and
are displayed, where
denotes the minimizer of (7) over the variable L.
The results show that has two major shortcomings that are unfavorable for the purposes of image registration. Firstly, the point
= 0 at which the N images are most appropriately aligned only marks a local minimizer of
in a very narrow neighborhood of
in the translation experiment. In the scaling experiment,
even constitutes a global maximizer. Secondly, both the left or the right endpoints of each energy plot, i.e., the most degenerated of all evaluated deformations
, represent global minimizers in every experiment except for the rotation. This is especially problematic in case of the translation, since constant translations are also not penalized by any regularizer, that is based on derivatives of deformations. As a result, we consider
unsuitable as a distance metric in the general case.
2.2 Proposed Modified Measure
Based on these observations, we propose a new distance metric that modifies in two aspects. As a first step, the
-penalty term in (7) is replaced by a hard constraint to the set
for some suitable threshold
0. As a second step, we propose not to constrain the nuclear norm of L itself, but to constrain that of the centered variable
¯L instead. To this end, let
Figure 2: Experiments on the distance measures . The classical
only very narrow minima at the position
= 0 of perfect alignment. Even worse so, global minimizers for all experiments except the rotation are given by degenerated transformations. The proposed modification
these problems:
= 0 constitutes a global minimizer across all experiments. Furthermore, degenerated deformations generally result in high energies and are therefore not favored by this metric
¯L := ()
denote the matrix, in which every column is given by the average of the columns
of L. The proposed dissimilarity measure, which we term
-RPCA, is then given by
for all q > 1. In terms of the registration model, this means that a smaller nuclear norm for the uncentered variable L can be achieved by shifting all deformable images out of the image domain – thereby replacing them with the boundary value of zero – than by aligning them inside the images domain.
The continuation of the above example shows that this situation, which is highly undesirable for the purpose of image registration, is reversed when dealing with centered variables. These are given by ¯
(1
) and
¯
= 0 respectively and in fact, the
equivalent of relation (9) now reads
As a consequence, (8) does not favor shifting the deformable images out of the image domain and is therefore more suited for image registration.
Crucially, is still convex in the variable L, since
constitutes the level set of a convex function and is therefore convex [30, Theorem 4.6].
Repeating the above experiments for with the choice of
= 0
¯
for every set of deformed images in M, one obtains the energy plots in the third row of Fig. 2. In contrast to
, the modified metric
shows global minimizers at the point
= 0 across all cases. Additionally, the global maximizer of each curve is found towards its left or right endpoint and consequently results from a degenerated deformation. In conclusion, the two main issues of
as a distance function for registration tasks are hence resolved by
.
Apart from the interpretation of as a modified version of
, it can also be interpreted in the sense described as follows. First consider the case
= 0. This implies L = ¯L, which in turn implies constant columns
of L. Using this, one can solve (8) analytically for L by recalling, that
-distance minimization problems of the type
are solved by the median of () [4, p. 433]. As a consequence, the constant columns of L in the problem above are given by the pointwise median of
and
represents the remaining
-distance between the input images and that median.
In the case of 0,
can now more generally be interpreted as the joint
-distance between the images
and their individual (optimal) approximations
with deviations from the mean ¯l :=
restricted to a low-dimensional linear subspace.
Consequently, we deem (8) especially suited for image groups with inherent low-dimensional structure such as image sequences with strong or pronounced temporal repetition.
3.1 Total Variation Regularization
Total variation (TV) is a popular choice for regularizing motion fields in applications of both optical flow estimation and image registration due to its distinguishing feature of allowing discontinuities in the solution. TV therefore sets itself apart from other common regularizers such as diffusive, elastic or curvature energies that favor smooth transformations. Exemplary early applications of TV regularization for optical flow estimation can be found in [25, 35] and for image registration in [28, 34]. In the context of medical image processing, TV regularization is particularly interesting when modeling non-smooth sliding motions, since it eliminates the necessity to explicitly mask all sliding interfaces beforehand [10].
can be defined as
where denotes the distributional (measure-valued) derivative of
with values in
. In case of
), (12) is equivalent to
For all further details, we refer to [1].
In our registration model, we assume rectangular domains Ω and employ a standard discretization scheme with cell-centered grids of resolution
and grid spacings of (
in the two coordinate directions. Optimization is performed over discrete displacement fields
, for which we use finite forward differences and Neumann boundary conditions to discretize (13). Following [34], we use the notation
Therein, denotes the finite difference operator with the aforementioned characteristics.
3.2 Uniqueness Constraint
As our model does not make use of an explicit reference image that all other images are aligned to, we need to employ an additional constraint on the displacements in order to ensure the uniqueness of a solution.
This can be seen from the simple example, in which display uniform objects, e.g., white rectangles, before a black background. Consider the case of a perfect alignment
) =
) of these rectangles inside the image domain Ω. If all deformations
are simultaneously offset by
, such that the new deformations ˆ
still align
inside the common domain
, then (ˆ
constitute a solution equal to (
both in terms of
and TV
.
For TV, this is explained by (15) solely penalizing derivatives of deformation fields which are always invariant to translations.
The invariance for is due to the equivalence of an offset by t and a simple reordering of the pixels between
) and
) (due to the zero boundary condition). Clearly, the
-term in (8) is invariant to any reordering and the same is true for the nuclear norm constraint, since a consistent reordering of all
) results in a row permutation of the Casorati matrix M = [vec(
vec(
))]. As a short derivation shows, a row permutation does not affect the singular values of a matrix:
Let be an arbitrary matrix and let
be a permutation. If a singular value decomposition (SVD) of A is given by
, then PA = (
constitutes a valid SVD of PA due to PU still being orthogonal, i.e.,
Thus, the singular values on the diagonal of Σ stay unaffected and so does the nuclear norm .
In order to eliminate this remaining degree of freedom from the model, we impose an additional constraint on the deformations , enforcing the mean (or equivalently the sum) over all deformations and grid points to be zero in each coordinate direction:
Note that [22, 18, 13, 29] constrain their deformations in a related manner by demanding the mean of all deformations to be zero at every grid point as first introduced by [3]. The difference however is, that (17) only imposes one constraint per dimension instead of one constraint per grid point and dimension. As a result, (17) restricts the space of feasible solutions much less severely while still ensuring uniqueness.
In this section, we present an optimization scheme for our groupwise registration model that is strongly related to the work in [16]. First we combine all components derived in the previous sections into the complete registration model
in which 0 controls the regularization strength.
4.1 Linearized Subproblems
In order to be able to apply convex optimization methods to (18), one needs to deal with the nonlinearity of the expressions ) that leads to a non-convexity of the model. As in [28, 35, 27, 16], an iterative linearization of the deformed images is used to overcome this issue. Note that while a one-time linear approximation would also be possible in theory, the strong locality of such an approximation becomes prohibiting when larger deformations are required to align the images.
In the following, we assume all variables to be in vector format (including the values of all ) and for brevity’s sake omit the explicit notation of reshaping operations like vec(
). The linearization of
can then be expressed as
for a suitable point ˜. This enables one to approximate the first term in (18) by
Using vectorized variables further allows one to rewrite the centering of L as a linear operation KL with
Since solving (18) through iterative (re-)linearization amounts to solving a series of subproblems, we propose to treat is as a process, in which the threshold is successively decreased to the threshold value for which the original problem (18) is meant to be solved.
Assuming a predefined number of linearization steps and denoting the final threshold by
, we therefore employ a series of thresholds
for the iterative solution of the separate subproblems. As a strategy to select these parameters, we propose to choose
relative to the nuclear norm of the centered input images and to progressively decrease
to that value by multiplication with a constant factor
(0, 1).
More specifically, let M denote the Casorati matrix of the input images and let ¯M denote the columnwise repetition of their mean. If one now wants to meet a final threshold of ¯
for some
(0, 1) and one employs a predefined number of
linearization steps, the proposed strategy amounts to choosing
where .
4.2 Solving the Convex Subproblem
Denoting all entries of corresponding to the c-th coordinate axis by
(
in the non- vectorized notation of (18)), the t-th subproblem now reads
We solve these subproblems using the primal-dual optimization algorithm 1 from [8] that is designed for finding saddle-points of problems of the type
=: ¯
:
¯R are proper, lower-semicontinuous, convex functions and
denotes the conjugate of another proper, lower-semicontinuous, convex function
¯R.
further denotes a linear operator. As is well-known, (24) is equivalent to the primal
minimization problem
For all details, we refer to [31, Chapter 11].
In our case, we bring (23) into the form (25) by assigning the first three terms of (23) to F and the remaining uniqueness term to H. The primal variables for this problem are given by the union of all variables over which (23) is minimized,
and we define the linear operator A to be
where K and G are as in (21) and (15) respectively.
This allows for a separable definition of the function F by F(z) := ) +
) +
) and
where := (
and where the vector b = (
gathers the constants
:=
for k = 1, . . . , N. As the remaining uniqueness term does not depend on
, we define H(x) := ˜
)) and
In order to apply Alg. 1 to the problem, the proximal operators (id +and (id +
are required to compute the updates (26). We point out that the separable nature of F implies a decomposability of (id +into three terms corresponding to the three summands of F (or equivalently of
). These terms as well as the proximal operator of H are given by standard expressions, for which we refer to [26]. We however emphasize the point that the proximal step corresponding to the nuclear norm constraint
requires both an SVD y = U diag(, of the input y (assumed to be
-shaped) and a projection Π
onto the
-unit ball. While the latter step cannot be solved in a decoupled manner, there exist exact algorithms with time complexity O(N) to compute such projections – in our implementation we employ the approach from [11].
Finally, we shortly address the problem of determining the spectral norm that the primal and dual step sizes
for Alg. 1 are based on. Since the linear operator A given by (28) contains the image gradients
) and is therefore dependent on empirical data, an analytical solution for
is unattainable. Instead, we apply a simple power iteration scheme to estimate this quantity [12, Section 7.3.1].
4.3 Multilevel Scheme and Parameter Scaling
Moreover, a consistent scaling of the thresholds is required since the low-rank components change in resolution as well in between stages. To this end, it is useful to derive by what factor the nuclear norm of a matrix
scales when it is prolongated to the next higher resolution. Interpreting the columns of M as vectorized images and assuming p > q, we define the prolongation
of M as the separate prolongation of these image columns (as in Fig. 3). The operation can therefore be expressed as
where is a suitable permutation. As the singular values of a matrix are invariant under row-permutations (see (16)), one can however restrict the analysis of the singular values for (35) to the case P = I. Let an economic SVD of M now be given by
with
and Σ
[12, Chapter 2.5]. Then it holds
Figure 3: Prolongation scheme. Variable values for the index (i, j) in the low-resolution coordinate system (left) are propagated to the variables indexed by (2) in the high-resolution system
(36) however does not constitute a valid (economic) SVD of ˆM, since the columns of ˆU are no longer normalized: One has ( ˆ= 4 for i = j and ( ˆ
= 0 for
. In order to regain a valid SVD, a factor of 2 has to be redistributed from ˆU to Σ, i.e.,
This implies || ˆ= 2
, which in turn implies that the sought factor is given by 2. Also note that this result can easily be generalized to the case of d-dimensional images, where that factor is given 2
.
The overall solution scheme is summarized by Alg. 2, where an image domain of Ω = [0[0, n] is assumed for the input images
. Further assumed are a predefined number
of resolution stages, predefined numbers
of linearization steps per stage (for j = 1
), as well as a final relative threshold parameter of
= exp(ln(
)
) (see subsection 4.1).
5.1 Synthetic Dataset: Textured Ellipse
The purpose of the first synthetic dataset is to illustrate the capacity of our model to correct the motion of objects with recurring changes in texture, exposing the inherent low-dimensional structure of the dataset.
The image sequence is comprised of ten frames displaying a textured ellipse moving in a semicircular manner before a black background that further features a fixed white rectangle and a fixed white frame. The texture of the ellipse alternates between vertical stripes for all oddly indexed frames and horizontal stripes for all evenly indexed frames.
To quantify the accuracy of registration on this dataset, we equipped each frame with 17 landmarks at the same corresponding (analytically determined) positions. All frames were generated at a resolution of 200 200 pixels. As an example, four out of the ten input frames are displayed along with their landmarks in Fig. 4.
5.2 Real-world Dataset I: Cardiac MRI
Besides the challenge of motion correction in the presence of recurring changes in object appearance that the first synthetic dataset posed, the first real-world dataset comes with the additional difficulty of irregular disturbances to object appearance.
The sequence consists of cardiac MRI data in the so-called two-chamber view, where the left atrium and ventricle are on display. Seven repetitions of the heart cycle with blood flow in and out of the two chambers as well as breathing-induced motions of several structures like the thorax, the diaphragm, and the heart are shown.
For this dataset, changes in object appearance relate to different phases of the heart cycle as we selected one frame from each systole, one frame from each diastolic relaxation and one frame from each diastolic filling (making for a total of 21 input frames). Due to the turbulent nature of the
Figure 4: Exemplary frames from the textured ellipse-dataset with their respective landmarks. A perfect motion correction is expected to unify the ellipse positions, while keeping the white frames and rectangles stationary
Figure 5: Exemplary frames from the cardiac MRI dataset with their respective landmarks. While clear visual congruences between different images of the same phase exist, they are obscured by the irregularity of the blood flow and pose a particular challenge to distance measures that model similarity as linear dependence
Figure 6: Reference frames for selected sequences from [21]. The “Blinking Arrow”-sequence (left) is deemed challenging because of intensity changes resulting from a blinking traffic sign, the challenge in the “Flying Snow”-sequence (center) consists of heavy snowfall obstructing the view, and the “Shadow on Truck”-sequence (right) features rapidly changing shadow patterns that do not describe physical motion
blood flow, the visual appearances of these phases are somewhat irregular and pose an interesting test case for the low-rank/sparse decomposition generated by our model.
As with the textured ellipse-dataset, we equipped this sequence with 23 handselected landmarks per frame. Each individual image was resolved with 220220 pixels. The respective input frames for the first and last heart cycle are displayed together with their respective landmarks in Fig. 5.
5.3 Real-world Dataset II: Challenging Data for Stereo and Optical Flow
We also evaluated our model on a variety of test sequences from the “Challenging Data for Stereo and Optical Flow”-dataset (CDSOF) [21]. This dataset features eleven sequences captured in real-world traffic situations that are deemed challenging for motion estimation algorithms due to diverse phenomena such as illumination changes from blinking signs, occlusions from snowflakes, and blurs from water spray.
We selected subsequences of 10 frames from the datasets entitled “Blinking Arrow”, “Flying Snow” and “Shadow on Truck” as test cases as they all feature different distortions that pose interesting challenges to the robustness of our model. In order to restrict the required computational effort, we downsampled all used frames to a resolution of 271 328 pixels.
Contrary to the other two datasets, all sequences from [21] come with a predefined reference frame, which is why we drop the uniqueness constraint from section 3.2 for these inputs. Instead, we enforce alignments with the reference through a constraint of the form ), where
is the displacement field for the respective reference. The reference images for all selected sequences are shown in Fig. 6.
For the former two datasets from section 5, we compare our registration approach to the following two methods:
1. An approach based on the simple variance dissimilarity measure given by
which has previously been used by [2, 22]. We combine (38) with the same TV-regularization and the same uniqueness constraint as in our model (18).
2. A publicly available implementation of the -metric from [18] in the elastix software package [19]. This method uses a cubic B-spline transformation model and implicit regularization.
For each individual landmark, accuracy is measured in terms of mean Euclidean distance to the mean landmark position
therein denotes the position of the i-th landmark in the k-th image.For the CDSOF-datasets we compared our approach to the publicly available implementation of the “nonlocal” optical flow estimation method from [33], that was suggested as a referemce by the authors of [21]. [33] extends the classical Horn-Schunck model for optical flow estimation [17] by a number of techniques, e.g., an additional nonlocal term derived from median filtering. To give a meaningful comparison, we altered our algorithm to include the median filtering of flow fields in between linearization steps as well. Note that the reference method only operates in a pairwise fashion.
All experiments were performed on an Intel Core i7-8700 (63.20 GHz) system with 64 GB of memory, running Matlab R2019a under Ubuntu 18.04 (64-Bit). Our implementation is publicly available at
Computation times for the textured ellipse- and cardiac MRI-datasets are given in Tab. 1 and for the CDSOF-dataset in Tab. 2.
6.1 Textured Ellipse
Using the parameters = 0
= 0.2 as well as
= 3 with
= 16 and
= 2 for
2 in Alg. 2, we achieved the results displayed in Figures 7 and 9 for the textured ellipse-dataset. We shortly note that we observed values of
2 for
2 to be mandatory in order for Alg. 2 to generate useful linearization points on higher resolution levels. Furthermore, a small enough
Table 1: Computation times of competing methods
was crucial in finding low-rank components L that accurately describe the structural changes in texture – higher values of on the other hand allowed for unwanted motion artifacts in L.
For the variance method based on (38), we employed an adapted version of Alg. 2 with the regularization strength set to = 0.1. The number of resolution levels as well as the iterations per level were kept the same as for our method.
In the elastix-based implementation of , we increased the settings recommended by the authors of [18] to three resolution stages (instead of the recommended two), 2.000 iterations per stage (recommended: 1.000) and 25.000 random coordinates per stage (recommended: 2.048) in order to ensure sufficient computational capacity for the method to produce accurate solutions.
Fig. 7 visualizes both the deformations calculated by our model and the warped images for all four exemplary frames from Fig. 4. Fig. 9 further analyzes the generated low-rank components in terms of singular values and (left) singular vectors of ¯L, i.e., the matrix whose nuclear norm is constrained by our model (18). As the dataset is of synthetic nature and therefore without intensity distortions, the sparse outlier components
are negligible and are not displayed.
A quantitative comparison in terms of landmark accuracy between our approach and the two competing methods is presented in Fig. 8. The comparison shows that our method significantly outperforms the other approaches on this dataset: While it corrects the ellipse positions most accurately out of the three methods, it is also the only one that does not introduce notable motion to the white rectangle and frame (which were already stationary in the input sequence).
In terms qualitative results, Fig. 9 shows that our method was moreover able to find a nearperfect embedding of the motion-compensated images in a low-dimensional subspace: The negligible magnitudes of the singular values of the centered low-rank components
¯L indicate that a two-dimensional basis consisting of the mean low-rank component ¯l and the singular vector
is largely sufficient to approximate the output images.
6.2 Cardiac MRI
For the cardiac MRI dataset, we only adjusted the regularization strength and threshold-scaling parameters of our method to = 0.125 and
= 0.95. In the variance registration method, we kept all parameters fixed except for
= 0.065 and since the
model does not feature explicit regularization parameters, we left all the above settings unchanged for this method.
Results of our approach are presented in Figures 10, 12 and 13. Figures 10 and 13 show deformations, warped images and sparsity components for all example frames from Fig. 5. In Fig. 12, singular values and singular vectors of ¯L are analyzed in a presentation similar to Fig. 9. The quantitative evaluation in terms of landmark accuracy is visualized in Fig. 11.
Although results are generally more balanced than was the case for the synthetic data, our approach still outperforms the two competing methods on this real-world dataset in terms of landmark accuracy. We refer to Fig. 11, where our method not only achieves the highest accuracy for the majority of the landmarks, but where it is the only out of the three methods that did not introduce additional motion to any landmarks in the registration process: both competing approaches generate several landmarks with worse accuracy than in the unregistered input sequence.
In terms of the low-rank/sparse decomposition, we remark that our model was able to generate meaningful low-rank components alongside the actual motion-compensation: As seen from Fig. 12, the centered low-rank components matrix ¯L is dominated by three singular value/vector pairs in which the singular vectors
exhibit clear visual congruences with the three considered phases of the heart cycle. Congruences thereby consist of highlighted anatomical structures and physiological features such as the mitral valve and the direction of blood flow.
Furthermore, granting the method the ability to define sparse outlier components aided the generation of a meaningful low-rank approximation by filtering out highly irregular image feature such as the turbulences of blood flow (see Fig. 13).
Figure 7: Results of proposed approach for the selected frames from Fig. 4. Our model allows to correct the motion of the ellipse through piecewise constant deformations (top row), while automatically detecting and discarding repetitive structural noise (the horizontal and vertical bars) in the registration process. The motion-corrected images (bottom row) exhibit a good visual correspondence between matching landmarks. Quantitative results are given in Fig. 8, in which landmarks are indexed in the same order as in
Figure 8: Comparison of landmark accuracy for the textured ellipse-dataset as measured by (39) (lower is better). Landmarks are ordered as in Fig. 7 with landmarks 5 9 attributed to the moving ellipse and the remaining landmarks positioned around the stationary white rectangle and frame. Note that the latter do not appear in the input curve due to zero error and logarithmic axis scaling. Our method clearly outperforms the two competing approaches as it is able to correct the positions of the “ellipse-landmarks” most accurately without introducing artificial motion to the remaining landmarks
Figure 9: Singular values and vectors of centered low-rank components . The singular value progression (left) over the inner iterations of Alg. 2 (scaling adjusted, see subsection 4.3) shows that the norm
– the quantity constrained by our model – is dominated by the singular value
towards the end of the iteration. This indicates that the columns of L can be reconstructed from their mean ¯l (second from left) and the dominating singular vector
of the centered low-rank components
with only minor error. Intuitively,
is added to ¯l when reconstructing horizontal bars and subtracted from ¯l when reconstructing vertical bars. In comparison, the second singular vector
only describes negligible remainders of motion with little influence on
(as indicated by the final magnitude of
in the left curve). To summarize, our method was able to automatically detect the low-dimensional texture variations present in the dataset
Figure 10: Results of proposed approach on the cardiac MRI-dataset. The difference images per phase between the first and seventh heart cycle before and after registration (left two columns) show a successful motion compensation: Motion in the thorax area as well as the front-facing area of the diaphragm and the heart apex is greatly reduced across all phases. The main intensity differences after registration are located inside the heart itself and are due to irregular turbulences of the blood flow. The two right-hand columns show the actual motion-compensated images including their deformed landmarks – in the upper-left image, an additonal ordering of the landmarks is given that is referred to again in Fig. 11
Figure 11: Comparison of landmark accuracy for the cardiac MRI-dataset as measured by (39) (lower is better). Landmarks are ordered as in Fig. 10. While our method still produces the most accurate deformations (performing best for 13 out of 23 landmarks), results are much more balanced than in the textured ellipse-experiment (see Fig. 8) with performing remarkably well despite being based on pure PCA. However, one notable drawback which both competing methods exhibit is that landmarks occasionally feature more motion in the registered images than was actually present in the input sequence – this is indicated by crossings of their respective curves with the gray input curve (see for example landmarks 7 and 16 from the thorax area)
Figure 12: Singular values and vectors of the centered low-rank components for the cardiac MRI-dataset. The development of the singular values (left) shows, that the nuclear norm
(which is constrained by our model) is dominated by the three largest singular values
. Consequently, the low-rank components of the warped images primarily consist of a linear combination of ¯l (second from left) and the three corresponding singular vectors
(third, second, first from right). Especially note the visual congruences between these three singular vectors and the characteristics of the three considered heart phases: While
marks a blood flow into the left ventricle (see the diastolic filling in Fig. 10),
highlights the mitral valve, which is clearly visibly closed during the diastolic relaxation phase (see again Fig. 10). Moreover,
exhibits a high contrast between atrium and ventricel as present during the systole in Fig. 10
Figure 13: Sparse components for all frames used in Fig. 10. Nonzero entries are sparsely distributed across all images and primarily serve to correct the irregularities of the blood flow that were not captured by the low-rank components
, i.e., that were not representable in a low-dimensional linear subspace. Thus, a meaningful low-rank approximation of the motion-corrected images is only enabled by allowing for sparsely distributed outliers
6.3 CDSOF
A comparison of the motion estimation capabilities of our method with the pairwise operating reference [33] for all sequences from Fig. 6 is given in Fig. 14.
In our model, we used the parameters = 4,
= 16,
for
2,
= 0.91 across all three examples as well as
= 0.075 for the “Blinking Arrow”,
= 0.045 for the “Flying Snow” and
= 0.125 for the “Shadow on Truck” sequences. In the reference method implementation, we kept all parameters at standard values except for the regularization strengths, which were set to
= 10 for the “Blinking Arrow” and to
= 20 for both the “Flying Snow” and the “Shadow on Truck” sequence.
Upon inspection of the results in Fig. 14 and Tab. 2, it is clear that our approach was mostly outperformed by the reference method in terms of accuracy and computational efficiency. This is especially true for the sequence entitled “Shadow on Truck”, where our method failed to produce a meaningful motion correction. For the other two sequences, our model was able to generate motion fields that successfully aligned all deformable (non-reference) images in the presence of disturbances such as snow flakes and blinking signs.
We however point to the observation, that this alignment was not constructed with respect to the explicitly given reference but to another implicit reference generated by our algorithm. In
Table 2: Computation times of proposed approach and reference method for selected “CDSOF”-sequences
light of this implicit reference, the explicit one hence appears as an outlier. This phenomenon also explains the large discrepancies observed in Fig. 14 between the motion fields generated by our model and those generated by the reference method.
We draw two conclusions from the experiments:
1. The proposed method exhibits a notable sensitivity towards the degree to which input data meets the model assumption of decomposability into structural low-rank components and sparse outlier components. If variations in object appearance are too irregular or if distortions are too large in scale (as in the “Shadow on Truck”-sequence), the approach might fail to produce meaningful solutions.
2. Imposing an explicit reference on a groupwise operating method cannot be expected to produce deformations that align all deformable images to that reference. On the contrary, deformable images might rather be aligned to a more suitable implicit reference. We primarily attribute this phenomenon to the small relative weight of one fixed reference when compared to the remaining group of 1 deformable images.
We emphasize that the goal of this experiment is not to compete with a specialized method for a different domain (pairwise registration), but rather to give an indication of its usefulness on challenging non-medical real-world sequences.
In this work, we have investigated a novel dissimilarity metric for groupwise image registration tasks based on low-rank and sparse decompositions. The proposed metric corrects the major drawbacks that the established RPCA-image distance from [27, 16] exhibited in the experiments of Sec. 2. It is primarily suited for registering image data, that features objects with recurring changes in appearance and that can be represented in a low-dimensional linear subspace with potential sparse outliers. We especially emphasize the advantage in interpretability, when dealing with threshold constraints instead of weighted penalties.
We further developed a first-order primal-dual optimization framework for solving non-para-metric registration tasks using our metric in conjunction with TV regularization, which can easily be replaced by other regularization techniques suited for the individual application.
Experimentally, we were able to show the superiority of our method when compared to two commonly used groupwise registration models. The experiments included both synthetic and real-world image data, that met the assumptions made by our model well.
We further investigated the robustness of our model on a number of test sequences from the optical flow community, where we found that albeit it was outperformed by a highly optimized reference method, our model was able to correct motion in presence of distortions like snow flakes obstructing the view and illumination changes from blinking signs. As an interesting phenomenon, providing a reference to a groupwise method did not result in deformable images being aligned to the reference, but rather in the reference being treated as an outlier by our model. It remains to be investigated, to what extent this behavior is a trait of our particular model or of the general groupwise approach.
Another avenue for future research is the application of our proposed decomposition model to tasks other than image registration.
The authors thank Allen D. Elster (MRIQuestions.com) for kindly providing the cardiac cine study used in this article. The authors further acknowledge support through DFG grant LE 4064/1-1 “Functional Lifting 2.0: Efficient Convexifications for Imaging and Vision” and NVIDIA Corporation.
[1] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, 2000.
[2] K. K. Bhatia, J. Hajnal, A. Hammers, and D. Rueckert. Similarity metrics for groupwise non-rigid registration. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007, pages 544–552. Springer Berlin Heidelberg, 2007.
[3] K. K. Bhatia, J. V. Hajnal, B. K. Puri, A. D. Edwards, and D. Rueckert. Consistent groupwise non-rigid registration for atlas construction. In 2004 2nd IEEE International Symposium on Biomedical Imaging: Nano to Macro, volume 1, pages 908–911, 2004.
[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[5] K. Brehmer, H. O. Aggrawal, S. Heldmann, and J. Modersitzki. Variational registration of multiple images with the SVD based SqN distance measure. In Scale Space and Variational Methods in Computer Vision – 7th International Conference, SSVM 2019, Hofgeismar, Germany, June 30 – July 4, 2019, Proceedings, pages 251–262, 2019.
[6] K. Brehmer, B. Wacker, and J. Modersitzki. A novel similarity measure for image sequences. In Biomedical Image Registration, pages 47–56. Springer International Publishing, 2018.
[7] E. J. Cand`es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):1–37, 2011.
[8] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120–145, 2011.
[9] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.
[10] A. Derksen, S. Heldmann, T. Polzin, and B. Berkels. Image registration with sliding motion constraints for 4D CT motion correction. In Bildverarbeitung fr die Medizin 2015, pages 335–340. Springer Berlin Heidelberg, 2015.
[11] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the -ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.
[12] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, 3rd edition, 1996.
[13] J.-M. Guyader, W. Huizinga, D. H. J. Poot, M. van Kranenburg, A. Uitterdijk, W. J. Niessen, and S. Klein. Groupwise image registration based on a total correlation dissimilarity measure for quantitative MRI and dynamic imaging data. Scientific Reports, 8(1), 2018.
[14] C. Guyon, T. Bouwmans, and E.-h. Zahzah. Robust principal component analysis for background subtraction: Systematic evaluation and comparative analysis. In Principal Component Analysis, chapter 12. IntechOpen, 2012.
[15] V. Hamy, N. Dikaios, S. Punwani, A. Melbourne, A. Latifoltojar, J. Makanyanga, M. Chouhan, E. Helbren, A. Menys, S. Taylor, and D. Atkinson. Respiratory motion correction in dynamic MRI using robust data decomposition registration – application to DCE-MRI. Medical Image Analysis, 18(2):301–313, 2014.
[16] S. Heber and T. Pock. Shape from light field meets robust PCA. In Computer Vision – ECCV 2014, pages 751–767. Springer International Publishing, 2014.
[17] B. K. Horn and B. G. Schunck. Determining optical flow. Artificial intelligence, 17(1-3):185–203, 1981.
[18] W. Huizinga, D. H. J. Poot, J.-M. Guyader, R. Klaassen, B. F. Coolen, M. van Kranenburg, R. J. van Geuns, A. Uitterdijk, M. Polfliet, J. Vandemeulebroucke, A. Leemans, W. J. Niessen, and S. Klein. PCA-based groupwise image registration for quantitative MRI. Medical Image Analysis, 29:65–78, 2016.
[19] S. Klein, M. Staring, K. Murphy, M. A. Viergever, and J. P. Pluim. elastix: a toolbox for intensity-based medical image registration. IEEE Transactions on Medical Imaging, 29(1):196–205, 2010.
[20] X. Liu, M. Niethammer, R. Kwitt, M. McCormick, and S. Aylward. Low-rank to the rescue – atlas-based analyses in the presence of pathologies. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, pages 97–104. Springer International Publishing, 2014.
[21] S. Meister, B. J¨ahne, and D. Kondermann. Outdoor stereo camera system for the generation of real-world benchmark data sets. Optical Engineering, 51(2):1–7, 2012.
[22] C. T. Metz, S. Klein, M. Schaap, T. van Walsum, and W. J. Niessen. Nonrigid registration of dynamic medical imaging data using nD+t B-splines and a groupwise optimization approach. Medical Image Analysis, 15(2):238–249, 2011.
[23] J. Modersitzki. Numerical Methods for Image Registration. Oxford University Press, 2003.
[24] J. Modersitzki. FAIR: Flexible Algorithms for Image Registration. Society for Industrial and Applied Mathematics, 2009.
[25] N. Papenberg, A. Bruhn, T. Brox, S. Didas, and J. Weickert. Highly accurate optic flow computation with theoretically justified warping. International Journal of Computer Vision, 67(2):141–158, 2006.
[26] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1:127–239, 2014.
[27] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 763–770, 2010.
[28] T. Pock, M. Urschler, C. Zach, R. Beichel, and H. Bischof. A duality based algorithm for TV-L1-optical-flow image registration. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007, pages 511–518. Springer Berlin Heidelberg, 2007.
[29] M. Polfliet, S. Klein, W. Huizinga, M. M. Paulides, W. J. Niessen, and J. Vandemeulebroucke. Intrasubject multimodal groupwise registration with the conditional template entropy. Medical Image Analysis, 46:15–25, 2018.
[30] R. T. Rockafellar. Convex Analysis. Princeton Mathematical Series. Princeton University Press, Princeton, N. J., 1970.
[31] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag Berlin Heidelberg, 1998.
[32] A. Sotiras, C. Davatzikos, and N. Paragios. Deformable medical image registration: A survey. IEEE Transactions on Medical Imaging, 32(7):1153–1190, 2013.
[33] D. Sun, S. Roth, and M. J. Black. Secrets of optical flow estimation and their principles. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2432–2439. IEEE, 2010.
[34] V. Vishnevskiy, T. Gass, S. Szekely, C. Tanner, and O. Goksel. Isotropic total variation regularization of displacements in parametric image registration. IEEE Transactions on Medical Imaging, 36(2):385–395, 2017.
[35] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In Pattern Recognition, pages 214–223. Springer Berlin Heidelberg, 2007.