Depth Descent Synchronization in $\mathrm{SO}(D)$

2020·Arxiv

Abstract

Abstract

We give robust recovery results for synchronization on the rotation group, SO(D). In particular, we consider an adversarial corruption setting, where a limited percentage of the observations are arbitrarily corrupted. We develop a novel algorithm that exploits Tukey depth in the tangent space of SO(D). This algorithm, called Depth Descent Synchronization, exactly recovers the underlying rotations up to an outlier percentage of , which corresponds to 1/4 for SO(2) and 1/8 for SO(3). In the case of SO(2), we demonstrate that a variant of this algorithm converges linearly to the ground truth rotations. We implement this algorithm for the case of SO(3) and demonstrate that it performs competitively on baseline synthetic data.

Keywords Robust synchronization Structure from motion Nonconvex optimization Multiple rotation averaging

1 Introduction

The typical synchronization problem involves recovery of n group elements from pairwise measurements between them. It arises, for example, when solving the Structure from Motion (SfM) problem. One subproblem of SfM is to recover the three-dimensional orientations and positions of cameras from pairwise orientations and positions in relation to a scene [ ¨Ozyes¸il et al., 2017]. Here, we specifically focus on robust synchronization over SO(D), the rotation group for . That is, given pairwise rotations in SO(D), some of which are corrupted, we aim to recover the original set of n rotations.

We assume n unknown, ground truth elements of SO(D), which we denote by . We form a graph G([n], E), where [n] := {1, . . . , n} indexes the n unknown elements and E designates the edges for which measurements of relative rotations are taken. For each , we are provided with the measurement

We can think of in the following way: If we are oriented in the coordinate system with respect to node k, then rotates our coordinate system into the coordinate system we would see if we were sitting at node j. This synchronization formulation extends to any given group, where one wishes to recover , an n-tuple of elements in the group, given measurements of the group ratios .

In reality, we cannot hope to exactly measure all the pairwise rotations in (1). In many real systems, both noisy and corrupted measurements occur: our focus here is on adversarially corrupted measurements. That is, within the measurement graph G, the corruption model is assumed to be fully adversarial. Our model is specified by partitioning the measured data into two parts:

1. We observe corrupted (or “bad”) edges , where all edges in have a corresponding arbitrary corruption. The adversary is allowed to choose (and thus may to some degree influence the connectivity of ) as well as the corrupted values for . For each node, the adversary is only allowed to corrupt a limited fraction of edges.

2. The rest of the observed edges are uncorrupted (or “good”) edges , where each edge in has an associated measurement given by (1). Theoretically guaranteed methods for robust synchronization are still lacking, especially in adversarial

and nonconvex settings. The development of these methods is important because in practice measurements are

usually quite corrupted, especially in applied problems like Structure from Motion [ ¨Ozyes¸il et al., 2017]. The results we establish here are concerned with exact recovery. That is, given a set of corrupted measurements, we wish to exactly recover . We will show that this is possible for a nonconvex method even in the presence of a significant amount of arbitrary corruption. Our method falls into the class of multiple rotation averaging algorithms [Govindu, 2004, Martinec and

Pajdla, 2007, Hartley et al., 2013]. These methods are effectively coordinate descent algorithms, which are

present highly efficient algorithms for nonconvex programs that are notoriously hard to analyze. While their analysis is challenging, it is imperative to develop a theoretical understanding of these methods and their robust counterparts [Hartley et al., 2011, Chatterjee and Govindu, 2017]. Moreover, as we discuss later, there are few robustness guarantees for group synchronization with adversarial corruption. Among the limited guarantees, none cover our model, and we thus make a significant contribution to this area. This work is also of general appeal to the nonconvex optimization community since we are able to prove convergence results in the complex nonconvex landscape of robust multiple rotation averaging. Furthermore, some energy landscapes associated with this problem exhibit many local minima and spurious fixed points, which we are able to avoid with our new method.

1.1 Contributions of This Work

The main contributions of this work follow.

1. As a warm-up, we develop an adversarially robust algorithm for synchronization in SO(2), which we call Trimmed Averaging Synchronization (TAS). In Theorem 3, under a generic condition on the measurement graph G, which we call the “well-connectedness”, and proper initialization, we show that it can tolerate a fraction of outliers per node that is bounded above by 1/4. We further prove that it converges linearly for fully connected observation graphs.

2. To extend this result to SO(D), we develop a new algorithm that we call Depth Descent Synchronization (DDS) based on Tukey depth in the tangent space of SO(D). To our knowledge, this is the first application of a manifold version of Tukey depth in an applied setting.

3. Assuming well-connectedness and good initialization, the DDS algorithm exactly recovers an underlying signal in the presence of a significant amount of adversarial outliers. This result is given in Theorem 6 and is the first guarantee of robustness to adversarial corruption for a multiple rotation averaging algorithm. This result extends elegantly to sparse random graphs, where we show that it achieves the information theoretic rate with respect to graph sparsity for Erd¨os-R´enyi observation graphs in Section 4.5.

4. We show that this algorithm can be efficiently implemented for SO(3). We run baseline experiments that show it performs competitively on some baseline synthetic data for SO(3) synchronization, which arises in the important application of Structure from Motion. While we carefully review related work later in Section 2, we emphasize here our contributions in terms

of the most relevant works. Again, we emphasize that we study an efficient, nonconvex algorithm for rotation

synchronization that has guarantees for adversarial outliers.

A robustness result for SO(D) synchronization based on semidefinite programming is given in Wang and Singer [2013]. However, the probabilistic model in this work is very restrictive (see details in Section 2), and the proposed method is slow for large n.

Huang et al. [2019] use a truncated least squares framework to do robust rotation synchronization. The truncated least squares framework was originally proposed by Huang et al. [2017] in the context of translation synchronization, and sequentially filters those pairwise measurements that are furthest from the current estimated pairwise measurements. Following this, Huang et al. [2019] show that this can be extended to rotation synchronization. Under an appropriate choice of a thresholding parameter, they demonstrate that it is possible to exactly recover the ground truth in the presence of outliers if a certain generic condition is satisfied. This method has two downsides. First, one must repeatedly compute the lowest eigenvectors of the graph connection Laplacian, which has a higher memory cost for dense graphs than DDS and may also have issues of numerical stability. Second, the bound on the fraction of outliers that they present is not clear in general settings since it depends on the -norm of the pseudoinverse of the graph connection Laplacian. This quantity is hard to control, and so it is unclear how their bound scales with various parameters as well as what it would state for arbitrary outliers. On the other hand, Huang et al. [2017] guarantees success of the truncated least squares method for adversarial outliers when considering translation synchronization in 1-dimension, but we do not see how to extend these to the problem of rotation synchronization.

The only existing result for adversarial robust synchronization was recently given by Lerman and Shi [2019]. They propose a general method, called Cycle-Edge Message Passing (CEMP), for group synchronization that is guaranteed to be robust to adversarial corruption. However, their method uses information from 3-cycles, that is, triangles in the graph, and so it is less efficient than typical multiple rotation averaging schemes by an order of n (the ratio between the number of triangles and the number of edges in the graph). Beyond this, multiple rotation averaging algorithms are also attractive because they are more memory efficient. A caveat to our current work is that our new method is not as efficient as previous multiple rotation averaging algorithms. In particular, we require the computation of a depth-based estimator, and so each rotation update has complexity for SO(3), where is the number of neighbors of the node to be updated. Therefore, we do not claim that DDS is uniformly most efficient for adversarially robust synchronization in terms of time complexity, although it is more computationally efficient than CEMP for very sparse graphs. However, our depth descent method does still have the benefit of more efficient memory complexity than Lerman and Shi [2019].

Beyond computational efficiency, the theoretical guarantees are also different: we bound the ratio of corrupted edges, whereas Lerman and Shi [2019] bound the ratio of corrupted triangles. The method of Lerman and Shi [2019] degrades with extremely sparse graphs, due to the fact that they need to ensure that it contains sufficiently many triangles, whereas we require a well-connectedness condition of the graph G that extends to extremely sparse cases. Other well-connectedness conditions appear, sometimes implicitly, in works minimizing energy functions [Wang and Singer, 2013, Hand et al., 2018, Lerman et al., 2018, Huang et al., 2017]. Finally, CEMP is tailored to finding the corruption level in the graph, but it does not have complete guarantees for the recovery of the underlying rotations themselves.

1.2 Notation

Bold uppercase letters will be used to denote matrices, while bold lowercase letters will be used to denote vectors. For a set X in a Hilbert space, the convex hull is denoted by conv(X). The sphere in is written as . For an ordered tuple of n rotations, SO(D), we write .

1.3 Structure of the Rest of the Paper

We now outline the structure of this paper. First, we review related work in Section 2. We then discuss the specific case of synchronization over SO(2) in Section 3 and give a simple, adversarially robust algorithm, called Trimmed Averaging Synchronization. Following this, in Section 4 we develop our novel Depth Descent Synchronization algorithm, which utilizes Tukey depth to yield robust updates. Coupled with this, we develop its theoretical guarantees of robustness and convergence. Section 5 presents some baseline experiments demonstrating the practicality of our proposed method.

2 Related Work

Interest in the synchronization problem has grown in recent years due to applications in computer vision and image processing, such as SfM [Govindu, 2004, Martinec and Pajdla, 2007, Arie-Nachimson et al., 2012, Hartley et al., 2013, Tron and Vidal, 2009, Ozyesil et al., 2015, Boumal, 2016], cryo-electron microscopy [Wang and Singer, 2013] and Simultaneous Localization And Mapping (SLAM) [Rosen et al., 2019].

The most common formulation for solving rotation and other group synchronization problems involve a non-convex least squares formulation that can be addressed by spectral methods [Singer, 2011] or semidefinite relaxation [Bandeira et al., 2017]. On the other hand, the work of Wang and Singer [2013] uses a semidefinite relaxation of a least absolute deviations formulation to obtain a robust estimate for SO(d) synchronization. They prove recovery for the pure optimizer of this convex problem in a restricted setting. In this setting the full graph is complete, every edge is corrupted with a certain probability p (in the case of SO(2), they require that and for SO(3) they require ) and the corrupted group ratios are distributed uniformly on SO(D). In practice, they advocate using an alternating direction augmented Lagrangian to solve their optimization problem. One may also use methods like the Burer-Monteiro formulation [Boumal et al., 2018], although current guarantees require the rank of the semidefinite program to be at least , which results in storing iterates much larger than the underlying signal that is a vector of size n [Waldspurger and Waters, 2018]. Another recent work tries to leverage a low-rank plus sparse decomposition for robust synchronization [Arrigoni et al., 2018]. However, this work does not contain robustness guarantees.

2.1 Robust Synchronization Methods

For a survey of robust rotation synchronization, see Tron et al. [2016]. Some early works on rotation synchronization include Govindu [2001, 2006], Martinec and Pajdla [2007], with later follow-up works by Hartley et al. [2013], Chatterjee and Govindu [2013, 2017]. The later works discuss some least absolute deviations based approaches to multiple rotation averaging which we will discuss later. For theoretical foundations on averaging rotations, one can consult Moakher [2002]. For foundational work on optimization on the manifold SO(d), see [Taylor and Kriegman, 1994, Arora, 2009].

Robust multiple rotation averaging algorithms were studied in Hartley et al. [2011] and Hartley et al. [2013]. There, the authors used a least absolute deviations formulation over SO(3) using successive averaging with a Weiszfeld algorithm and a gradient-based algorithm. The authors also give a counterexample that shows that local minima exist and thus the global minimum of their problem may be hard to find in general. However, the authors give no guarantee of convergence or recovery in any setting. Also, we have found that this method may suffer from suboptimal fixed points in general, which we analyze in more detail in a forthcoming work (see also Section 5 of Maunu and Lerman [2020]).

2.2 Adversarially Robust Synchronization

A work that does contain guarantees is that of Lerman and Shi [2019], which considers a message-passing procedure that incorporates consistent information from cycles. This algorithm was guaranteed to be robust for the adversarial setting and applies to any compact group. Although its adversarial setting is very general, it requires a bound on the ratio of corrupted cycles per edge and not on the ratio of corrupted edges. Furthermore, the use of cycles results in a potentially more computationally intensive algorithm than the one in this work that only uses pairwise information.

Guarantees for exact recovery with adversarial, or partially adversarial, corruption appear in few other synchronization problems. The adversarial corruption in synchronization is very special since there is a single choice to corrupt a group ratio. Under a special probabilistic model, Bandeira [2018] established asymptotic and probabilistic exact recovery for the SDP relaxation of the least squares energy function of synchronization. The model assumes that G([n], E) is an Erd¨os-R´enyi graph with probability p of connection, edges are randomly corrupted with probability q and . Hand et al. [2018] and Lerman et al. [2018] established asymptotic exact recovery under a probabilistic model for solutions of the different problem of location recovery from pairwise orientations. In this problem ratios of the Euclidean group are normalized to the sphere. They assume an i.i.d. Gaussian generative probabilistic model for the ground truth locations and an Erd¨os-R´enyi model for the graph G([n], E) and further bounded the ratio of maximal degree of over n. In both works, these bounds approach zero as n approaches infinity, unlike the constant bound of this work.

Robust permutation synchronization was studied by Huang and Guibas [2013], where they give a maximum corruption percentage of 1/4 in the case of fully connected observation graphs. Huang et al. [2017] analyzed a robust algorithm for one-dimensional translation synchronization that uses a truncated least squares formulation. They show that their method achieves a maximum corruption percentage of 1/6 for fully connected graphs. However, their generic condition is rather complicated and in order to interpret it they must make the fully connected assumption and they also restrict the maximal degree of . In both [Huang and Guibas, 2013, Huang et al., 2017], the bounds degrade for sparser observation graphs.

The results of Huang et al. [2017] were extended to the problem of rotation synchronization in Huang et al. [2019]. Here, the authors show that a truncated least squares formulation for rotation synchronization can recover the underlying signal assuming a generic bound that includes the -norm of the pseudoinverse of the graph connection Laplacian. However, this bound is hard to interpret in general.

2.3 Synchronization in Other Settings

In contrast to corrupted settings, some works have considered estimation in a noisy setting. Bandeira et al. [2017] study maximum likelihood estimation of the angular synchronization problem and show that the associated semidefinite relaxation is tight. More recently, message-passing algorithms have been used for maximum likelihood estimation in the Gaussian setting [Perry et al., 2018]. Other recent results leverage multiple phases to obtain better results in noisy settings [Gao and Zhao, 2019]. Minimax estimation under the squared loss over SO(2) is considered in Gao and Zhang [2020], and optimization methods for the squared error over subgroups over the orthogonal group are considered in Liu et al. [2020].

Another related problem over SO(2) is the synchronization of Kuramoto oscillators. In particular, a primary question is the minimal graph connectivity requirement ensuring that the energy landscape is nice. The weakest known requirement is that every vertex is connected to at least 0.7889n other vertices [Lu and Steinerberger, 2019]. The conjectured bound is 0.75n, which is reminiscent of the bound we require for local recovery with adversarial corruption over SO(2).

2.4 Nonconvex Optimization

Optimization problems cast over SO(D) are usually nonconvex. We can think of our method as attempting to solve a nonconvex problem over SO(D) as well. Therefore, our work also fits in with the growing body of work analyzing nonconvex energy landscapes and procedures [Dauphin et al., 2014, Hardt, 2014, Jain et al., 2014, Netrapalli et al., 2014, Yi et al., 2016, Zhang and Yang, 2018, Ge et al., 2015, Lee et al., 2016, Arora et al., 2015, Mei et al., 2018, Ge et al., 2016, Boumal, 2016, Sun et al., 2015b,a, Lerman and Maunu, 2017, Cherapanamjeri et al., 2017, Ma et al., 2018, Maunu et al., 2019].

2.5 Tukey Depth

Finally, we appeal to tangent space depth, that is, using Tukey depth [Tukey, 1974] in the tangent space of the manifold SO(D), to create a provable robust method. Tangent space depth, for a general manifold, first appeared in Mizera [2002], where the author proves existence and depth bounds for maximum tangent depth estimators. Earlier work on Tukey, or halfspace, depth includes Rado [1946], which proves a depth lower bound for general measures, and Danzer et al. [1963], which discusses the relation to Helly’s Theorem. More recently, the classical reference of Donoho and Gasko [1992] proves bounds on the maximum depth achieved in a dataset under ellipticity conditions. Computation of depth contours was considered in Liu [2017], Hammer et al. [2020]. Recently, an interesting connection between depth estimators and generative adversarial networks has been exhibited [Gao et al., 2018], which may perhaps lead to more computationally efficient estimators.

2.6 Notions of Robustness

We finish by clarifying our setting in the context of robustness. In order to quantify our notion of robustness, we introduce the following terminology. Recall that we have an underlying graph G([n], E) corresponding to the pairwise measurements, where E is partitioned into an inlier set, , and outlier set . For any , we define its neighborhood as well as its inlier and outlier neighborhoods as

We will denote by the maximum percentage of outliers per node. That is, is the maximum of over all , where throughout the rest of the paper denotes the number of points in a set and is the degree of node .

The following notion of recovery threshold is related to the notion of a breakdown point in robust statistics. However, our goal is somewhat different, since we desire an exact estimator rather than an approximation, as is typically considered for classical breakdown points. This is similar to the notion of RSR breakdown point given in Section 1.1 of Maunu and Lerman [2019].

Definition 1 (Recovery Threshold) The recovery threshold of a robust rotation synchronization algorithm is the largest value of such that the algorithm outputs an estimator that satisfies (20).

The simplest information-theoretic bound for the recovery threshold is . Indeed, if , then an adversary could easily choose to have a subgraph that dominates with a consistent set of measurements for an alternative signal . That is, the observations would be

If the adversary chooses the partition and properly, then one could easily think that is the true underlying signal. For our method, we obtain recovery thresholds for that are smaller than 1/2.

On the other hand, the information theoretic bound may be much higher in special models. For example, suppose that G([n], E) is an Erd¨os-R´enyi graph with parameter p. Suppose further that each edge in E is corrupted independently with probability q, and the corrupted measurements are i.i.d. uniform on SO(D). Wang and Singer [2013] call this the uniform corruption model. Then, Singer [2011] and Chen et al. [2016] established the following information theoretic threshold for the rotation synchronization problem:

Notice that one needs to ensure that the underlying Erd¨os-R´enyi graph is connected. For fixed p, notice that one can take q arbitrarily close to 1 (and so is then very close to 1) as long as n is sufficiently large. We later discuss how our main results extend to this model in Section 4.5, where we show that the DDS algorithm achieves optimal recovery rates with respect to p (i.e., it can tolerate extremely sparse observation graphs).

3 An Adversarially Robust Algorithm for SO(2) Synchronization

To begin to build motivation for our method, we consider the case of synchronization over SO(2), where the method becomes considerably simpler due to its 1-dimensional manifold structure. First, Section 3.1 gives definitions of some geometrical objects on SO(2), which we identify with for mathematical convenience. Then, in Section 3.2, we define our SO(2) synchronization method, which we call Trimmed Averaging Synchronization (TAS) and is a special case of our later DDS algorithm. Finally, Section 3.3 discusses the initialization and well-connectedness assumptions and uses these to give an adversarial recovery guarantee for the TAS algorithm.

3.1 The Geometry of

We define a few structures related to the manifold . The tangent space can be identified with R. Let be a unit direction in the tangent space at (i.e., ). The geodesic originating at in the direction v is given by . The exponential map and inverse exponential map (logarithm map) on this 1-dimensional manifold are given by

Finally, the cut-locus of a point is defined as the set of points for which there is not a unique geodesic from z. It is not hard to see that this is given by .

Recall that we seek an underlying signal . Notice that its elements, for , can be parameterized by angles, . This angle is also known as the argument of the complex number, and so we write , where . The angular, or geodesic, distance between and is

For later reference, we plot the extended angular distance function in Figure 1.

Fig. 1 The angular distance function

Recall that if , then the edge measurement is correct, that is, , where is defined analogously to (1). For , the measurement is assumed to be an arbitrary element of . From the measurements is only identified up to a global rotation, due to the ambiguity that , and so generates the same pairwise measurements as .

To deal with this ambiguity, the following function will be used to demonstrate convergence of a sequence to :

This is again nothing but a function that measures the maximum distance between normalization products. Notice that for some rotation . Therefore, convergence of to zero indicates convergence of z to , and an algorithm exactly recovers iff .

3.2 Trimmed Averaging Synchronization

A natural way to solve the rotation synchronization problem involves energy minimization. The simplest strategy [Govindu, 2001, Martinec and Pajdla, 2007] attempts to minimize

A coordinate descent strategy to solve (8) involves updating by solving

Applying this sequentially over the indices j = 1, . . . , n results in the multiple rotation averaging (MRA) discussed in more detail in Hartley et al. [2013]. While such coordinate descent strategies generally lack coordination across all objects like global synchronization methods, they lead to algorithms that are more memory efficient and that can be decentralized easily.

Another simple way to robustify (9) is to select the average of all points that fall within a trimmed set, which results in a trimmed averaging procedure. To account for the 1-dimensional manifold structure of SO(2), we propose to do this trimming in the tangent space, which yields the TAS algorithm. An illustration of one trimmed averaging step is given in Figure 2.

For a discrete and a fraction 0 < p < 1, we write the pth quantile of X by . It is convenient to define the trimming operator

We also denote the average of a dataset by ave(X). That is, ave.

Due to the simplified geometry of SO(2), we will show in the following that using this trimmed rotation averaging scheme converges to the underlying solution linearly when the percentage of outliers is at most when G is fully connected. In the case where (G, [n]) is not fully connected, the result is a corollary of our later Theorem 6 under a connectedness assumption on (G, [n]). We note that this fraction is similar to the one given in Lerman and Shi [2019], although there the bound is formulated for corrupted triangles in the graph.

For clarity, we give the TAS algorithm in Algorithm 1. To allow for damping of the updates, we include the step-size parameter . When , we refer to the algorithm as Damped TAS or DTAS for short.

Fig. 2 Illustration of the TAS algorithm at a fixed step and a fixed node j. The measurement is . After projecting into the tangent space, the outermost points in red are filtered, and the green points are averaged. This trimmed average is then projected back to the manifold.

3.3 Recovery Guarantees for DTAS

We begin by discussing the assumptions that will make a synchronization problem tractable for TAS. The first assumption we require is a good initialization, which is common in the analysis of such nonconvex methods. Assumption 1 The initial set of rotations lies within a -neighborhood of : that is, there exists a such that

Note that this is equivalent to the assumption that .

While corruptions are arbitrary, we require an assumption on the underlying graph (G, [n]). It essentially requires that the graph is sufficiently well connected.

Assumption 2 (-Well-connectedness condition) For a fixed , for any such that n/2, there exists an index such that

In words, this assumption requires that inside any set of at most n/2 nodes, there is a node that is connected to a significant number of nodes outside this set. The condition in this assumption is equivalent to requiring that

We include a discussion of this condition and its connection with random graphs, conductance, and expanders later in Section 4.5.

For the case of Assumption 2 with , we just call the graph well-connected. While fully connected graphs satisfy this condition, there exist many more examples of graphs satisfying it with as well, and we give some examples of some simple graphs that meet this assumption in Figure 3.

The following theorem gives the main recovery result for the DTAS algorithm. While the algorithm converges linearly, the rate we derive depends on n and is worst-case. In the few simulations we have run, the algorithm seems to converge at a faster rate that merits more study. Also, a more complicated proof may yield linear convergence in the general case of well-connected G, but for sake of brevity, we only prove it for the fully connected case.

Fig. 3 Examples of graphs that satisfy the well-connectedness condition for n = 4, 5 and 6. In each of these graphs, (12) is satisfied with : that is, all subsets J of size at most n/2, there exists a node

Theorem 3 Suppose that , Assumption 1 holds, G satisfies Assumption 2 with parameter , and is the sequence generated by DTAS, for and . Then, , and the algorithm exactly recovers . Furthermore, in the case where G is fully connected, the DTAS algorithm linearly converges to .

Proof The proof of convergence under well-connectedness follows from the fact that, when updating index j at iteration t, the selection rule defined by choosing the trimmed average yields a point in the interior of Log. The proof then follows from the proof of Theorem 6.

To see linear convergence in the fully connected case, we prove that all normalization products contract during each pass over the dataset. Denote

These are the translation of the normalization products to the angular coordinates of the points . Also, define the sets

In this proof, we will write as a shorthand. Notice that we must have

unless , in which case for some and z(t) recovers .

For the update with respect to index j, all good pairwise measurements must lie in . Since there are at least good measurements, all trimmed points must lie in this interval as well. Therefore, for all , after updating we have

Using this fact, we will now show that the indices in must move inwards. Indeed, since n/2, we must have for all . Therefore, for each trimmed mean for , we have

Thus,

After the coordinate update, we have that for all ,

After repeating this argument for all j over the course of an epoch, or pass over all indices j = 1, . . . , n, this yields that

as long as . The width of this interval is , which yields the desired result.

4 Robust Synchronization over SO(D)

This section presents a novel algorithm for robust synchronization over the rotation group, SO(D). We assume a fixed observation graph G that encodes which pairwise rotations we observe. The pairwise rotations are written as SO(D), where the good edges have the associated observation , and the bad edges are arbitrarily chosen from G and have arbitrarily corrupted measurements.

To proceed, we must make clear our goal for the synchronization problem, since there is a well known ambiguity, similar to the one encountered for SO(2) synchronization in Section 3.1 – we can only recover up to right multiplication by an element of SO(D). This is because, after this multiplication, one arrives at the same pairwise measurements in (1). This is a form of rotational symmetry in the nonconvex problem, which may be leveraged to develop tractable nonconvex programs [Zhang et al., 2020]. Exactly recovering the ground truth measurements SOup to right multiplication by SO(D) is equivalent to finding a set of rotations such that

for some SO(D). We refer to the set of rotations , as normalization products since, when , they reveal the normalization factor that multiplies each element of from the right. One could extend this discussion to the case of approximate recovery by requiring that the normalization products are approximately equal.

With our goal now in mind, we begin in Section 4.1 by discussing the geometry of the manifold SO(D) and presenting some basic geometric results that will be used in our main theorem. Following this, Section 4.2 reviews the concept of halfspace depth from robust statistics, which will be the core tool that we use to construct our algorithm. In Section 4.3, we give outline the DDS algorithm. Then, in Section 4.4, we give the theoretical guarantees that constitute the main innovations of this work. We finish in Section 4.5 with a discussion of the assumptions we make in our theorem.

4.1 The Manifold Structure of SO(D)

The rotation synchronization problem is obviously a robust recovery problem on the product Riemannian manifold SO. Therefore, in the following, we freely use concepts from Riemannian geometry, and specifically those concepts related to the geometry of SO(D).

4.1.1 Riemannian Geometry of SO(D)

The set of rotations SO(D) is a -dimensional Lie group that has a natural Riemannian structure. The bi-invariant distance metric d : SOSOis given by

where log is the matrix logarithm. The corresponding Lie algebra is so(D), the set of skew-symmetric matrices. The tangent space of SO(D) at SO(D) is

Notice that every tangent vector SO(D) has a corresponding element of so(D), which we denote by . The corresponding Riemannian metric (which is an inner product and thus should not be confused with a distance metric) for SO(D) is given by TrTr. Equipped with this metric, SO(D) is a Riemannian manifold with nonnegative sectional curvature. An open ball with respect to the metric d is written as B(R, r), where the radius is r and the center is R. Its closure and boundary are B(R, r) and , respectively.

The exponential map is given by

where exp is the matrix exponential. The logarithmic map is the inverse of this:

The geodesic between SO(D) is written as ExpLog, for . In the following, we use the notation for a halfspace of SO(D),

4.1.2 Local Convexity Properties of SO(D)

We continue by recalling some local convexity properties of manifolds like SO(D). The following result on the convexity of sufficiently small balls is standard in the literature [Karcher, 1977, Afsari, 2009, Petersen, 2016].

Theorem 4 [Convexity of small balls] In a closed ball SO(D) with , the squared distance metric is strictly convex. This implies, in particular, that for all , for all .

Note that the closed ball in the previous theorem has the property that the interior of any nonconstant geodesic lies strictly in the interior of the ball. The following result is also readily apparent. It states that, for boundary points on a sufficiently small ball in SO(D), all interior directions are contained in a halfspace.

Corollary 1 Let B be a ball on SO(D) with radius and . Then, there is a halfspace SO(D) such that Log.

An important component of our later theoretical results relies on showing that the radius of the smallest ball containing a set of rotations shrinks. The final lemma of this section states that if a discrete set of rotations R is contained in a ball, and if half of the ball contains no boundary measurements, then the set R is actually contained in a ball of smaller radius. Thus, for such a set of rotations, this gives us a sufficient condition for decreasing the radius of the smallest containing ball.

Then, R is contained in a ball with radius less than r.

Proof Let c(t) be the geodesic Exp. We claim that, for t sufficiently small, , which is an open ball. Heuristically, one should expect this to be true, since the closed halfspace contains no boundary points, and so moving the center a small amount in the v direction keeps all points within the ball.

By the first order approximation to and since Log, we have

for t sufficiently small. On the other hand, since

the distance to C over all Expis bounded away from r. By continuity of d(c(t), S) for all , this implies that there is an such that

Putting (25) and (26) together implies that a small shift of the ball results in a new center such that r. In turn, this means that all points of R lie in a ball with

4.2 Tukey Depth and its Properties

We will use the concept of Tukey depth to determine descent directions on the manifold SO, although other notions of depth could potentially be used as well (see Ch. 58 of Toth et al. [2017] for a discussion of different notions of depth). In Euclidean space, the Tukey depth of a point in a dataset X = is given by

The depth is therefore the minimum number of points contained in any halfspace that has x in its separating hyperplane. A natural robust estimator is then the point of maximum depth, which is also called the Tukey median. The -depth level set for is defined by

This level set is convex and compact, and its boundary is made up of hyperplanes defined by sets of D points [Liu et al., 2019]. This function will be used in the construction of our algorithm.

As an example, consider the 1-dimensional dataset . Here, the formulation of depth is quite simple:

With this in mind, the -depth level set is where denotes the ith order statistic. The Tukey median in this case is just the median.

We recall the following theorem, which bounds the maximum possible depth within a general dataset. Notice that, in particular, this guarantees that the depth level set is nonempty for all .

Proposition 1 (Rado [1946]) Suppose that X is a set of n points. Then, the maximum depth in X is bounded below by .

A particularly useful property of Tukey depth is that it is affine equivariant, that is, it is stable under affine transformations.

Lemma 2 (Donoho and Gasko [1992])

This implies, in particular, that things behave nicely if we change the inner product on .

With this property, we can map between the depth regions in and by

This affine equivariance implies, in particular, that Proposition 1 extends to datasets in tangent spaces of manifolds.

We finish with a simple lemma on depth level sets that will be the key to the robustness guarantee for our later algorithm. In simple words, this lemma guarantees sufficient conditions for a depth level set to contain a nonzero value in a dataset containing many zeros. In the following, datasets are represented by multisets and may contain duplicate points. For a halfspace , we write its separating hyperplane as L(0, v).

Lemma 3 Suppose that we have a dataset and a subset that satisfies the properties i) , ii) There exists closed halfspace such that , and iii) the only points of Y in L(0, v) are 0. Then, {0}). Beyond this, if , then .

Proof It is obvious by the properties of depth that , since any point on the boundary of conv(Y) has depth less than . By Proposition 1, there is a point of depth at least , and so is nonempty.

Suppose that less than points in Y are zero, that is, . We claim that . Define the auxiliary set , that is, Z removes the 0 values in Y from X. Since , we have that . Within Z, there is a point of depth at least m/(D + 1). Further, since and , we have

which implies that must lie in H(0, v). On the other hand, since , we have that

which means that is a point of depth at least in

4.3 Depth Descent Synchronization

We now use the results of the previous sections to derive the DDS algorithm.

4.3.1 The General DDS Algorithm

We assume a selection rule on convex, compact subsets of SO(D) for all SO(D). In particular, for our theorem, we assume that this selection rule chooses a nonzero point from the convex set if possible, and otherwise outputs zero. To select this point, one selection rule could be to take a point uniformly at random, or another could be to take the center of mass. As our theorem makes clear, the choice of this selection rule does not affect the exact recovery result, but it may change convergence rates. Since we do not give quantitative convergence rates in this work, we leave this choice as arbitrary.

In any case, given a selection rule over such subsets of SO(d), suppose our estimated rotations at time t are (R(t)). Our update direction at this time for index j = t mod n is defined by

In words, the direction is a direction in the tangent space at that is sufficiently deep with respect to the neighbor measurements {Log. Given , the algorithm updates

for a chosen step size . Our theory below restricts this step size according to Theorem 4.2 of Afsari et al. [2013]: in the case of D = 2 or 3, one can take , while for D > 3 the upper bound is more restrictive (the reader can consult the discussion in Afsari et al. [2013] for a more thorough discussion of the bound). For sake of clarity, we write the full DDS algorithm in Algorithm 2.

One benefit of DDS is that there is no need to tune the step size, which is due to the local convexity properties of the manifold. As we outline in our main theorem, the step size can be directly selected from the guidance of Afsari et al. [2013].

As for computational complexity, at least in low-dimensions, depth regions can be calculated efficiently for small datasets [Liu et al., 2019]. In particular, the most straightforward algorithm involves an exhaustive search over hyperplanes spanned by D-subsets of X that cut off points in X. The time complexity of this method for is , and so could be used for moderately sized datasets. Translating this to our problem in SO(3), the time complexity for updating is , and so we see that this method is efficient for sparser graphs. In an Erd¨os-R´enyi model, the complexity to update all n rotations is , where p is the Erd¨os-R´enyi parameter.

As discussed in the introduction, the time complexity of the DDS algorithm is not necessarily more efficient than that of Lerman and Shi [2019]. Indeed, the complexity of their message-passing algorithm is for a single update to all rotations, while for our method it is for SO(3) (and much larger for higher dimensions). Therefore, DDS has better complexity for sparse graphs, while Lerman and Shi [2019] has better complexity for dense graphs. On the other hand, our complexity is uniformly better in SO(2), where depth contours can be easily found in O(n log(n)) by sorting, and it thus takes time to update all nodes. Finally, we also note that our method has better scaling in terms of memory usage: the multiple rotation averaging scheme takes memory while the message-passing scheme takes .

4.3.2 The Approximate DDS Algorithm

To make the DDS algorithm more computationally efficient, we employ a few strategies to develop an approximate DDS algorithm.

First, so that we do not need to resort to computing full depth contours, we instead take the average of the deepest points in the set {Logas our update direction at each iteration.

Second, to avoid computation of the full depth of every point in this set, we instead use an approximation of depth based on sampling. Suppose that we wish to approximate the depth of the vectors in with respect to Y. We can sample a set of vectors . Then, for each , the approximation of depth, , is

Notice that replaces the minimum over in (27) by the minimum over the discrete set of vectors . For (which corresponds to the tangent space for SO(3)), computation of the full depth for all points would take time, where for each one would need to search over all planes defined by triplets , for distinct i, j, k. On the other hand, the computation of the approximate depth using (34) takes and can be more efficiently implemented due to the fact that the are shared between all .

The approximate DDS algorithm is given in Algorithm 3. Here, we use the notation SO(D) for the set of all unit vectors in SO(D).

4.4 Exact Recovery for DDS

For many nonconvex methods, good initialization is quite important [Li et al., 2019, Maunu et al., 2019, Qu et al., 2019, Chi et al., 2019, Qu et al., 2020]. We only obtain a recovery result for DDS if we can initialize in a suitable neighborhood of .

Definition 2 A set of n rotations (R) lies within a -neighborhood of if the normalization products all lie in a ball of radius . That is, there exists a SO(D) such that

With this terminology, our main assumption is that that we can initialize in a -neighborhood of . This is the analog of Assumption 1 for SO(D).

Assumption 5 The initial set of rotations (R(0)) for our algorithm lies within a -neighborhood of .

As we discuss in Section 4.5, we believe that this assumption is not so restrictive, and we later give some intuition for how this might occur in a real scenario.

The following theorem constitutes the main theoretical result of this work. It states that, with proper initialization and well-connectedness, the recovery threshold of Algorithm 2 is . As examples when , in the case of SO(2), Theorem 6 yields a corruption threshold of . In the case of SO(3), Theorem 6 yields a corruption level of .

Theorem 6 Suppose that , Assumptions 5 and 2 hold, and is generated by (33) with . Further assume in the case of D = 2, 3 that , and in the case of D > 3 that is chosen according to Theorem 4.2 of Afsari et al. [2013]. Then, 0 for all j, k, and the DDS algorithm exactly recovers .

Proof (Proof of Theorem 6) To aid in the proof, we denote the smallest ball enclosing our normalization products as

The center of B(t) is C(t) and its radius is r(B(t)). Our goal will be to show that .

The proof of the theorem is broken into three parts. In the first part, we prove that the sequence remains in a nested sequence of balls. In the second part, we show that, after sufficiently many iterations, the radius of the smallest enclosing ball must shrink. We finish in the third part by appealing to a general convergence theorem for monotonic algorithms.

Part I: : First, we show that at time t, no matter which index is updated, the normalization products remain in B(t). This is true at t = 0, so assume that it is true at a time t. Let j = t mod n and consider the pairwise measurements in the tangent space at : for each , the corresponding point in the tangent space is given by Log. By assumption, we have that for all k. Since and ,

since the set in the right-hand side of the display contains more than a fraction of points. We can now apply Theorem 3.7 of Afsari et al. [2013]. This follows from the fact that the update direction is the gradient of the Frech´et mean function for a weighted combination of {Log(since it lies in the convex hull of of these points). More formally, letting , since conv({Log, there exist weights such that

Therefore, choosing the step size as in Afsari et al. [2013] implies that , and further that when . In turn, this implies that and, if , then as well (i.e., interior points cannot move to the boundary).

In the case of SO(3), Theorem 3.7 of Afsari et al. [2013] tells us that choosing suffices. The case of SO(D) for D > 3 is dealt with in a similar way using Theorem 4.2 of Afsari et al. [2013].

Part II: : We now must show that after sufficiently many iterations, the radius of B(t) strictly decreases. To this end, fix a time s. At this time, at least one normalization product must lie on the boundary . For convenience, define the index set J(s) of boundary rotations at time s by

We will show that there exists a such that at some future time ,

To this end, pick a direction w uniformly at random from SO(D) such that . This vector separates into two halfspaces, and thus partitions B(s) into two halves. One of these halves contains at most n/2 points Log, and we will denote the corresponding halfspace of SO(D) by H. Since the direction w is chosen uniformly at random, there are no points on the boundary of this halfspace with probability 1.

Let K(s) denote the set

that is, the set of indices corresponding to boundary normalization products at time s. At each time t = s+m for m > 0, if none of the normalization products , lie in , then set and we can apply Lemma 1 to yield (36).

Otherwise, by Assumption 2, there is at least one index such that is in and

which yields that and . Thus, for sufficiently small . Repeating this sequentially for all elements of K(s), there must exist a such that

We are left in a situation where , and there exists a w such that contains no boundary points. By appealing to Lemma 1, we know that lies in a ball of radius smaller than r(B(s)), and therefore r(B(s)) has a strictly monotonic subsequence.

Part III: Strict monotonicity implies convergence: By Mizera [2002], as long as 1), the point to set mapping

is non-empty and outer semicontinuous with respect to the empirical measure on

Therefore, the associated algorithm (33) is upper semicontinuous in the sense of Theorem 3.1 of Meyer [1976], and we obtain convergence of R(t) to a fixed point of (33).

We finish by examining fixed points of the algorithm (33). Suppose that R is a fixed point of this sequence, such that as defined by (32) is zero. The fixed point is characterized by for at least measurements k. By -well-connectedness, for all subsets J of size at most n/2, there is an index j such that , because otherwise Lemma 3 would yield a nonzero

update direction. Thus, there is an index such that . This implies that for all

4.5 Discussion of Assumptions

The -well-connectedness condition in Assumption 2 bears some similarity to the notions of conductance and graph expansion. From the perspective of graph theoretical results, we note that a sufficient condition for Assumption 2 with is for the conductance of the graph to be greater than or equal to 1/2. This follows from a simple pigeonhole argument. It is unclear if this condition holds for Erd¨os-R´enyi graphs. Indeed, if one uses Cheeger’s inequality, one would need the spectral gap to be greater than or equal to 1, but for Erd¨os-R´enyi graphs one only expects this gap to concentrate around 1 in practice [Hoffman et al., 2021].

A sufficient condition for (13) is for the conductance to be bounded below by , which can be achieved with high probability for any fixed by Erd¨os-R´enyi graphs when (see, for example, Hoff- man et al. [2021]), as well as expander graphs (see, for example, Friedman et al. [2003]).

Our result in Theorem 6 holds with high probability for the uniform corruption model discussed in Section 2.6 with and . This means that DDS achieves the information theoretically optimal rate with respect to p in this model. Indeed, we can read (4) as .

Assumption 5 requires that we initialize the DDS algorithm so that the normalization products lie in sufficiently small ball. This can be achieved in practice for cameras whose orientations lie close enough together. That is, suppose that all of the rotations lie in for all j, for some S and . Then, if the initial point for the DDS algorithm is chosen to be (I, . . . , I), then it is not hard to see that

which directly shows that Assumption 5 holds. Notice that is a large ball that essentially makes up half of the manifold SO(3), since the distance from any point to its cut-locus is . For example, if one considers reconstruction of an object from many images taken from points on a sphere that surrounds this object, then our requirement would essentially boil down to needing all of the images being taken from a single hemisphere.

We conjecture that one can weaken this initialization condition, to only require that neighboring normalization products are close to each other, but we leave weakening of this assumption to future work.

5 Empirical Evaluation

The algorithms we compare with are MRA and L1-MRA [Hartley et al., 2013], IRLS after L1-MRA initialization [Chatterjee and Govindu, 2013], LTS [Huang et al., 2019], CEMP [Lerman and Shi, 2019], and MPLS [Shi and Lerman, 2020]. Default parameters of all methods are used. For CEMP, the method computes the corruptions levels first to determine which edges are most corrupted. Then, using these corruption levels, it finds a minimum spanning tree, from which one can fix and then propagate from this to find the other rotations along this tree. For LTS, we implement the truncation step with parameter and run for 40 iterations. The approximate DDS algorithm is run for 40 epochs (or passes over the data, which means we take T = 40n) with a step size of and number of depth vectors m = 20.

We compute the distance between the estimated rotations and by first aligning them by solving