b

DiscoverSearch
About
My stuff
Depth Descent Synchronization in SO(D)\mathrm{SO}(D)
2020·arXiv
Abstract
Abstract

We give robust recovery results for synchronization on the rotation group, SO(D). In particular, we consider an adversarial corruption setting, where a limited percentage of the observations are arbitrarily corrupted. We develop a novel algorithm that exploits Tukey depth in the tangent space of SO(D). This algorithm, called Depth Descent Synchronization, exactly recovers the underlying rotations up to an outlier percentage of  1/(D(D − 1) + 2), which corresponds to 1/4 for SO(2) and 1/8 for SO(3). In the case of SO(2), we demonstrate that a variant of this algorithm converges linearly to the ground truth rotations. We implement this algorithm for the case of SO(3) and demonstrate that it performs competitively on baseline synthetic data.

Keywords Robust synchronization  ·Structure from motion  ·Nonconvex optimization  ·Multiple rotation averaging

The typical synchronization problem involves recovery of n group elements from pairwise measurements between them. It arises, for example, when solving the Structure from Motion (SfM) problem. One subproblem of SfM is to recover the three-dimensional orientations and positions of cameras from pairwise orientations and positions in relation to a scene [ ¨Ozyes¸il et al., 2017]. Here, we specifically focus on robust synchronization over SO(D), the rotation group for  RD. That is, given pairwise rotations in SO(D), some of which are corrupted, we aim to recover the original set of n rotations.

We assume n unknown, ground truth elements of SO(D), which we denote by  R⋆1, . . . , R⋆n. We form a graph G([n], E), where [n] := {1, . . . , n} indexes the n unknown elements and E designates the edges for which measurements of relative rotations are taken. For each  jk ∈ E, we are provided with the measurement

image

We can think of  R⋆jkin the following way: If we are oriented in the coordinate system with respect to node k, then  R⋆jkrotates our coordinate system into the coordinate system we would see if we were sitting at node j. This synchronization formulation extends to any given group, where one wishes to recover  (g1, . . . , gn), an n-tuple of elements in the group, given measurements of the group ratios  gig−1j , i, j = 1, . . . , n.

In reality, we cannot hope to exactly measure all the pairwise rotations in (1). In many real systems, both noisy and corrupted measurements occur: our focus here is on adversarially corrupted measurements. That is, within the measurement graph G, the corruption model is assumed to be fully adversarial. Our model is specified by partitioning the measured data into two parts:

1. We observe corrupted (or “bad”) edges  Eb ⊂ E, where all edges in  Ebhave a corresponding arbitrary corruption. The adversary is allowed to choose  Eb(and thus may to some degree influence the connectivity of  E \ Eb) as well as the corrupted values  Rjkfor  jk ∈ Eb. For each node, the adversary is only allowed to corrupt a limited fraction of edges.

2. The rest of the observed edges are uncorrupted (or “good”) edges  Eg = E \ Eb, where each edge in  Eghas an associated measurement given by (1). Theoretically guaranteed methods for robust synchronization are still lacking, especially in adversarial

and nonconvex settings. The development of these methods is important because in practice measurements are

usually quite corrupted, especially in applied problems like Structure from Motion [ ¨Ozyes¸il et al., 2017]. The results we establish here are concerned with exact recovery. That is, given a set of corrupted measurements, we wish to exactly recover  R⋆1, . . . , R⋆n. We will show that this is possible for a nonconvex method even in the presence of a significant amount of arbitrary corruption. Our method falls into the class of multiple rotation averaging algorithms [Govindu, 2004, Martinec and

Pajdla, 2007, Hartley et al., 2013]. These methods are effectively coordinate descent algorithms, which are

present highly efficient algorithms for nonconvex programs that are notoriously hard to analyze. While their analysis is challenging, it is imperative to develop a theoretical understanding of these methods and their robust counterparts [Hartley et al., 2011, Chatterjee and Govindu, 2017]. Moreover, as we discuss later, there are few robustness guarantees for group synchronization with adversarial corruption. Among the limited guarantees, none cover our model, and we thus make a significant contribution to this area. This work is also of general appeal to the nonconvex optimization community since we are able to prove convergence results in the complex nonconvex landscape of robust multiple rotation averaging. Furthermore, some energy landscapes associated with this problem exhibit many local minima and spurious fixed points, which we are able to avoid with our new method.

1.1 Contributions of This Work

The main contributions of this work follow.

1. As a warm-up, we develop an adversarially robust algorithm for synchronization in SO(2), which we call Trimmed Averaging Synchronization (TAS). In Theorem 3, under a generic condition on the measurement graph G, which we call the “well-connectedness”, and proper initialization, we show that it can tolerate a fraction of outliers per node that is bounded above by 1/4. We further prove that it converges linearly for fully connected observation graphs.

2. To extend this result to SO(D), we develop a new algorithm that we call Depth Descent Synchronization (DDS) based on Tukey depth in the tangent space of SO(D). To our knowledge, this is the first application of a manifold version of Tukey depth in an applied setting.

3. Assuming well-connectedness and good initialization, the DDS algorithm exactly recovers an underlying signal in the presence of a significant amount of adversarial outliers. This result is given in Theorem 6 and is the first guarantee of robustness to adversarial corruption for a multiple rotation averaging algorithm. This result extends elegantly to sparse random graphs, where we show that it achieves the information theoretic rate with respect to graph sparsity for Erd¨os-R´enyi observation graphs in Section 4.5.

4. We show that this algorithm can be efficiently implemented for SO(3). We run baseline experiments that show it performs competitively on some baseline synthetic data for SO(3) synchronization, which arises in the important application of Structure from Motion. While we carefully review related work later in Section 2, we emphasize here our contributions in terms

of the most relevant works. Again, we emphasize that we study an efficient, nonconvex algorithm for rotation

synchronization that has guarantees for adversarial outliers.

A robustness result for SO(D) synchronization based on semidefinite programming is given in Wang and Singer [2013]. However, the probabilistic model in this work is very restrictive (see details in Section 2), and the proposed method is slow for large n.

Huang et al. [2019] use a truncated least squares framework to do robust rotation synchronization. The truncated least squares framework was originally proposed by Huang et al. [2017] in the context of translation synchronization, and sequentially filters those pairwise measurements that are furthest from the current estimated pairwise measurements. Following this, Huang et al. [2019] show that this can be extended to rotation synchronization. Under an appropriate choice of a thresholding parameter, they demonstrate that it is possible to exactly recover the ground truth in the presence of outliers if a certain generic condition is satisfied. This method has two downsides. First, one must repeatedly compute the lowest eigenvectors of the graph connection Laplacian, which has a higher memory cost for dense graphs than DDS and may also have issues of numerical stability. Second, the bound on the fraction of outliers that they present is not clear in general settings since it depends on the  (1, ∞)-norm of the pseudoinverse of the graph connection Laplacian. This quantity is hard to control, and so it is unclear how their bound scales with various parameters as well as what it would state for arbitrary outliers. On the other hand, Huang et al. [2017] guarantees success of the truncated least squares method for adversarial outliers when considering translation synchronization in 1-dimension, but we do not see how to extend these to the problem of rotation synchronization.

The only existing result for adversarial robust synchronization was recently given by Lerman and Shi [2019]. They propose a general method, called Cycle-Edge Message Passing (CEMP), for group synchronization that is guaranteed to be robust to adversarial corruption. However, their method uses information from 3-cycles, that is, triangles in the graph, and so it is less efficient than typical multiple rotation averaging schemes by an order of n (the ratio between the number of triangles and the number of edges in the graph). Beyond this, multiple rotation averaging algorithms are also attractive because they are more memory efficient. A caveat to our current work is that our new method is not as efficient as previous multiple rotation averaging algorithms. In particular, we require the computation of a depth-based estimator, and so each rotation update has complexity  O(n3j log(nj))for SO(3), where  njis the number of neighbors of the node to be updated. Therefore, we do not claim that DDS is uniformly most efficient for adversarially robust synchronization in terms of time complexity, although it is more computationally efficient than CEMP for very sparse graphs. However, our depth descent method does still have the benefit of more efficient memory complexity than Lerman and Shi [2019].

Beyond computational efficiency, the theoretical guarantees are also different: we bound the ratio of corrupted edges, whereas Lerman and Shi [2019] bound the ratio of corrupted triangles. The method of Lerman and Shi [2019] degrades with extremely sparse graphs, due to the fact that they need to ensure that it contains sufficiently many triangles, whereas we require a well-connectedness condition of the graph G that extends to extremely sparse cases. Other well-connectedness conditions appear, sometimes implicitly, in works minimizing energy functions [Wang and Singer, 2013, Hand et al., 2018, Lerman et al., 2018, Huang et al., 2017]. Finally, CEMP is tailored to finding the corruption level in the graph, but it does not have complete guarantees for the recovery of the underlying rotations themselves.

1.2 Notation

Bold uppercase letters will be used to denote matrices, while bold lowercase letters will be used to denote vectors. For a set X in a Hilbert space, the convex hull is denoted by conv(X). The sphere in  RDis written as  SD−1. For an ordered tuple of n rotations,  R1, . . . , Rn ∈SO(D), we write  (R) = (R1, . . . , Rn).

1.3 Structure of the Rest of the Paper

We now outline the structure of this paper. First, we review related work in Section 2. We then discuss the specific case of synchronization over SO(2) in Section 3 and give a simple, adversarially robust algorithm, called Trimmed Averaging Synchronization. Following this, in Section 4 we develop our novel Depth Descent Synchronization algorithm, which utilizes Tukey depth to yield robust updates. Coupled with this, we develop its theoretical guarantees of robustness and convergence. Section 5 presents some baseline experiments demonstrating the practicality of our proposed method.

Interest in the synchronization problem has grown in recent years due to applications in computer vision and image processing, such as SfM [Govindu, 2004, Martinec and Pajdla, 2007, Arie-Nachimson et al., 2012, Hartley et al., 2013, Tron and Vidal, 2009, Ozyesil et al., 2015, Boumal, 2016], cryo-electron microscopy [Wang and Singer, 2013] and Simultaneous Localization And Mapping (SLAM) [Rosen et al., 2019].

The most common formulation for solving rotation and other group synchronization problems involve a non-convex least squares formulation that can be addressed by spectral methods [Singer, 2011] or semidefinite relaxation [Bandeira et al., 2017]. On the other hand, the work of Wang and Singer [2013] uses a semidefinite relaxation of a least absolute deviations formulation to obtain a robust estimate for SO(d) synchronization. They prove recovery for the pure optimizer of this convex problem in a restricted setting. In this setting the full graph is complete, every edge is corrupted with a certain probability p (in the case of SO(2), they require that  p ≤ 0.543and for SO(3) they require  p ≤ 0.5088) and the corrupted group ratios are distributed uniformly on SO(D). In practice, they advocate using an alternating direction augmented Lagrangian to solve their optimization problem. One may also use methods like the Burer-Monteiro formulation [Boumal et al., 2018], although current guarantees require the rank of the semidefinite program to be at least  O(√n), which results in storing iterates much larger than the underlying signal that is a vector of size n [Waldspurger and Waters, 2018]. Another recent work tries to leverage a low-rank plus sparse decomposition for robust synchronization [Arrigoni et al., 2018]. However, this work does not contain robustness guarantees.

2.1 Robust Synchronization Methods

For a survey of robust rotation synchronization, see Tron et al. [2016]. Some early works on rotation synchronization include Govindu [2001, 2006], Martinec and Pajdla [2007], with later follow-up works by Hartley et al. [2013], Chatterjee and Govindu [2013, 2017]. The later works discuss some least absolute deviations based approaches to multiple rotation averaging which we will discuss later. For theoretical foundations on averaging rotations, one can consult Moakher [2002]. For foundational work on optimization on the manifold SO(d), see [Taylor and Kriegman, 1994, Arora, 2009].

Robust multiple rotation averaging algorithms were studied in Hartley et al. [2011] and Hartley et al. [2013]. There, the authors used a least absolute deviations formulation over SO(3) using successive averaging with a Weiszfeld algorithm and a gradient-based algorithm. The authors also give a counterexample that shows that local minima exist and thus the global minimum of their problem may be hard to find in general. However, the authors give no guarantee of convergence or recovery in any setting. Also, we have found that this method may suffer from suboptimal fixed points in general, which we analyze in more detail in a forthcoming work (see also Section 5 of Maunu and Lerman [2020]).

2.2 Adversarially Robust Synchronization

A work that does contain guarantees is that of Lerman and Shi [2019], which considers a message-passing procedure that incorporates consistent information from cycles. This algorithm was guaranteed to be robust for the adversarial setting and applies to any compact group. Although its adversarial setting is very general, it requires a bound on the ratio of corrupted cycles per edge and not on the ratio of corrupted edges. Furthermore, the use of cycles results in a potentially more computationally intensive algorithm than the one in this work that only uses pairwise information.

Guarantees for exact recovery with adversarial, or partially adversarial, corruption appear in few other synchronization problems. The adversarial corruption in  Z2synchronization is very special since there is a single choice to corrupt a group ratio. Under a special probabilistic model, Bandeira [2018] established asymptotic and probabilistic exact recovery for the SDP relaxation of the least squares energy function of  Z2synchronization. The model assumes that G([n], E) is an Erd¨os-R´enyi graph with probability p of connection, edges are randomly corrupted with probability q and  p (1 − 2q)2 ≤ 0.5. Hand et al. [2018] and Lerman et al. [2018] established asymptotic exact recovery under a probabilistic model for solutions of the different problem of location recovery from pairwise orientations. In this problem ratios of the Euclidean group are normalized to the sphere. They assume an i.i.d. Gaussian generative probabilistic model for the ground truth locations and an Erd¨os-R´enyi model for the graph G([n], E) and further bounded the ratio of maximal degree of  G([n], Eb)over n. In both works, these bounds approach zero as n approaches infinity, unlike the constant bound of this work.

Robust permutation synchronization was studied by Huang and Guibas [2013], where they give a maximum corruption percentage of 1/4 in the case of fully connected observation graphs. Huang et al. [2017] analyzed a robust algorithm for one-dimensional translation synchronization that uses a truncated least squares formulation. They show that their method achieves a maximum corruption percentage of 1/6 for fully connected graphs. However, their generic condition is rather complicated and in order to interpret it they must make the fully connected assumption and they also restrict the maximal degree of  G([n], Eb). In both [Huang and Guibas, 2013, Huang et al., 2017], the bounds degrade for sparser observation graphs.

The results of Huang et al. [2017] were extended to the problem of rotation synchronization in Huang et al. [2019]. Here, the authors show that a truncated least squares formulation for rotation synchronization can recover the underlying signal assuming a generic bound that includes the  (1, ∞)-norm of the pseudoinverse of the graph connection Laplacian. However, this bound is hard to interpret in general.

2.3 Synchronization in Other Settings

In contrast to corrupted settings, some works have considered estimation in a noisy setting. Bandeira et al. [2017] study maximum likelihood estimation of the angular synchronization problem and show that the associated semidefinite relaxation is tight. More recently, message-passing algorithms have been used for maximum likelihood estimation in the Gaussian setting [Perry et al., 2018]. Other recent results leverage multiple phases to obtain better results in noisy settings [Gao and Zhao, 2019]. Minimax estimation under the squared loss over SO(2) is considered in Gao and Zhang [2020], and optimization methods for the squared error over subgroups over the orthogonal group are considered in Liu et al. [2020].

Another related problem over SO(2) is the synchronization of Kuramoto oscillators. In particular, a primary question is the minimal graph connectivity requirement ensuring that the energy landscape is nice. The weakest known requirement is that every vertex is connected to at least 0.7889n other vertices [Lu and Steinerberger, 2019]. The conjectured bound is 0.75n, which is reminiscent of the bound we require for local recovery with adversarial corruption over SO(2).

2.4 Nonconvex Optimization

Optimization problems cast over SO(D) are usually nonconvex. We can think of our method as attempting to solve a nonconvex problem over SO(D) as well. Therefore, our work also fits in with the growing body of work analyzing nonconvex energy landscapes and procedures [Dauphin et al., 2014, Hardt, 2014, Jain et al., 2014, Netrapalli et al., 2014, Yi et al., 2016, Zhang and Yang, 2018, Ge et al., 2015, Lee et al., 2016, Arora et al., 2015, Mei et al., 2018, Ge et al., 2016, Boumal, 2016, Sun et al., 2015b,a, Lerman and Maunu, 2017, Cherapanamjeri et al., 2017, Ma et al., 2018, Maunu et al., 2019].

2.5 Tukey Depth

Finally, we appeal to tangent space depth, that is, using Tukey depth [Tukey, 1974] in the tangent space of the manifold SO(D), to create a provable robust method. Tangent space depth, for a general manifold, first appeared in Mizera [2002], where the author proves existence and depth bounds for maximum tangent depth estimators. Earlier work on Tukey, or halfspace, depth includes Rado [1946], which proves a depth lower bound for general measures, and Danzer et al. [1963], which discusses the relation to Helly’s Theorem. More recently, the classical reference of Donoho and Gasko [1992] proves bounds on the maximum depth achieved in a dataset under ellipticity conditions. Computation of depth contours was considered in Liu [2017], Hammer et al. [2020]. Recently, an interesting connection between depth estimators and generative adversarial networks has been exhibited [Gao et al., 2018], which may perhaps lead to more computationally efficient estimators.

2.6 Notions of Robustness

We finish by clarifying our setting in the context of robustness. In order to quantify our notion of robustness, we introduce the following terminology. Recall that we have an underlying graph G([n], E) corresponding to the pairwise measurements, where E is partitioned into an inlier set,  Eg, and outlier set  Eb. For any  j ∈ [n], we define its neighborhood as well as its inlier and outlier neighborhoods as

image

We will denote by  α0the maximum percentage of outliers per node. That is,  α0is the maximum of  #(Ejb)/njover all  j ∈ [n], where throughout the rest of the paper  #(·)denotes the number of points in a set and  njis the degree of node  j, nj = #(Ej).

The following notion of recovery threshold is related to the notion of a breakdown point in robust statistics. However, our goal is somewhat different, since we desire an exact estimator rather than an approximation, as is typically considered for classical breakdown points. This is similar to the notion of RSR breakdown point given in Section 1.1 of Maunu and Lerman [2019].

Definition 1 (Recovery Threshold) The recovery threshold of a robust rotation synchronization algorithm is the largest value of  α0such that the algorithm outputs an estimator  ( ˆR)that satisfies (20).

The simplest information-theoretic bound for the recovery threshold is  α0 ≤ 1/2. Indeed, if  α0 > 1/2, then an adversary could easily choose  Ebto have a subgraph that dominates  Egwith a consistent set of measurements for an alternative signal  (Rb) = (Rb1, . . . , Rbn). That is, the observations would be

image

If the adversary chooses the partition  Egand  Ebproperly, then one could easily think that  Rbis the true underlying signal. For our method, we obtain recovery thresholds for  α0that are smaller than 1/2.

On the other hand, the information theoretic bound may be much higher in special models. For example, suppose that G([n], E) is an Erd¨os-R´enyi graph with parameter p. Suppose further that each edge in E is corrupted independently with probability q, and the corrupted measurements are i.i.d. uniform on SO(D). Wang and Singer [2013] call this the uniform corruption model. Then, Singer [2011] and Chen et al. [2016] established the following information theoretic threshold for the rotation synchronization problem:

image

Notice that one needs  p ≳ log(n)/nto ensure that the underlying Erd¨os-R´enyi graph is connected. For fixed p, notice that one can take q arbitrarily close to 1 (and so  α0is then very close to 1) as long as n is sufficiently large. We later discuss how our main results extend to this model in Section 4.5, where we show that the DDS algorithm achieves optimal recovery rates with respect to p (i.e., it can tolerate extremely sparse observation graphs).

To begin to build motivation for our method, we consider the case of synchronization over SO(2), where the method becomes considerably simpler due to its 1-dimensional manifold structure. First, Section 3.1 gives definitions of some geometrical objects on SO(2), which we identify with  C1for mathematical convenience. Then, in Section 3.2, we define our SO(2) synchronization method, which we call Trimmed Averaging Synchronization (TAS) and is a special case of our later DDS algorithm. Finally, Section 3.3 discusses the initialization and well-connectedness assumptions and uses these to give an adversarial recovery guarantee for the TAS algorithm.

3.1 The Geometry of  C1

We define a few structures related to the manifold  C1. The tangent space can be identified with R. Let v ∈ TzC1be a unit direction in the tangent space at  zj(i.e.,  v = ±1). The geodesic originating at  zjin the direction v is given by  γ(t) = eivtzj, t ∈ [0, π/|v|]. The exponential map and inverse exponential map (logarithm map) on this 1-dimensional manifold are given by

image

Finally, the cut-locus of a point  z ∈ C1is defined as the set of points for which there is not a unique geodesic from z. It is not hard to see that this is given by  cut(z) = {−z}.

Recall that we seek an underlying signal  z⋆ ∈ Cn1. Notice that its elements,  z⋆j ∈ C1for  j ∈ [n], can be parameterized by angles,  z⋆j = eiθ⋆j. This angle is also known as the argument of the complex number, and so we write  arg(eiθ) = θ, where  θ ∈ (−π, π]. The angular, or geodesic, distance between  z1and  z2 ∈ C1is

image

For later reference, we plot the extended angular distance function in Figure 1.

image

Fig. 1 The angular distance function  d∠(eiθ, 1).

Recall that if  jk ∈ Eg, then the edge measurement is correct, that is,  zjk = z⋆jk, where  z⋆jk := z⋆j z⋆kis defined analogously to (1). For  jk ∈ Eb, the measurement  zjkis assumed to be an arbitrary element of  C1. From the measurements  z⋆jk, jk ∈ Eg, z⋆is only identified up to a global rotation, due to the ambiguity that z⋆j z⋆k = z⋆j yyz⋆k, y ∈ C1, and so  z⋆ygenerates the same pairwise measurements as  z⋆.

To deal with this ambiguity, the following function will be used to demonstrate convergence of a sequence to  z⋆:

image

This is again nothing but a function that measures the maximum distance between normalization products. Notice that  δ(z) = 0 ⇐⇒ z = z⋆yfor some rotation  y ∈ C1. Therefore, convergence of  δ(z)to zero indicates convergence of z to  z⋆, and an algorithm exactly recovers  z⋆iff  δ(z) → 0.

3.2 Trimmed Averaging Synchronization

A natural way to solve the rotation synchronization problem involves energy minimization. The simplest strategy [Govindu, 2001, Martinec and Pajdla, 2007] attempts to minimize

image

A coordinate descent strategy to solve (8) involves updating  zjby solving

image

Applying this sequentially over the indices j = 1, . . . , n results in the multiple rotation averaging (MRA) discussed in more detail in Hartley et al. [2013]. While such coordinate descent strategies generally lack coordination across all objects like global synchronization methods, they lead to algorithms that are more memory efficient and that can be decentralized easily.

Another simple way to robustify (9) is to select the average of all points that fall within a trimmed set, which results in a trimmed averaging procedure. To account for the 1-dimensional manifold structure of SO(2), we propose to do this trimming in the tangent space, which yields the TAS algorithm. An illustration of one trimmed averaging step is given in Figure 2.

For a discrete  X ⊂ Rand a fraction 0 < p < 1, we write the pth quantile of X by  Xp. It is convenient to define the trimming operator

image

We also denote the average of a dataset  X ⊂ Rby ave(X). That is, ave(X) =� �x∈X x�/#(X).

Due to the simplified geometry of SO(2), we will show in the following that using this trimmed rotation averaging scheme converges to the underlying solution linearly when the percentage of outliers is at most α0 < 1/4when G is fully connected. In the case where (G, [n]) is not fully connected, the result is a corollary of our later Theorem 6 under a connectedness assumption on (G, [n]). We note that this fraction is similar to the one given in Lerman and Shi [2019], although there the bound is formulated for corrupted triangles in the graph.

For clarity, we give the TAS algorithm in Algorithm 1. To allow for damping of the updates, we include the step-size parameter  η ∈ (0, 1]. When  η < 1, we refer to the algorithm as Damped TAS or DTAS for short.

image

Fig. 2 Illustration of the TAS algorithm at a fixed step and a fixed node j. The measurement is  zj = z = i. After projecting into the tangent space, the outermost points in red are filtered, and the green points are averaged. This trimmed average is then projected back to the manifold.

image

3.3 Recovery Guarantees for DTAS

We begin by discussing the assumptions that will make a synchronization problem tractable for TAS. The first assumption we require is a good initialization, which is common in the analysis of such nonconvex methods. Assumption 1 The initial set of rotations  z(0) ∈ Cn1lies within a  π/2-neighborhood of  z⋆: that is, there exists a  w ∈ C1such that

image

Note that this is equivalent to the assumption that  δ(z) < π.

While corruptions are arbitrary, we require an assumption on the underlying graph (G, [n]). It essentially requires that the graph is sufficiently well connected.

Assumption 2 (ζ-Well-connectedness condition) For a fixed  ζ ∈ (0, 1], for any  J ⊂ [n]such that  #(J) ≤n/2, there exists an index  j ∈ Jsuch that

image

In words, this assumption requires that inside any set of at most n/2 nodes, there is a node that is connected to a significant number of nodes outside this set. The condition in this assumption is equivalent to requiring that

image

We include a discussion of this condition and its connection with random graphs, conductance, and expanders later in Section 4.5.

For the case of Assumption 2 with  ζ = 1, we just call the graph well-connected. While fully connected graphs satisfy this condition, there exist many more examples of graphs satisfying it with  ζ = 1as well, and we give some examples of some simple graphs that meet this assumption in Figure 3.

The following theorem gives the main recovery result for the DTAS algorithm. While the algorithm converges linearly, the rate we derive depends on n and is worst-case. In the few simulations we have run, the algorithm seems to converge at a faster rate that merits more study. Also, a more complicated proof may yield linear convergence in the general case of well-connected G, but for sake of brevity, we only prove it for the fully connected case.

image

Fig. 3 Examples of graphs that satisfy the well-connectedness condition for n = 4, 5 and 6. In each of these graphs, (12) is satisfied with  ζ = 1: that is, all subsets J of size at most n/2, there exists a node  j ∈ J such that #(Ej ∩ ([n] \ J)) > #(Ej ∩ J).

Theorem 3 Suppose that  α0 < ζ/4, Assumption 1 holds, G satisfies Assumption 2 with parameter  ζ, and [z(t)]t∈Nis the sequence generated by DTAS, for  η ∈ (0, 1)and  τ = ζ/4. Then,  δ(z(t)) → 0, and the algorithm exactly recovers  z⋆. Furthermore, in the case where G is fully connected, the DTAS algorithm linearly converges to  z⋆.

Proof The proof of convergence under well-connectedness follows from the fact that, when updating index j at iteration t, the selection rule defined by choosing the trimmed average yields a point in the interior of Tτ({Logzj(t)(zjkzk(t)) : k ∈ Ej}). The proof then follows from the proof of Theorem 6.

To see linear convergence in the fully connected case, we prove that all normalization products  z⋆j zj(t)contract during each pass over the dataset. Denote

image

These are the translation of the normalization products to the angular coordinates of the points  z1(t), . . . , zn(t). Also, define the sets

image

In this proof, we will write  δ = δ(z(t))as a shorthand. Notice that we must have

image

unless  I+(t) = I−(t)) = [n], in which case  z(t) = z⋆wfor some  w ∈ C1and z(t) recovers  z⋆.

For the update with respect to index j, all good pairwise measurements must lie in  −δj + [−δ/2, δ/2]. Since there are at least  3nj/4good measurements, all trimmed points must lie in this interval as well. Therefore, for all  j ∈ I−(t), after updating we have

image

Using this fact, we will now show that the indices in  I+(t)must move inwards. Indeed, since  #I+(t) ≤n/2, we must have  #(Ejg ∩ I−(t)) ≥ 1for all  j ∈ I+(t). Therefore, for each trimmed mean for  j ∈ I+(t), we have

image

Thus,

image

After the coordinate update, we have that for all  j ∈ I+(t),

image

After repeating this argument for all j over the course of an epoch, or pass over all indices j = 1, . . . , n, this yields that

image

as long as  η < (n − 1)/(n + 1). The width of this interval is  (n − 1 − η)δ(z(t))/(n − 1), which yields the desired result. ⊓⊔

This section presents a novel algorithm for robust synchronization over the rotation group, SO(D). We assume a fixed observation graph G that encodes which pairwise rotations we observe. The pairwise rotations are written as  Rjk ∈SO(D), where the good edges  jk ∈ Eghave the associated observation  R⋆jR⋆⊤k, and the bad edges are arbitrarily chosen from G and have arbitrarily corrupted measurements.

To proceed, we must make clear our goal for the synchronization problem, since there is a well known ambiguity, similar to the one encountered for SO(2) synchronization in Section 3.1 – we can only recover R⋆up to right multiplication by an element of SO(D). This is because, after this multiplication, one arrives at the same pairwise measurements in (1). This is a form of rotational symmetry in the nonconvex problem, which may be leveraged to develop tractable nonconvex programs [Zhang et al., 2020]. Exactly recovering the ground truth measurements  (R⋆) = (R⋆1, . . . , R⋆n) ∈SO(D)nup to right multiplication by  S ∈SO(D) is equivalent to finding a set of rotations  (R) = (R1, . . . , Rn)such that

image

for some  S ∈SO(D). We refer to the set of rotations  R⋆⊤j Rj, j = 1, . . . , n, as normalization products since, when  (R) = (R⋆S), they reveal the normalization factor that multiplies each element of  (R⋆)from the right. One could extend this discussion to the case of approximate recovery by requiring that the normalization products are approximately equal.

With our goal now in mind, we begin in Section 4.1 by discussing the geometry of the manifold SO(D) and presenting some basic geometric results that will be used in our main theorem. Following this, Section 4.2 reviews the concept of halfspace depth from robust statistics, which will be the core tool that we use to construct our algorithm. In Section 4.3, we give outline the DDS algorithm. Then, in Section 4.4, we give the theoretical guarantees that constitute the main innovations of this work. We finish in Section 4.5 with a discussion of the assumptions we make in our theorem.

4.1 The Manifold Structure of SO(D)

The rotation synchronization problem is obviously a robust recovery problem on the product Riemannian manifold SO(D)n. Therefore, in the following, we freely use concepts from Riemannian geometry, and specifically those concepts related to the geometry of SO(D).

4.1.1 Riemannian Geometry of SO(D)

The set of rotations SO(D) is a  D(D − 1)/2-dimensional Lie group that has a natural Riemannian structure. The bi-invariant distance metric d : SO(D) ×SO(D) → [0, ⌊ D2 ⌋π]is given by

image

where log is the matrix logarithm. The corresponding Lie algebra is so(D), the set of  D×Dskew-symmetric matrices. The tangent space of SO(D) at  R ∈SO(D) is

image

Notice that every tangent vector  v ∈ TRSO(D) has a corresponding element of so(D), which we denote by vso(D). The corresponding Riemannian metric (which is an inner product and thus should not be confused with a distance metric) for  v, w ∈ TRSO(D) is given by  ⟨v, w⟩R =Tr(v⊤w)/2 =Tr(v⊤so(D)wso(D))/2. Equipped with this metric, SO(D) is a Riemannian manifold with nonnegative sectional curvature. An open ball with respect to the metric d is written as B(R, r), where the radius is r and the center is R. Its closure and boundary are B(R, r) and  ∂B(R, r), respectively.

The exponential map is given by

image

where exp is the matrix exponential. The logarithmic map is the inverse of this:

image

The geodesic between  R, S ∈SO(D) is written as −−→RS(t) =ExpR(tLogR(S)), for  t ∈ [0, 1]. In the following, we use the notation for a halfspace of  TRSO(D),

image

4.1.2 Local Convexity Properties of SO(D)

We continue by recalling some local convexity properties of manifolds like SO(D). The following result on the convexity of sufficiently small balls is standard in the literature [Karcher, 1977, Afsari, 2009, Petersen, 2016].

Theorem 4 [Convexity of small balls] In a closed ball  B(C, r) ⊂SO(D) with  r < π/2, the squared distance metric  d2is strictly convex. This implies, in particular, that for all  R0, R1 ∈ B(C, r), −−−→R0R1(t) ∈ B(C, r)for all  t ∈ (0, 1).

Note that the closed ball in the previous theorem has the property that the interior of any nonconstant geodesic lies strictly in the interior of the ball. The following result is also readily apparent. It states that, for boundary points on a sufficiently small ball in SO(D), all interior directions are contained in a halfspace.

Corollary 1 Let B be a ball on SO(D) with radius  r < π/2and  R ∈ ∂B. Then, there is a halfspace H ⊂ TRSO(D) such that LogR B ⊂ H.

An important component of our later theoretical results relies on showing that the radius of the smallest ball containing a set of rotations shrinks. The final lemma of this section states that if a discrete set of rotations R is contained in a ball, and if half of the ball contains no boundary measurements, then the set R is actually contained in a ball of smaller radius. Thus, for such a set of rotations, this gives us a sufficient condition for decreasing the radius of the smallest containing ball.

image

Then, R is contained in a ball with radius less than r.

Proof Let c(t) be the geodesic ExpC(tv). We claim that, for t sufficiently small,  R ⊂ B(c(t), r), which is an open ball. Heuristically, one should expect this to be true, since the closed halfspace  H(C, −v)contains no boundary points, and so moving the center a small amount in the v direction keeps all points within the ball.

By the first order approximation to  d2(c(t), R)and since  ∠(v,LogC R) < π/2, we have

image

for t sufficiently small. On the other hand, since

image

the distance to C over all  R ∈ExpC[H(C, −v)] ∩ ∂B ∩ Ris bounded away from r. By continuity of d(c(t), S) for all  R ∈ R, this implies that there is an  ϵsuch that

image

Putting (25) and (26) together implies that a small shift of the ball results in a new center such that  maxR∈R d(c(t), R) <r. In turn, this means that all points of R lie in a ball  B(c(t), r′)with  r′ < r. ⊓⊔

4.2 Tukey Depth and its Properties

We will use the concept of Tukey depth to determine descent directions on the manifold SO(D)n, although other notions of depth could potentially be used as well (see Ch. 58 of Toth et al. [2017] for a discussion of different notions of depth). In Euclidean space, the Tukey depth of a point  x ∈ RDin a dataset X = {x1, . . . , xn} ⊂ RDis given by

image

The depth is therefore the minimum number of points contained in any halfspace that has x in its separating hyperplane. A natural robust estimator is then the point of maximum depth, which is also called the Tukey median. The  β-depth level set for  β ∈ [0, 1]is defined by

image

This level set is convex and compact, and its boundary is made up of hyperplanes defined by sets of D points [Liu et al., 2019]. This function will be used in the construction of our algorithm.

As an example, consider the 1-dimensional dataset  X = {xi}. Here, the formulation of depth is quite simple:

image

With this in mind, the  β-depth level set is  Dβ(X) = [x(⌈βn⌉), x(⌊(1−β)n⌋)],where  x(i)denotes the ith order statistic. The Tukey median in this case is just the median.

We recall the following theorem, which bounds the maximum possible depth within a general dataset. Notice that, in particular, this guarantees that the depth level set  Dβ(X)is nonempty for all  β ≤ 1/(D + 1).

Proposition 1 (Rado [1946]) Suppose that X is a set of n points. Then, the maximum depth in X is bounded below by  ⌈n/(D + 1)⌉.

A particularly useful property of Tukey depth is that it is affine equivariant, that is, it is stable under affine transformations.

Lemma 2 (Donoho and Gasko [1992])

image

This implies, in particular, that things behave nicely if we change the inner product on  RD.

image

With this property, we can map between the depth regions in  (RD, ⟨·, ·⟩)and  (RD, ⟨·, A·⟩)by

image

This affine equivariance implies, in particular, that Proposition 1 extends to datasets in tangent spaces of manifolds.

We finish with a simple lemma on depth level sets that will be the key to the robustness guarantee for our later algorithm. In simple words, this lemma guarantees sufficient conditions for a depth level set to contain a nonzero value in a dataset containing many zeros. In the following, datasets are represented by multisets and may contain duplicate points. For a halfspace  H(0, v) := {x ∈ RD : v⊤x ≥ 0} ⊂ RD, we write its separating hyperplane as L(0, v).

Lemma 3 Suppose that we have a dataset  X = {x1, . . . , xn} ∈ RDand a subset  Y ⊂ Xthat satisfies the properties i)  #(Y) > n − n(ζ/(2D + 2)), ii) There exists closed halfspace  H(0, v) ⊂ RDsuch that H(0, v) ⊃ Y, and iii) the only points of Y in L(0, v) are 0. Then,  Dζ/(2D+2)(X) ⊂ conv(Y) ⊂ (H(0, v) ∪{0}). Beyond this, if  #(Y ∩ L(0, v)) < (1 − ζ/2)n, then  (Dζ/(2D+2)(X)) ∩ H(0, v) ̸= ∅.

Proof It is obvious by the properties of depth that  Dζ/(2D+2)(X) ⊂ conv(Y) ⊂ (H(0, v) ∪ {0}), since any point on the boundary of conv(Y) has depth less than  ζ/(2D + 2). By Proposition 1, there is a point of depth at least  ζ/(2D + 2), and so  Dζ/(2D+2)(X)is nonempty.

Suppose that less than  (1 − ζ/2)npoints in Y are zero, that is,  #(Y ∩ L(0, v)) < (1 − ζ/2)n. We claim that  Dζ/(2D+2)(X) ∩ H(0, v) ̸= ∅. Define the auxiliary set  Z = X \ (Y ∩ L(0, v)), that is, Z removes the 0 values in Y from X. Since  #(Y∩L(0, v)) < (1−ζ/2)n, we have that  m = #(Z) ≥ (ζ/2)n. Within Z, there is a point  ˆzof depth at least m/(D + 1). Further, since  n ≤ 2m/ζand  #(X ∩ H(0, −v)) < nζ/(2D + 2), we have

image

which implies that  ˆzmust lie in H(0, v). On the other hand, since  m ≥ ζn/2, we have that

image

which means that  ˆzis a point of depth at least  ζ/(2D + 2)in  X. ⊓⊔

4.3 Depth Descent Synchronization

We now use the results of the previous sections to derive the DDS algorithm.

4.3.1 The General DDS Algorithm

We assume a selection rule  SRon convex, compact subsets of  TRSO(D) for all  R ∈SO(D). In particular, for our theorem, we assume that this selection rule chooses a nonzero point from the convex set if possible, and otherwise outputs zero. To select this point, one selection rule could be to take a point uniformly at random, or another could be to take the center of mass. As our theorem makes clear, the choice of this selection rule does not affect the exact recovery result, but it may change convergence rates. Since we do not give quantitative convergence rates in this work, we leave this choice as arbitrary.

In any case, given a selection rule  SRover such subsets of  TRSO(d), suppose our estimated rotations at time t are (R(t)). Our update direction at this time for index j = t mod n is defined by

image

In words, the direction  vj(t)is a direction in the tangent space at  Rj(t)that is sufficiently deep with respect to the neighbor measurements {LogRj(t) RjkRk(t) : k ∈ Ej}. Given  vj(t), the algorithm updates

image

for a chosen step size  η(t) ∈ (0, η⋆(D)]. Our theory below restricts this step size according to Theorem 4.2 of Afsari et al. [2013]: in the case of D = 2 or 3, one can take  η⋆(D) = 1, while for D > 3 the upper bound is more restrictive (the reader can consult the discussion in Afsari et al. [2013] for a more thorough discussion of the bound). For sake of clarity, we write the full DDS algorithm in Algorithm 2.

image

One benefit of DDS is that there is no need to tune the step size, which is due to the local convexity properties of the manifold. As we outline in our main theorem, the step size can be directly selected from the guidance of Afsari et al. [2013].

As for computational complexity, at least in low-dimensions, depth regions can be calculated efficiently for small datasets [Liu et al., 2019]. In particular, the most straightforward algorithm involves an exhaustive search over hyperplanes spanned by D-subsets of X that cut off  βnpoints in X. The time complexity of this method for  R3is  O(n3 log(n)), and so could be used for moderately sized datasets. Translating this to our problem in SO(3), the time complexity for updating  j ∈ [n]is  O(n3j log(nj)), and so we see that this method is efficient for sparser graphs. In an Erd¨os-R´enyi model, the complexity to update all n rotations is  O(n4p3), where p is the Erd¨os-R´enyi parameter.

As discussed in the introduction, the time complexity of the DDS algorithm is not necessarily more efficient than that of Lerman and Shi [2019]. Indeed, the complexity of their message-passing algorithm is O(n3)for a single update to all rotations, while for our method it is  O(�j n3j log(nj))for SO(3) (and much larger for higher dimensions). Therefore, DDS has better complexity for sparse graphs, while Lerman and Shi [2019] has better complexity for dense graphs. On the other hand, our complexity is uniformly better in SO(2), where depth contours can be easily found in O(n log(n)) by sorting, and it thus takes  O(n2 log(n))time to update all nodes. Finally, we also note that our method has better scaling in terms of memory usage: the multiple rotation averaging scheme takes  O(nj)memory while the message-passing scheme takes  O(n3).

4.3.2 The Approximate DDS Algorithm

To make the DDS algorithm more computationally efficient, we employ a few strategies to develop an approximate DDS algorithm.

First, so that we do not need to resort to computing full depth contours, we instead take the average of the deepest points in the set {LogRj(t) RjkRk(t) : k ∈ Ej}as our update direction at each iteration.

Second, to avoid computation of the full depth of every point in this set, we instead use an approximation of depth based on sampling. Suppose that we wish to approximate the depth of the vectors in Y = {y1, . . . , yn} ⊂ RDwith respect to Y. We can sample a set of vectors  u1, . . . , um ∈ SD−1. Then, for each  yi ∈ Y, the approximation of depth, �depth, is

image

Notice that �depthreplaces the minimum over  u ∈ SD−1in (27) by the minimum over the discrete set of vectors  {u1, . . . , um}. For  R3(which corresponds to the tangent space for SO(3)), computation of the full depth for all points would take  O(n3j)time, where for each  xione would need to search over all planes defined by triplets  xi, xj, xk, for distinct i, j, k. On the other hand, the computation of the approximate depth using (34) takes  O(n2jm)and can be more efficiently implemented due to the fact that the  u1, . . . , umare shared between all  yi.

The approximate DDS algorithm is given in Algorithm 3. Here, we use the notation  SRj(t)SO(D) for the set of all unit vectors in  TRjSO(D).

image

4.4 Exact Recovery for DDS

For many nonconvex methods, good initialization is quite important [Li et al., 2019, Maunu et al., 2019, Qu et al., 2019, Chi et al., 2019, Qu et al., 2020]. We only obtain a recovery result for DDS if we can initialize in a suitable neighborhood of  R⋆.

Definition 2 A set of n rotations (R) lies within a  ρ-neighborhood of  (R⋆)if the normalization products R⋆⊤j Rjall lie in a ball of radius  ρ. That is, there exists a  C ∈SO(D) such that

image

With this terminology, our main assumption is that that we can initialize in a  π/2-neighborhood of  (R⋆). This is the analog of Assumption 1 for SO(D).

Assumption 5 The initial set of rotations (R(0)) for our algorithm lies within a  π/2-neighborhood of  (R⋆).

As we discuss in Section 4.5, we believe that this assumption is not so restrictive, and we later give some intuition for how this might occur in a real scenario.

The following theorem constitutes the main theoretical result of this work. It states that, with proper initialization and well-connectedness, the recovery threshold of Algorithm 2 is  1/(D(D −1)+2). As examples when  ζ = 1, in the case of SO(2), Theorem 6 yields a corruption threshold of  α0 < 1/4. In the case of SO(3), Theorem 6 yields a corruption level of  α0 < 1/8.

Theorem 6 Suppose that  α0 < ζ/(D(D − 1) + 2), Assumptions 5 and 2 hold, and  [(R(t))]t∈Nis generated by (33) with  β = ζ/(D(D−1)+2). Further assume in the case of D = 2, 3 that  η ∈ (0, 1], and in the case of D > 3 that  ηis chosen according to Theorem 4.2 of Afsari et al. [2013]. Then,  d(R⋆⊤j Rj(t), R⋆⊤k Rk(t)) →0 for all j, k, and the DDS algorithm exactly recovers  (R⋆).

Proof (Proof of Theorem 6) To aid in the proof, we denote the smallest ball enclosing our normalization products as

image

The center of B(t) is C(t) and its radius is r(B(t)). Our goal will be to show that  r(B(t)) → 0, t → ∞.

The proof of the theorem is broken into three parts. In the first part, we prove that the sequence  [(R(t))]t∈Nremains in a nested sequence of balls. In the second part, we show that, after sufficiently many iterations, the radius of the smallest enclosing ball must shrink. We finish in the third part by appealing to a general convergence theorem for monotonic algorithms.

Part I:  B(t+1) ⊆ B(t): First, we show that at time t, no matter which index is updated, the normalization products remain in B(t). This is true at t = 0, so assume that it is true at a time t. Let j = t mod n and consider the pairwise measurements in the tangent space at  Rj(t): for each  k ∈ Ejg, the corresponding point in the tangent space is given by LogRj(t)(R⋆jR⋆⊤k Rk(t)). By assumption, we have that  R⋆jR⋆⊤k Rk(t) ∈R⋆jB(t)for all k. Since  α0 < ζ/(D(D − 1) + 2)and  β = ζ/(D(D − 1) + 2),

image

since the set in the right-hand side of the display contains more than a  1 − βfraction of points. We can now apply Theorem 3.7 of Afsari et al. [2013]. This follows from the fact that the update direction  vj(t)is the gradient of the Frech´et mean function for a weighted combination of {LogRj(t)(RjkRk(t)) : k ∈ Ejg}(since it lies in the convex hull of of these points). More formally, letting  m = #(Ejg), since  vj(t) ∈conv({LogRj(t)(RjkRk(t)) : k ∈ Ejg}), there exist weights  a1, . . . , amsuch that

image

Therefore, choosing the step size as in Afsari et al. [2013] implies that  R⋆⊤j Rj(t + 1) ∈ B(t), and further that  R⋆⊤j Rj(t+1) ∈ B(t)when  vj(t) ̸= 0. In turn, this implies that  B(t + 1) ⊆ B(t)and, if  Rj(t) ∈ B(t), then  Rj(t + 1) ∈ B(t)as well (i.e., interior points cannot move to the boundary).

In the case of SO(3), Theorem 3.7 of Afsari et al. [2013] tells us that choosing  η ∈ (0, 1]suffices. The case of SO(D) for D > 3 is dealt with in a similar way using Theorem 4.2 of Afsari et al. [2013].

Part II:  r(B(s + ∆s)) < r(B(s)): We now must show that after sufficiently many iterations, the radius of B(t) strictly decreases. To this end, fix a time s. At this time, at least one normalization product  R⋆⊤j Rj(s)must lie on the boundary  ∂B(s). For convenience, define the index set J(s) of boundary rotations at time s by

image

We will show that there exists a  ∆s > 0such that at some future time  s + ∆s,

image

To this end, pick a direction w uniformly at random from  TC(s)SO(D) such that  ∥w∥C(s) = 1. This vector separates  TC(s)into two halfspaces, and thus partitions B(s) into two halves. One of these halves contains at most n/2 points LogC(s)(R⋆⊤j Rj(s)), and we will denote the corresponding halfspace of  TC(s)SO(D) by H. Since the direction w is chosen uniformly at random, there are no points on the boundary of this halfspace with probability 1.

Let K(s) denote the set

image

that is, the set of indices corresponding to boundary normalization products at time s. At each time t = s+m for m > 0, if none of the normalization products  R⋆⊤k Rk(t), k ∈ K(s), lie in  ∂B(s), then set  ∆s = mand we can apply Lemma 1 to yield (36).

Otherwise, by Assumption 2, there is at least one index  k ∈ K(s)such that  R⋆⊤k Rk(s)is in  ∂B(s)and

image

which yields that  vj(s + m) ∈ conv(Y)and  vj(s + m) ̸= 0. Thus, for sufficiently small  η(s + m), Rj(s +m + 1) ∈ B(s). Repeating this sequentially for all elements of K(s), there must exist a  ∆ssuch that

image

We are left in a situation where  R⋆⊤1 R1(s + ∆s), . . . , R⋆⊤n Rn(s + ∆s) ∈ B(s), and there exists a w such that  H(C(s), −w)contains no boundary points. By appealing to Lemma 1, we know that  (R(s + ∆s)lies in a ball of radius smaller than r(B(s)), and therefore r(B(s)) has a strictly monotonic subsequence.

Part III: Strict monotonicity implies convergence: By Mizera [2002], as long as  β ≤ 1/(D(D−1)/2+1), the point to set mapping

image

is non-empty and outer semicontinuous with respect to the empirical measure on

image

Therefore, the associated algorithm (33) is upper semicontinuous in the sense of Theorem 3.1 of Meyer [1976], and we obtain convergence of R(t) to a fixed point of (33).

We finish by examining fixed points of the algorithm (33). Suppose that R is a fixed point of this sequence, such that  vjas defined by (32) is zero. The fixed point is characterized by  d(Rj, R⋆jR⋆⊤k Rk) = 0for at least  (1 − ζ/2)njmeasurements k. By  ζ-well-connectedness, for all subsets J of size at most n/2, there is an index j such that  #�Ej ∩ J�< (1 − ζ/2)nj, because otherwise Lemma 3 would yield a nonzero

update direction. Thus, there is an index  k ∈ Ej \ [J]such that  d(Rj, R⋆jR⋆⊤k Rk) = 0. This implies that d(Rj, R⋆jR⋆⊤k Rk) = 0for all  j, k. ⊓⊔

4.5 Discussion of Assumptions

The  ζ-well-connectedness condition in Assumption 2 bears some similarity to the notions of conductance and graph expansion. From the perspective of graph theoretical results, we note that a sufficient condition for Assumption 2 with  ζ = 1is for the conductance of the graph to be greater than or equal to 1/2. This follows from a simple pigeonhole argument. It is unclear if this condition holds for Erd¨os-R´enyi graphs. Indeed, if one uses Cheeger’s inequality, one would need the spectral gap to be greater than or equal to 1, but for Erd¨os-R´enyi graphs one only expects this gap to concentrate around 1 in practice [Hoffman et al., 2021].

A sufficient condition for (13) is for the conductance to be bounded below by  ζ/2, which can be achieved with high probability for any fixed  ζ < 1by Erd¨os-R´enyi graphs when  p ≳ log(n)/n(see, for example, Hoff- man et al. [2021]), as well as expander graphs (see, for example, Friedman et al. [2003]).

Our result in Theorem 6 holds with high probability for the uniform corruption model discussed in Section 2.6 with  q < α0 = ζ/(D(D −1)+2)and  p ≳ log(n)/n. This means that DDS achieves the information theoretically optimal rate with respect to p in this model. Indeed, we can read (4) as  p = Ω� 1(1−q)2log(n)n �.

Assumption 5 requires that we initialize the DDS algorithm so that the normalization products lie in sufficiently small ball. This can be achieved in practice for cameras whose orientations lie close enough together. That is, suppose that all of the rotations  R⋆⊤1 , . . . , R⋆⊤nlie in  B(S, ρ)for all j, for some S and ρ < π/2. Then, if the initial point for the DDS algorithm is chosen to be (I, . . . , I), then it is not hard to see that

image

which directly shows that Assumption 5 holds. Notice that  B(S, π/2)is a large ball that essentially makes up half of the manifold SO(3), since the distance from any point to its cut-locus is  π. For example, if one considers reconstruction of an object from many images taken from points on a sphere that surrounds this object, then our requirement would essentially boil down to needing all of the images being taken from a single hemisphere.

We conjecture that one can weaken this initialization condition, to only require that neighboring normalization products are close to each other, but we leave weakening of this assumption to future work.

The algorithms we compare with are MRA and L1-MRA [Hartley et al., 2013], IRLS after L1-MRA initialization [Chatterjee and Govindu, 2013], LTS [Huang et al., 2019], CEMP [Lerman and Shi, 2019], and MPLS [Shi and Lerman, 2020]. Default parameters of all methods are used. For CEMP, the method computes the corruptions levels first to determine which edges are most corrupted. Then, using these corruption levels, it finds a minimum spanning tree, from which one can fix  R1 = Iand then propagate from this to find the other rotations along this tree. For LTS, we implement the truncation step with parameter  γ = 0.96and run for 40 iterations. The approximate DDS algorithm is run for 40 epochs (or passes over the data, which means we take T = 40n) with a step size of  η = 0.7and number of depth vectors m = 20.

We compute the distance between the estimated rotations  ( ˆR)and  (R⋆)by first aligning them by solving

image

The error is then computed as

image

All algorithms take less than a minute to run on each individual dataset on a Macbook Air with a 1.6 GHz Dual-Core Intel Core i5 and 8 GB of RAM.

Figure 4 presents a first comparison of these algorithms on synthetic data. The model is the uniform corruption model, which is discussed in Section 2.6. The graph is Erd¨os-R´enyi on n = 100 nodes with varying parameter p, and each edge on this graph is corrupted with probability q. The underlying rotations, (R⋆1, . . . , R⋆n), are distributed uniformly on SO(D). The bad measurements,  Rbjk, are also uniformly dis- tributed on SO(D). For each set of parameters (p = 0.1, 0.2, . . . , 0.5 and q = 0.05, 0.1, . . . , 0.3), 10 datasets are generated and the color represents the mean of the  log10-errors over these experiments. As we can see, the approximate DDS algorithm performs on par with the other most competitive methods (CEMP and MPLS).

image

Fig. 4 Rotation synchronization experiment with uniform outliers. Here, p is the parameter of the Erd¨os-R´enyi graph, and q is the percentage of corrupted edges. The underlying rotations are distributed uniformly in SO(D), and the corrupted measurements of group ratios are also uniform on SO(D). The color represents the mean of the  log10-errors over the 10 generated datasets.

As a second experiment, Figure 5 presents a more challenging adversarial example on synthetic data. Here, the outliers form a consistent set of measurements themselves, and a similar corruption model is discussed in Section 7.3 of Lerman and Shi [2019], although here we extend this to SO(3) and use a different model for the underlying rotations.

The graph is an Erd¨os-R´enyi graph on n = 50 nodes with parameter p, and each edge on this graph is corrupted with probability q. The ground truth rotations approximately come from a geodesic on SO(3), and the outliers are self-consistent measurements that come from (approximately) another geodesic on SO(3). The ground truth rotations are

image

where v is a fixed vector drawn uniformly from the sphere,  ξi ∼ N(0, 10−4I), and  si = −1 + 2(i − 1)/50. The outliers generated by pairwise measurements between another set of rotations

image

where again  v′is a fixed vector drawn uniformly from the sphere,  ξi ∼ N(0, 0.5I), and  si = −1 +2(i − 1)/50. As before, for each set of parameters p and q, 10 datasets are generated and the color represents the mean of the  log10-errors over these experiments. As we can see again, the approximate DDS algorithm performs well in this experiment, and in fact it performs on par with the most competitive rotation synchronization algorithms (CEMP and MPLS).

In both experiments, we note that the other competitive algorithms are CEMP and MPLS [Lerman and Shi, 2019, Shi and Lerman, 2020]. As mentioned earlier, CEMP, and thus also MPLS that uses ideas of CEMP, have higher memory cost than DDS.

image

Fig. 5 Rotation synchronization experiment with adversarial outliers. Here, p is the parameter of the Erd¨os-R´enyi graph, and q is the percentage of corrupted edges which are uniformly distributed across this graph. The underlying rotations follow the model in (40), and the corrupted measurements are pairwise measurements between rotations generated by the separate set (41). The color represents the mean of the  log10-errors over the 10 generated datasets.

In this work, we developed the first adversarial robustness guarantees for a multiple rotation averaging algorithm. Our novel algorithm relies on finding descent directions using Tukey depth in the tangent space of SO(D). To our knowledge, this represents the first application of manifold Tukey depth in an applied setting. In the case of D = 2 and D = 3, which most frequently arise in practice, our recovery thresholds are 1/4 and 1/8, and the algorithm can be implemented efficiently. We also show how to speed up the algorithm with some approximations, and this approximate algorithm performs competitively on simple synthetic experiments. Future work should also examine if it is possible to extend the analysis to the more practical approximate DDS algorithm.

Another direction for future work is to examine the limits of our analysis. In particular, it would be interesting to know if tighter analyses can yield larger recovery thresholds. At least for the cases of SO(2) and SO(3), which arise in applications, the depth descent estimator discussed in this paper has significant recovery thresholds, while also being computationally tractable. It is not clear what the optimal bounds for recovery with adversarial corruption are in general. Furthermore, if one moves away from adversarial corruption and instead considers special models of data like the uniform corruption model, the bounds could be much better.

Two more concrete directions for future work would be to carry out further examination of the  ζ-well-connectedness condition in Assumption 2. In particular, it would be interesting to see if it can be relaxed at all, what its implications are, and when it actually holds.

Finally, perhaps the most important direction for future work is to give theoretically justified algorithms for a larger range of algorithms employed for SfM [ ¨Ozyes¸il et al., 2017, Bianco et al., 2018]. Indeed, such theoretical work can lead to new and improved algorithms and also to the development of novel state-of-the-art pipelines.

B. Afsari, R. Tron, and R. Vidal. On the convergence of gradient descent for finding the Riemannian center of mass. SIAM Journal on Control and Optimization, 51(3):2230–2260, 2013.

B. Afsari. Means and averaging on Riemannian manifolds. PhD thesis, University of Maryland, College Park, 2009.

M. Arie-Nachimson, S. Z. Kovalsky, I. Kemelmacher-Shlizerman, A. Singer, and R. Basri. Global motion estimation from point matches. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 81–88. IEEE, 2012.

R. Arora. On learning rotations. In Advances in neural information processing systems, pages 55–63, 2009.

S. Arora, R. Ge, T. Ma, and A. Moitra. Simple, efficient, and neural algorithms for sparse coding. arXiv preprint arXiv:1503.00778, 2015.

F. Arrigoni, B. Rossi, P. Fragneto, and A. Fusiello. Robust synchronization in SO(3) and SE(3) via low-rank and sparse matrix decomposition. Computer Vision and Image Understanding, 174:95–113, 2018.

A. S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, 163(1-2):145–167, 2017.

A. S. Bandeira. Random Laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345–379, 2018. doi: 10.1007/s10208-016-9341-9.

S. Bianco, G. Ciocca, and D. Marelli. Evaluating the performance of structure from motion pipelines. Journal of Imaging, 4(8):98, 2018.

N. Boumal. Nonconvex phase synchronization. SIAM Journal on Optimization, 26(4):2355–2377, 2016.

N. Boumal, V. Voroninski, and A. S. Bandeira. Deterministic guarantees for Burer–Monteiro factorizations of smooth semidefinite programs. arXiv preprint arXiv:1804.02008, 2018.

A. Chatterjee and V. M. Govindu. Robust relative rotation averaging. IEEE transactions on pattern analysis and machine intelligence, 40(4):958–972, 2017.

A. Chatterjee and V. M. Govindu. Efficient and robust large-scale rotation averaging. In Proceedings of the IEEE International Conference on Computer Vision, pages 521–528, 2013.

Y. Chen, C. Suh, and A. J. Goldsmith. Information recovery from pairwise measurements. IEEE Transactions on Information Theory, 62(10):5881–5905, 2016.

Y. Cherapanamjeri, P. Jain, and P. Netrapalli. Thresholding based outlier robust PCA. In COLT, pages 593–628, 2017.

Y. Chi, Y. Lu, and Y. Chen. Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019.

L. Danzer, B. Gr¨unbaum, and V. Klee. Helly’s theorem and its relatives. In Proc. Symp. Pure Math., volume 7, pages 101–180. Amer. Math. Soc., 1963.

Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in neural information processing systems, pages 2933–2941, 2014.

D. L. Donoho and M. Gasko. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, 20(4):1803–1827, 1992.

J. Friedman et al. Relative expanders or weakly relatively ramanujan graphs. Duke Mathematical Journal, 118(1):19–35, 2003.

C. Gao and A. Y. Zhang. Exact minimax estimation for phase synchronization. arXiv preprint arXiv:2010.04345, 2020.

C. Gao, J. Liu, Y. Yao, and W. Zhu. Robust estimation and generative adversarial nets. arXiv preprint arXiv:1810.02030, 2018.

T. Gao and Z. Zhao. Multi-frequency phase synchronization. arXiv preprint arXiv:1901.08235, 2019.

R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points—online stochastic gradient for tensor decomposition. In Proceedings of The 28th Conference on Learning Theory, pages 797–842, 2015.

R. Ge, J. D. Lee, and T. Ma. Matrix completion has no spurious local minimum. arXiv preprint arXiv:1605.07272, 2016.

V. M. Govindu. Combining two-view constraints for motion estimation. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 2, pages II–II. IEEE, 2001.

V. M. Govindu. Lie-algebraic averaging for globally consistent motion estimation. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages I–I. IEEE, 2004.

V. M. Govindu. Robustness in motion averaging. In Asian Conference on Computer Vision, pages 457–466. Springer, 2006.

H. L. Hammer, A. Yazidi, and H. Rue. Estimating tukey depth using incremental quantile estimators. arXiv preprint arXiv:2001.02393, 2020.

P. Hand, C. Lee, and V. Voroninski. Exact simultaneous recovery of locations and structure from known orientations and corrupted point correspondences. Discrete & Computational Geometry, 59(2):413–450, 2018.

M. Hardt. Understanding alternating minimization for matrix completion. In FOCS, pages 651–660. IEEE, 2014.

R. Hartley, J. Trumpf, Y. Dai, and H. Li. Rotation averaging. International journal of computer vision, 103 (3):267–305, 2013.

R. Hartley, K. Aftab, and J. Trumpf. L1 rotation averaging using the weiszfeld algorithm. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3041–3048. IEEE, 2011.

C. Hoffman, M. Kahle, and E. Paquette. Spectral gaps of random graphs and applications. International Mathematics Research Notices, 2021(11):8353–8404, 2021.

Q.-X. Huang and L. Guibas. Consistent shape maps via semidefinite programming. In Computer Graphics Forum, volume 32, pages 177–186. Wiley Online Library, 2013.

X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation synchronization via truncated least squares. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 1459–1468, 2017.

X. Huang, Z. Liang, X. Zhou, Y. Xie, L. J. Guibas, and Q. Huang. Learning transformation synchronization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8082– 8091, 2019.

P. Jain, A. Tewari, and P. Kar. On iterative hard thresholding methods for high-dimensional m-estimation. In Advances in Neural Information Processing Systems, pages 685–693, 2014.

H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on pure and applied mathematics, 30(5):509–541, 1977.

J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht. Gradient descent converges to minimizers. University of California, Berkeley, 1050:16, 2016.

G. Lerman and T. Maunu. Fast, robust and non-convex subspace recovery. Information and Inference: A Journal of the IMA, 7(2):277–336, 2017.

G. Lerman and Y. Shi. Robust group synchronization via cycle-edge message passing. arXiv preprint arXiv:1912.11347, 2019.

G. Lerman, Y. Shi, and T. Zhang. Exact camera location recovery by least unsquared deviations. SIAM Journal on Imaging Sciences, 11(4):2692–2721, 2018.

X. Li, S. Ling, T. Strohmer, and K. Wei. Rapid, robust, and reliable blind deconvolution via nonconvex optimization. Applied and computational harmonic analysis, 47(3):893–934, 2019.

H. Liu, M.-C. Yue, and A. So. A unified approach to synchronization problems over subgroups of the orthog- onal group. arXiv preprint arXiv:2009.07514, 2020.

X. Liu. Fast implementation of the tukey depth. Computational Statistics, 32(4):1395–1410, 2017.

X. Liu, K. Mosler, and P. Mozharovskyi. Fast computation of tukey trimmed regions and median in dimension p > 2. Journal of Computational and Graphical Statistics, 28(3):682–697, 2019.

J. Lu and S. Steinerberger. Synchronization of Kuramoto oscillators in dense networks. arXiv preprint arXiv:1911.12336, 2019.

C. Ma, K. Wang, Y. Chi, and Y. Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion. In PMLR, volume 80, pages 3345– 3354, 10–15 Jul 2018.

D. Martinec and T. Pajdla. Robust rotation and translation estimation in multiview reconstruction. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.

T. Maunu and G. Lerman. Robust subspace recovery with adversarial outliers. arXiv preprint arXiv:1904.03275, 2019.

T. Maunu, T. Zhang, and G. Lerman. A well-tempered landscape for non-convex robust subspace recovery. Journal of Machine Learning Research, 20(37):1–59, 2019.

T. Maunu and G. Lerman. Depth descent synchronization in SO(D). arXiv preprint arXiv:2002.05299 v2, 2020.

S. Mei, Y. Bai, and A. Montanari. The landscape of empirical risk for nonconvex losses. The Annals of Statistics, 46(6A):2747–2774, 2018.

R. R. Meyer. Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J. Comput. System Sci., 12:108–121, 1976.

I. Mizera. On depth and deep points: a calculus. The Annals of Statistics, 30(6):1681–1736, 2002.

M. Moakher. Means and averaging in the group of rotations. SIAM journal on matrix analysis and applications, 24(1):1–16, 2002.

P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain. Non-convex robust pca. In Advances in Neural Information Processing Systems, pages 1107–1115, 2014.

O. Ozyesil, A. Singer, and R. Basri. Stable camera motion estimation using convex programming. SIAM Journal on Imaging Sciences, 8(2):1220–1262, 2015.

O. ¨Ozyes¸il, V. Voroninski, R. Basri, and A. Singer. A survey of structure from motion*. Acta Numerica, 26: 305–364, 2017.

A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Message-passing algorithms for synchronization prob- lems over compact groups. Communications on Pure and Applied Mathematics, 71(11):2275–2322, 2018.

P. Petersen. Riemannian geometry, volume 171. Springer, 3rd edition, 2016.

Q. Qu, Y. Zhang, Y. Eldar, and J. Wright. Convolutional phase retrieval via gradient descent. IEEE Transactions on Information Theory, 66(3):1785–1821, 2019.

Q. Qu, Z. Zhu, X. Li, M. Tsakiris, J. Wright, and R. Vidal. Finding the sparsest vectors in a subspace: Theory, algorithms, and applications. arXiv preprint arXiv:2001.06970, 2020.

R. Rado. A theorem on general measure. Journal of the London Mathematical Society, 1(4):291–300, 1946.

D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard. Se-sync: A certifiably correct algorithm for synchronization over the special euclidean group. The International Journal of Robotics Research, 38 (2-3):95–125, 2019.

Y. Shi and G. Lerman. Message passing least squares framework and its application to rotation synchroniza- tion. In International Conference on Machine Learning, pages 8796–8806. PMLR, 2020.

A. Singer. Angular synchronization by eigenvectors and semidefinite programming. Applied and computational harmonic analysis, 30(1):20–36, 2011.

J. Sun, Q. Qu, and J. Wright. Complete dictionary recovery over the sphere. In Sampling Theory and Applications (SampTA), 2015 International Conference on, pages 407–410, May 2015a. doi: 10.1109/ SAMPTA.2015.7148922.

J. Sun, Q. Qu, and J. Wright. When are nonconvex problems not scary? arXiv preprint arXiv:1510.06096, 2015b.

C. J. Taylor and D. J. Kriegman. Minimization on the Lie group SO(3) and related manifolds. Yale University, 16:155, 1994.

C. D. Toth, J. O’Rourke, and J. E. Goodman. Handbook of discrete and computational geometry. Chapman and Hall/CRC, 2017.

R. Tron and R. Vidal. Distributed image-based 3-d localization of camera sensor networks. In Decision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on, pages 901–908. IEEE, 2009.

R. Tron, X. Zhou, and K. Daniilidis. A survey on rotation optimization in structure from motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 77–85, 2016.

J. W. Tukey. T6: Order statistics, in mimeographed notes for statistics 411. Department of Statistics, Princeton University, 1974.

I. Waldspurger and A. Waters. Rank optimality for the Burer-Monteiro factorization. arXiv preprint arXiv:1812.03046, 2018.

L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference, 2013.

L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference, 2013. doi: 10.1093/imaiai/iat005.

X. Yi, D. Park, Y. Chen, and C. Caramanis. Fast algorithms for robust PCA via gradient descent. In NIPS, pages 4152–4160, 2016.

T. Zhang and Y. Yang. Robust PCA by manifold optimization. Journal of Machine Learning Research, 19 (80):1–39, 2018.

Y. Zhang, Q. Qu, and J. Wright. From symmetry to geometry: Tractable nonconvex problems. arXiv preprint arXiv:2007.06753, 2020.


Designed for Accessibility and to further Open Science