The typical synchronization problem involves recovery of n group elements from pairwise measurements between them. It arises, for example, when solving the Structure from Motion (SfM) problem. One subproblem of SfM is to recover the three-dimensional orientations and positions of cameras from pairwise orientations and positions in relation to a scene [ ¨Ozyes¸il et al., 2017]. Here, we specifically focus on robust synchronization over SO(D), the rotation group for . That is, given pairwise rotations in SO(D), some of which are corrupted, we aim to recover the original set of n rotations.
We assume n unknown, ground truth elements of SO(D), which we denote by . We form a graph G([n], E), where [n] := {1, . . . , n} indexes the n unknown elements and E designates the edges for which measurements of relative rotations are taken. For each
, we are provided with the measurement
We can think of in the following way: If we are oriented in the coordinate system with respect to node k, then
rotates our coordinate system into the coordinate system we would see if we were sitting at node j. This synchronization formulation extends to any given group, where one wishes to recover
, an n-tuple of elements in the group, given measurements of the group ratios
.
In reality, we cannot hope to exactly measure all the pairwise rotations in (1). In many real systems, both noisy and corrupted measurements occur: our focus here is on adversarially corrupted measurements. That is, within the measurement graph G, the corruption model is assumed to be fully adversarial. Our model is specified by partitioning the measured data into two parts:
1. We observe corrupted (or “bad”) edges , where all edges in
have a corresponding arbitrary corruption. The adversary is allowed to choose
(and thus may to some degree influence the connectivity of
) as well as the corrupted values
for
. For each node, the adversary is only allowed to corrupt a limited fraction of edges.
2. The rest of the observed edges are uncorrupted (or “good”) edges , where each edge in
has an associated measurement given by (1). Theoretically guaranteed methods for robust synchronization are still lacking, especially in adversarial
and nonconvex settings. The development of these methods is important because in practice measurements are
usually quite corrupted, especially in applied problems like Structure from Motion [ ¨Ozyes¸il et al., 2017]. The results we establish here are concerned with exact recovery. That is, given a set of corrupted measurements, we wish to exactly recover . We will show that this is possible for a nonconvex method even in the presence of a significant amount of arbitrary corruption. Our method falls into the class of multiple rotation averaging algorithms [Govindu, 2004, Martinec and
Pajdla, 2007, Hartley et al., 2013]. These methods are effectively coordinate descent algorithms, which are
present highly efficient algorithms for nonconvex programs that are notoriously hard to analyze. While their analysis is challenging, it is imperative to develop a theoretical understanding of these methods and their robust counterparts [Hartley et al., 2011, Chatterjee and Govindu, 2017]. Moreover, as we discuss later, there are few robustness guarantees for group synchronization with adversarial corruption. Among the limited guarantees, none cover our model, and we thus make a significant contribution to this area. This work is also of general appeal to the nonconvex optimization community since we are able to prove convergence results in the complex nonconvex landscape of robust multiple rotation averaging. Furthermore, some energy landscapes associated with this problem exhibit many local minima and spurious fixed points, which we are able to avoid with our new method.
1.1 Contributions of This Work
The main contributions of this work follow.
1. As a warm-up, we develop an adversarially robust algorithm for synchronization in SO(2), which we call Trimmed Averaging Synchronization (TAS). In Theorem 3, under a generic condition on the measurement graph G, which we call the “well-connectedness”, and proper initialization, we show that it can tolerate a fraction of outliers per node that is bounded above by 1/4. We further prove that it converges linearly for fully connected observation graphs.
2. To extend this result to SO(D), we develop a new algorithm that we call Depth Descent Synchronization (DDS) based on Tukey depth in the tangent space of SO(D). To our knowledge, this is the first application of a manifold version of Tukey depth in an applied setting.
3. Assuming well-connectedness and good initialization, the DDS algorithm exactly recovers an underlying signal in the presence of a significant amount of adversarial outliers. This result is given in Theorem 6 and is the first guarantee of robustness to adversarial corruption for a multiple rotation averaging algorithm. This result extends elegantly to sparse random graphs, where we show that it achieves the information theoretic rate with respect to graph sparsity for Erd¨os-R´enyi observation graphs in Section 4.5.
4. We show that this algorithm can be efficiently implemented for SO(3). We run baseline experiments that show it performs competitively on some baseline synthetic data for SO(3) synchronization, which arises in the important application of Structure from Motion. While we carefully review related work later in Section 2, we emphasize here our contributions in terms
of the most relevant works. Again, we emphasize that we study an efficient, nonconvex algorithm for rotation
synchronization that has guarantees for adversarial outliers.
A robustness result for SO(D) synchronization based on semidefinite programming is given in Wang and Singer [2013]. However, the probabilistic model in this work is very restrictive (see details in Section 2), and the proposed method is slow for large n.
Huang et al. [2019] use a truncated least squares framework to do robust rotation synchronization. The truncated least squares framework was originally proposed by Huang et al. [2017] in the context of translation synchronization, and sequentially filters those pairwise measurements that are furthest from the current estimated pairwise measurements. Following this, Huang et al. [2019] show that this can be extended to rotation synchronization. Under an appropriate choice of a thresholding parameter, they demonstrate that it is possible to exactly recover the ground truth in the presence of outliers if a certain generic condition is satisfied. This method has two downsides. First, one must repeatedly compute the lowest eigenvectors of the graph connection Laplacian, which has a higher memory cost for dense graphs than DDS and may also have issues of numerical stability. Second, the bound on the fraction of outliers that they present is not clear in general settings since it depends on the -norm of the pseudoinverse of the graph connection Laplacian. This quantity is hard to control, and so it is unclear how their bound scales with various parameters as well as what it would state for arbitrary outliers. On the other hand, Huang et al. [2017] guarantees success of the truncated least squares method for adversarial outliers when considering translation synchronization in 1-dimension, but we do not see how to extend these to the problem of rotation synchronization.
The only existing result for adversarial robust synchronization was recently given by Lerman and Shi [2019]. They propose a general method, called Cycle-Edge Message Passing (CEMP), for group synchronization that is guaranteed to be robust to adversarial corruption. However, their method uses information from 3-cycles, that is, triangles in the graph, and so it is less efficient than typical multiple rotation averaging schemes by an order of n (the ratio between the number of triangles and the number of edges in the graph). Beyond this, multiple rotation averaging algorithms are also attractive because they are more memory efficient. A caveat to our current work is that our new method is not as efficient as previous multiple rotation averaging algorithms. In particular, we require the computation of a depth-based estimator, and so each rotation update has complexity for SO(3), where
is the number of neighbors of the node to be updated. Therefore, we do not claim that DDS is uniformly most efficient for adversarially robust synchronization in terms of time complexity, although it is more computationally efficient than CEMP for very sparse graphs. However, our depth descent method does still have the benefit of more efficient memory complexity than Lerman and Shi [2019].
Beyond computational efficiency, the theoretical guarantees are also different: we bound the ratio of corrupted edges, whereas Lerman and Shi [2019] bound the ratio of corrupted triangles. The method of Lerman and Shi [2019] degrades with extremely sparse graphs, due to the fact that they need to ensure that it contains sufficiently many triangles, whereas we require a well-connectedness condition of the graph G that extends to extremely sparse cases. Other well-connectedness conditions appear, sometimes implicitly, in works minimizing energy functions [Wang and Singer, 2013, Hand et al., 2018, Lerman et al., 2018, Huang et al., 2017]. Finally, CEMP is tailored to finding the corruption level in the graph, but it does not have complete guarantees for the recovery of the underlying rotations themselves.
1.2 Notation
Bold uppercase letters will be used to denote matrices, while bold lowercase letters will be used to denote vectors. For a set X in a Hilbert space, the convex hull is denoted by conv(X). The sphere in is written as
. For an ordered tuple of n rotations,
SO(D), we write
.
1.3 Structure of the Rest of the Paper
We now outline the structure of this paper. First, we review related work in Section 2. We then discuss the specific case of synchronization over SO(2) in Section 3 and give a simple, adversarially robust algorithm, called Trimmed Averaging Synchronization. Following this, in Section 4 we develop our novel Depth Descent Synchronization algorithm, which utilizes Tukey depth to yield robust updates. Coupled with this, we develop its theoretical guarantees of robustness and convergence. Section 5 presents some baseline experiments demonstrating the practicality of our proposed method.
Interest in the synchronization problem has grown in recent years due to applications in computer vision and image processing, such as SfM [Govindu, 2004, Martinec and Pajdla, 2007, Arie-Nachimson et al., 2012, Hartley et al., 2013, Tron and Vidal, 2009, Ozyesil et al., 2015, Boumal, 2016], cryo-electron microscopy [Wang and Singer, 2013] and Simultaneous Localization And Mapping (SLAM) [Rosen et al., 2019].
The most common formulation for solving rotation and other group synchronization problems involve a non-convex least squares formulation that can be addressed by spectral methods [Singer, 2011] or semidefinite relaxation [Bandeira et al., 2017]. On the other hand, the work of Wang and Singer [2013] uses a semidefinite relaxation of a least absolute deviations formulation to obtain a robust estimate for SO(d) synchronization. They prove recovery for the pure optimizer of this convex problem in a restricted setting. In this setting the full graph is complete, every edge is corrupted with a certain probability p (in the case of SO(2), they require that and for SO(3) they require
) and the corrupted group ratios are distributed uniformly on SO(D). In practice, they advocate using an alternating direction augmented Lagrangian to solve their optimization problem. One may also use methods like the Burer-Monteiro formulation [Boumal et al., 2018], although current guarantees require the rank of the semidefinite program to be at least
, which results in storing iterates much larger than the underlying signal that is a vector of size n [Waldspurger and Waters, 2018]. Another recent work tries to leverage a low-rank plus sparse decomposition for robust synchronization [Arrigoni et al., 2018]. However, this work does not contain robustness guarantees.
2.1 Robust Synchronization Methods
For a survey of robust rotation synchronization, see Tron et al. [2016]. Some early works on rotation synchronization include Govindu [2001, 2006], Martinec and Pajdla [2007], with later follow-up works by Hartley et al. [2013], Chatterjee and Govindu [2013, 2017]. The later works discuss some least absolute deviations based approaches to multiple rotation averaging which we will discuss later. For theoretical foundations on averaging rotations, one can consult Moakher [2002]. For foundational work on optimization on the manifold SO(d), see [Taylor and Kriegman, 1994, Arora, 2009].
Robust multiple rotation averaging algorithms were studied in Hartley et al. [2011] and Hartley et al. [2013]. There, the authors used a least absolute deviations formulation over SO(3) using successive averaging with a Weiszfeld algorithm and a gradient-based algorithm. The authors also give a counterexample that shows that local minima exist and thus the global minimum of their problem may be hard to find in general. However, the authors give no guarantee of convergence or recovery in any setting. Also, we have found that this method may suffer from suboptimal fixed points in general, which we analyze in more detail in a forthcoming work (see also Section 5 of Maunu and Lerman [2020]).
2.2 Adversarially Robust Synchronization
A work that does contain guarantees is that of Lerman and Shi [2019], which considers a message-passing procedure that incorporates consistent information from cycles. This algorithm was guaranteed to be robust for the adversarial setting and applies to any compact group. Although its adversarial setting is very general, it requires a bound on the ratio of corrupted cycles per edge and not on the ratio of corrupted edges. Furthermore, the use of cycles results in a potentially more computationally intensive algorithm than the one in this work that only uses pairwise information.
Guarantees for exact recovery with adversarial, or partially adversarial, corruption appear in few other synchronization problems. The adversarial corruption in synchronization is very special since there is a single choice to corrupt a group ratio. Under a special probabilistic model, Bandeira [2018] established asymptotic and probabilistic exact recovery for the SDP relaxation of the least squares energy function of
synchronization. The model assumes that G([n], E) is an Erd¨os-R´enyi graph with probability p of connection, edges are randomly corrupted with probability q and
. Hand et al. [2018] and Lerman et al. [2018] established asymptotic exact recovery under a probabilistic model for solutions of the different problem of location recovery from pairwise orientations. In this problem ratios of the Euclidean group are normalized to the sphere. They assume an i.i.d. Gaussian generative probabilistic model for the ground truth locations and an Erd¨os-R´enyi model for the graph G([n], E) and further bounded the ratio of maximal degree of
over n. In both works, these bounds approach zero as n approaches infinity, unlike the constant bound of this work.
Robust permutation synchronization was studied by Huang and Guibas [2013], where they give a maximum corruption percentage of 1/4 in the case of fully connected observation graphs. Huang et al. [2017] analyzed a robust algorithm for one-dimensional translation synchronization that uses a truncated least squares formulation. They show that their method achieves a maximum corruption percentage of 1/6 for fully connected graphs. However, their generic condition is rather complicated and in order to interpret it they must make the fully connected assumption and they also restrict the maximal degree of . In both [Huang and Guibas, 2013, Huang et al., 2017], the bounds degrade for sparser observation graphs.
The results of Huang et al. [2017] were extended to the problem of rotation synchronization in Huang et al. [2019]. Here, the authors show that a truncated least squares formulation for rotation synchronization can recover the underlying signal assuming a generic bound that includes the -norm of the pseudoinverse of the graph connection Laplacian. However, this bound is hard to interpret in general.
2.3 Synchronization in Other Settings
In contrast to corrupted settings, some works have considered estimation in a noisy setting. Bandeira et al. [2017] study maximum likelihood estimation of the angular synchronization problem and show that the associated semidefinite relaxation is tight. More recently, message-passing algorithms have been used for maximum likelihood estimation in the Gaussian setting [Perry et al., 2018]. Other recent results leverage multiple phases to obtain better results in noisy settings [Gao and Zhao, 2019]. Minimax estimation under the squared loss over SO(2) is considered in Gao and Zhang [2020], and optimization methods for the squared error over subgroups over the orthogonal group are considered in Liu et al. [2020].
Another related problem over SO(2) is the synchronization of Kuramoto oscillators. In particular, a primary question is the minimal graph connectivity requirement ensuring that the energy landscape is nice. The weakest known requirement is that every vertex is connected to at least 0.7889n other vertices [Lu and Steinerberger, 2019]. The conjectured bound is 0.75n, which is reminiscent of the bound we require for local recovery with adversarial corruption over SO(2).
2.4 Nonconvex Optimization
Optimization problems cast over SO(D) are usually nonconvex. We can think of our method as attempting to solve a nonconvex problem over SO(D) as well. Therefore, our work also fits in with the growing body of work analyzing nonconvex energy landscapes and procedures [Dauphin et al., 2014, Hardt, 2014, Jain et al., 2014, Netrapalli et al., 2014, Yi et al., 2016, Zhang and Yang, 2018, Ge et al., 2015, Lee et al., 2016, Arora et al., 2015, Mei et al., 2018, Ge et al., 2016, Boumal, 2016, Sun et al., 2015b,a, Lerman and Maunu, 2017, Cherapanamjeri et al., 2017, Ma et al., 2018, Maunu et al., 2019].
2.5 Tukey Depth
Finally, we appeal to tangent space depth, that is, using Tukey depth [Tukey, 1974] in the tangent space of the manifold SO(D), to create a provable robust method. Tangent space depth, for a general manifold, first appeared in Mizera [2002], where the author proves existence and depth bounds for maximum tangent depth estimators. Earlier work on Tukey, or halfspace, depth includes Rado [1946], which proves a depth lower bound for general measures, and Danzer et al. [1963], which discusses the relation to Helly’s Theorem. More recently, the classical reference of Donoho and Gasko [1992] proves bounds on the maximum depth achieved in a dataset under ellipticity conditions. Computation of depth contours was considered in Liu [2017], Hammer et al. [2020]. Recently, an interesting connection between depth estimators and generative adversarial networks has been exhibited [Gao et al., 2018], which may perhaps lead to more computationally efficient estimators.
2.6 Notions of Robustness
We finish by clarifying our setting in the context of robustness. In order to quantify our notion of robustness, we introduce the following terminology. Recall that we have an underlying graph G([n], E) corresponding to the pairwise measurements, where E is partitioned into an inlier set, , and outlier set
. For any
, we define its neighborhood as well as its inlier and outlier neighborhoods as
We will denote by the maximum percentage of outliers per node. That is,
is the maximum of
over all
, where throughout the rest of the paper
denotes the number of points in a set and
is the degree of node
.
The following notion of recovery threshold is related to the notion of a breakdown point in robust statistics. However, our goal is somewhat different, since we desire an exact estimator rather than an approximation, as is typically considered for classical breakdown points. This is similar to the notion of RSR breakdown point given in Section 1.1 of Maunu and Lerman [2019].
Definition 1 (Recovery Threshold) The recovery threshold of a robust rotation synchronization algorithm is the largest value of such that the algorithm outputs an estimator
that satisfies (20).
The simplest information-theoretic bound for the recovery threshold is . Indeed, if
, then an adversary could easily choose
to have a subgraph that dominates
with a consistent set of measurements for an alternative signal
. That is, the observations would be
If the adversary chooses the partition and
properly, then one could easily think that
is the true underlying signal. For our method, we obtain recovery thresholds for
that are smaller than 1/2.
On the other hand, the information theoretic bound may be much higher in special models. For example, suppose that G([n], E) is an Erd¨os-R´enyi graph with parameter p. Suppose further that each edge in E is corrupted independently with probability q, and the corrupted measurements are i.i.d. uniform on SO(D). Wang and Singer [2013] call this the uniform corruption model. Then, Singer [2011] and Chen et al. [2016] established the following information theoretic threshold for the rotation synchronization problem:
Notice that one needs to ensure that the underlying Erd¨os-R´enyi graph is connected. For fixed p, notice that one can take q arbitrarily close to 1 (and so
is then very close to 1) as long as n is sufficiently large. We later discuss how our main results extend to this model in Section 4.5, where we show that the DDS algorithm achieves optimal recovery rates with respect to p (i.e., it can tolerate extremely sparse observation graphs).
To begin to build motivation for our method, we consider the case of synchronization over SO(2), where the method becomes considerably simpler due to its 1-dimensional manifold structure. First, Section 3.1 gives definitions of some geometrical objects on SO(2), which we identify with for mathematical convenience. Then, in Section 3.2, we define our SO(2) synchronization method, which we call Trimmed Averaging Synchronization (TAS) and is a special case of our later DDS algorithm. Finally, Section 3.3 discusses the initialization and well-connectedness assumptions and uses these to give an adversarial recovery guarantee for the TAS algorithm.
3.1 The Geometry of
We define a few structures related to the manifold . The tangent space can be identified with R. Let
be a unit direction in the tangent space at
(i.e.,
). The geodesic originating at
in the direction v is given by
. The exponential map and inverse exponential map (logarithm map) on this 1-dimensional manifold are given by
Finally, the cut-locus of a point is defined as the set of points for which there is not a unique geodesic from z. It is not hard to see that this is given by
.
Recall that we seek an underlying signal . Notice that its elements,
for
, can be parameterized by angles,
. This angle is also known as the argument of the complex number, and so we write
, where
. The angular, or geodesic, distance between
and
is
For later reference, we plot the extended angular distance function in Figure 1.
Fig. 1 The angular distance function
Recall that if , then the edge measurement is correct, that is,
, where
is defined analogously to (1). For
, the measurement
is assumed to be an arbitrary element of
. From the measurements
is only identified up to a global rotation, due to the ambiguity that
, and so
generates the same pairwise measurements as
.
To deal with this ambiguity, the following function will be used to demonstrate convergence of a sequence to :
This is again nothing but a function that measures the maximum distance between normalization products. Notice that for some rotation
. Therefore, convergence of
to zero indicates convergence of z to
, and an algorithm exactly recovers
iff
.
3.2 Trimmed Averaging Synchronization
A natural way to solve the rotation synchronization problem involves energy minimization. The simplest strategy [Govindu, 2001, Martinec and Pajdla, 2007] attempts to minimize
A coordinate descent strategy to solve (8) involves updating by solving
Applying this sequentially over the indices j = 1, . . . , n results in the multiple rotation averaging (MRA) discussed in more detail in Hartley et al. [2013]. While such coordinate descent strategies generally lack coordination across all objects like global synchronization methods, they lead to algorithms that are more memory efficient and that can be decentralized easily.
Another simple way to robustify (9) is to select the average of all points that fall within a trimmed set, which results in a trimmed averaging procedure. To account for the 1-dimensional manifold structure of SO(2), we propose to do this trimming in the tangent space, which yields the TAS algorithm. An illustration of one trimmed averaging step is given in Figure 2.
For a discrete and a fraction 0 < p < 1, we write the pth quantile of X by
. It is convenient to define the trimming operator
We also denote the average of a dataset by ave(X). That is, ave
.
Due to the simplified geometry of SO(2), we will show in the following that using this trimmed rotation averaging scheme converges to the underlying solution linearly when the percentage of outliers is at most when G is fully connected. In the case where (G, [n]) is not fully connected, the result is a corollary of our later Theorem 6 under a connectedness assumption on (G, [n]). We note that this fraction is similar to the one given in Lerman and Shi [2019], although there the bound is formulated for corrupted triangles in the graph.
For clarity, we give the TAS algorithm in Algorithm 1. To allow for damping of the updates, we include the step-size parameter . When
, we refer to the algorithm as Damped TAS or DTAS for short.
Fig. 2 Illustration of the TAS algorithm at a fixed step and a fixed node j. The measurement is . After projecting into the tangent space, the outermost points in red are filtered, and the green points are averaged. This trimmed average is then projected back to the manifold.
3.3 Recovery Guarantees for DTAS
We begin by discussing the assumptions that will make a synchronization problem tractable for TAS. The first assumption we require is a good initialization, which is common in the analysis of such nonconvex methods. Assumption 1 The initial set of rotations lies within a
-neighborhood of
: that is, there exists a
such that
Note that this is equivalent to the assumption that .
While corruptions are arbitrary, we require an assumption on the underlying graph (G, [n]). It essentially requires that the graph is sufficiently well connected.
Assumption 2 (-Well-connectedness condition) For a fixed
, for any
such that
n/2, there exists an index
such that
In words, this assumption requires that inside any set of at most n/2 nodes, there is a node that is connected to a significant number of nodes outside this set. The condition in this assumption is equivalent to requiring that
We include a discussion of this condition and its connection with random graphs, conductance, and expanders later in Section 4.5.
For the case of Assumption 2 with , we just call the graph well-connected. While fully connected graphs satisfy this condition, there exist many more examples of graphs satisfying it with
as well, and we give some examples of some simple graphs that meet this assumption in Figure 3.
The following theorem gives the main recovery result for the DTAS algorithm. While the algorithm converges linearly, the rate we derive depends on n and is worst-case. In the few simulations we have run, the algorithm seems to converge at a faster rate that merits more study. Also, a more complicated proof may yield linear convergence in the general case of well-connected G, but for sake of brevity, we only prove it for the fully connected case.
Fig. 3 Examples of graphs that satisfy the well-connectedness condition for n = 4, 5 and 6. In each of these graphs, (12) is satisfied with : that is, all subsets J of size at most n/2, there exists a node
Theorem 3 Suppose that , Assumption 1 holds, G satisfies Assumption 2 with parameter
, and
is the sequence generated by DTAS, for
and
. Then,
, and the algorithm exactly recovers
. Furthermore, in the case where G is fully connected, the DTAS algorithm linearly converges to
.
Proof The proof of convergence under well-connectedness follows from the fact that, when updating index j at iteration t, the selection rule defined by choosing the trimmed average yields a point in the interior of Log
. The proof then follows from the proof of Theorem 6.
To see linear convergence in the fully connected case, we prove that all normalization products contract during each pass over the dataset. Denote
These are the translation of the normalization products to the angular coordinates of the points . Also, define the sets
In this proof, we will write as a shorthand. Notice that we must have
unless , in which case
for some
and z(t) recovers
.
For the update with respect to index j, all good pairwise measurements must lie in . Since there are at least
good measurements, all trimmed points must lie in this interval as well. Therefore, for all
, after updating we have
Using this fact, we will now show that the indices in must move inwards. Indeed, since
n/2, we must have
for all
. Therefore, for each trimmed mean for
, we have
Thus,
After the coordinate update, we have that for all ,
After repeating this argument for all j over the course of an epoch, or pass over all indices j = 1, . . . , n, this yields that
as long as . The width of this interval is
, which yields the desired result.
This section presents a novel algorithm for robust synchronization over the rotation group, SO(D). We assume a fixed observation graph G that encodes which pairwise rotations we observe. The pairwise rotations are written as SO(D), where the good edges
have the associated observation
, and the bad edges are arbitrarily chosen from G and have arbitrarily corrupted measurements.
To proceed, we must make clear our goal for the synchronization problem, since there is a well known ambiguity, similar to the one encountered for SO(2) synchronization in Section 3.1 – we can only recover up to right multiplication by an element of SO(D). This is because, after this multiplication, one arrives at the same pairwise measurements in (1). This is a form of rotational symmetry in the nonconvex problem, which may be leveraged to develop tractable nonconvex programs [Zhang et al., 2020]. Exactly recovering the ground truth measurements
SO
up to right multiplication by
SO(D) is equivalent to finding a set of rotations
such that
for some SO(D). We refer to the set of rotations
, as normalization products since, when
, they reveal the normalization factor that multiplies each element of
from the right. One could extend this discussion to the case of approximate recovery by requiring that the normalization products are approximately equal.
With our goal now in mind, we begin in Section 4.1 by discussing the geometry of the manifold SO(D) and presenting some basic geometric results that will be used in our main theorem. Following this, Section 4.2 reviews the concept of halfspace depth from robust statistics, which will be the core tool that we use to construct our algorithm. In Section 4.3, we give outline the DDS algorithm. Then, in Section 4.4, we give the theoretical guarantees that constitute the main innovations of this work. We finish in Section 4.5 with a discussion of the assumptions we make in our theorem.
4.1 The Manifold Structure of SO(D)
The rotation synchronization problem is obviously a robust recovery problem on the product Riemannian manifold SO. Therefore, in the following, we freely use concepts from Riemannian geometry, and specifically those concepts related to the geometry of SO(D).
4.1.1 Riemannian Geometry of SO(D)
The set of rotations SO(D) is a -dimensional Lie group that has a natural Riemannian structure. The bi-invariant distance metric d : SO
SO
is given by
where log is the matrix logarithm. The corresponding Lie algebra is so(D), the set of skew-symmetric matrices. The tangent space of SO(D) at
SO(D) is
Notice that every tangent vector SO(D) has a corresponding element of so(D), which we denote by
. The corresponding Riemannian metric (which is an inner product and thus should not be confused with a distance metric) for
SO(D) is given by
Tr
Tr
. Equipped with this metric, SO(D) is a Riemannian manifold with nonnegative sectional curvature. An open ball with respect to the metric d is written as B(R, r), where the radius is r and the center is R. Its closure and boundary are B(R, r) and
, respectively.
The exponential map is given by
where exp is the matrix exponential. The logarithmic map is the inverse of this:
The geodesic between SO(D) is written as
Exp
Log
, for
. In the following, we use the notation for a halfspace of
SO(D),
4.1.2 Local Convexity Properties of SO(D)
We continue by recalling some local convexity properties of manifolds like SO(D). The following result on the convexity of sufficiently small balls is standard in the literature [Karcher, 1977, Afsari, 2009, Petersen, 2016].
Theorem 4 [Convexity of small balls] In a closed ball SO(D) with
, the squared distance metric
is strictly convex. This implies, in particular, that for all
,
for all
.
Note that the closed ball in the previous theorem has the property that the interior of any nonconstant geodesic lies strictly in the interior of the ball. The following result is also readily apparent. It states that, for boundary points on a sufficiently small ball in SO(D), all interior directions are contained in a halfspace.
Corollary 1 Let B be a ball on SO(D) with radius and
. Then, there is a halfspace
SO(D) such that Log
.
An important component of our later theoretical results relies on showing that the radius of the smallest ball containing a set of rotations shrinks. The final lemma of this section states that if a discrete set of rotations R is contained in a ball, and if half of the ball contains no boundary measurements, then the set R is actually contained in a ball of smaller radius. Thus, for such a set of rotations, this gives us a sufficient condition for decreasing the radius of the smallest containing ball.
Then, R is contained in a ball with radius less than r.
Proof Let c(t) be the geodesic Exp. We claim that, for t sufficiently small,
, which is an open ball. Heuristically, one should expect this to be true, since the closed halfspace
contains no boundary points, and so moving the center a small amount in the v direction keeps all points within the ball.
By the first order approximation to and since
Log
, we have
for t sufficiently small. On the other hand, since
the distance to C over all Exp
is bounded away from r. By continuity of d(c(t), S) for all
, this implies that there is an
such that
Putting (25) and (26) together implies that a small shift of the ball results in a new center such that r. In turn, this means that all points of R lie in a ball
with
4.2 Tukey Depth and its Properties
We will use the concept of Tukey depth to determine descent directions on the manifold SO, although other notions of depth could potentially be used as well (see Ch. 58 of Toth et al. [2017] for a discussion of different notions of depth). In Euclidean space, the Tukey depth of a point
in a dataset X =
is given by
The depth is therefore the minimum number of points contained in any halfspace that has x in its separating hyperplane. A natural robust estimator is then the point of maximum depth, which is also called the Tukey median. The -depth level set for
is defined by
This level set is convex and compact, and its boundary is made up of hyperplanes defined by sets of D points [Liu et al., 2019]. This function will be used in the construction of our algorithm.
As an example, consider the 1-dimensional dataset . Here, the formulation of depth is quite simple:
With this in mind, the -depth level set is
where
denotes the ith order statistic. The Tukey median in this case is just the median.
We recall the following theorem, which bounds the maximum possible depth within a general dataset. Notice that, in particular, this guarantees that the depth level set is nonempty for all
.
Proposition 1 (Rado [1946]) Suppose that X is a set of n points. Then, the maximum depth in X is bounded below by .
A particularly useful property of Tukey depth is that it is affine equivariant, that is, it is stable under affine transformations.
Lemma 2 (Donoho and Gasko [1992])
This implies, in particular, that things behave nicely if we change the inner product on .
With this property, we can map between the depth regions in and
by
This affine equivariance implies, in particular, that Proposition 1 extends to datasets in tangent spaces of manifolds.
We finish with a simple lemma on depth level sets that will be the key to the robustness guarantee for our later algorithm. In simple words, this lemma guarantees sufficient conditions for a depth level set to contain a nonzero value in a dataset containing many zeros. In the following, datasets are represented by multisets and may contain duplicate points. For a halfspace , we write its separating hyperplane as L(0, v).
Lemma 3 Suppose that we have a dataset and a subset
that satisfies the properties i)
, ii) There exists closed halfspace
such that
, and iii) the only points of Y in L(0, v) are 0. Then,
{0}). Beyond this, if
, then
.
Proof It is obvious by the properties of depth that , since any point on the boundary of conv(Y) has depth less than
. By Proposition 1, there is a point of depth at least
, and so
is nonempty.
Suppose that less than points in Y are zero, that is,
. We claim that
. Define the auxiliary set
, that is, Z removes the 0 values in Y from X. Since
, we have that
. Within Z, there is a point
of depth at least m/(D + 1). Further, since
and
, we have
which implies that must lie in H(0, v). On the other hand, since
, we have that
which means that is a point of depth at least
in
4.3 Depth Descent Synchronization
We now use the results of the previous sections to derive the DDS algorithm.
4.3.1 The General DDS Algorithm
We assume a selection rule on convex, compact subsets of
SO(D) for all
SO(D). In particular, for our theorem, we assume that this selection rule chooses a nonzero point from the convex set if possible, and otherwise outputs zero. To select this point, one selection rule could be to take a point uniformly at random, or another could be to take the center of mass. As our theorem makes clear, the choice of this selection rule does not affect the exact recovery result, but it may change convergence rates. Since we do not give quantitative convergence rates in this work, we leave this choice as arbitrary.
In any case, given a selection rule over such subsets of
SO(d), suppose our estimated rotations at time t are (R(t)). Our update direction at this time for index j = t mod n is defined by
In words, the direction is a direction in the tangent space at
that is sufficiently deep with respect to the neighbor measurements {Log
. Given
, the algorithm updates
for a chosen step size . Our theory below restricts this step size according to Theorem 4.2 of Afsari et al. [2013]: in the case of D = 2 or 3, one can take
, while for D > 3 the upper bound is more restrictive (the reader can consult the discussion in Afsari et al. [2013] for a more thorough discussion of the bound). For sake of clarity, we write the full DDS algorithm in Algorithm 2.
One benefit of DDS is that there is no need to tune the step size, which is due to the local convexity properties of the manifold. As we outline in our main theorem, the step size can be directly selected from the guidance of Afsari et al. [2013].
As for computational complexity, at least in low-dimensions, depth regions can be calculated efficiently for small datasets [Liu et al., 2019]. In particular, the most straightforward algorithm involves an exhaustive search over hyperplanes spanned by D-subsets of X that cut off points in X. The time complexity of this method for
is
, and so could be used for moderately sized datasets. Translating this to our problem in SO(3), the time complexity for updating
is
, and so we see that this method is efficient for sparser graphs. In an Erd¨os-R´enyi model, the complexity to update all n rotations is
, where p is the Erd¨os-R´enyi parameter.
As discussed in the introduction, the time complexity of the DDS algorithm is not necessarily more efficient than that of Lerman and Shi [2019]. Indeed, the complexity of their message-passing algorithm is for a single update to all rotations, while for our method it is
for SO(3) (and much larger for higher dimensions). Therefore, DDS has better complexity for sparse graphs, while Lerman and Shi [2019] has better complexity for dense graphs. On the other hand, our complexity is uniformly better in SO(2), where depth contours can be easily found in O(n log(n)) by sorting, and it thus takes
time to update all nodes. Finally, we also note that our method has better scaling in terms of memory usage: the multiple rotation averaging scheme takes
memory while the message-passing scheme takes
.
4.3.2 The Approximate DDS Algorithm
To make the DDS algorithm more computationally efficient, we employ a few strategies to develop an approximate DDS algorithm.
First, so that we do not need to resort to computing full depth contours, we instead take the average of the deepest points in the set {Logas our update direction at each iteration.
Second, to avoid computation of the full depth of every point in this set, we instead use an approximation of depth based on sampling. Suppose that we wish to approximate the depth of the vectors in with respect to Y. We can sample a set of vectors
. Then, for each
, the approximation of depth,
, is
Notice that replaces the minimum over
in (27) by the minimum over the discrete set of vectors
. For
(which corresponds to the tangent space for SO(3)), computation of the full depth for all points would take
time, where for each
one would need to search over all planes defined by triplets
, for distinct i, j, k. On the other hand, the computation of the approximate depth using (34) takes
and can be more efficiently implemented due to the fact that the
are shared between all
.
The approximate DDS algorithm is given in Algorithm 3. Here, we use the notation SO(D) for the set of all unit vectors in
SO(D).
4.4 Exact Recovery for DDS
For many nonconvex methods, good initialization is quite important [Li et al., 2019, Maunu et al., 2019, Qu et al., 2019, Chi et al., 2019, Qu et al., 2020]. We only obtain a recovery result for DDS if we can initialize in a suitable neighborhood of .
Definition 2 A set of n rotations (R) lies within a -neighborhood of
if the normalization products
all lie in a ball of radius
. That is, there exists a
SO(D) such that
With this terminology, our main assumption is that that we can initialize in a -neighborhood of
. This is the analog of Assumption 1 for SO(D).
Assumption 5 The initial set of rotations (R(0)) for our algorithm lies within a -neighborhood of
.
As we discuss in Section 4.5, we believe that this assumption is not so restrictive, and we later give some intuition for how this might occur in a real scenario.
The following theorem constitutes the main theoretical result of this work. It states that, with proper initialization and well-connectedness, the recovery threshold of Algorithm 2 is . As examples when
, in the case of SO(2), Theorem 6 yields a corruption threshold of
. In the case of SO(3), Theorem 6 yields a corruption level of
.
Theorem 6 Suppose that , Assumptions 5 and 2 hold, and
is generated by (33) with
. Further assume in the case of D = 2, 3 that
, and in the case of D > 3 that
is chosen according to Theorem 4.2 of Afsari et al. [2013]. Then,
0 for all j, k, and the DDS algorithm exactly recovers
.
Proof (Proof of Theorem 6) To aid in the proof, we denote the smallest ball enclosing our normalization products as
The center of B(t) is C(t) and its radius is r(B(t)). Our goal will be to show that .
The proof of the theorem is broken into three parts. In the first part, we prove that the sequence remains in a nested sequence of balls. In the second part, we show that, after sufficiently many iterations, the radius of the smallest enclosing ball must shrink. We finish in the third part by appealing to a general convergence theorem for monotonic algorithms.
Part I: : First, we show that at time t, no matter which index is updated, the normalization products remain in B(t). This is true at t = 0, so assume that it is true at a time t. Let j = t mod n and consider the pairwise measurements in the tangent space at
: for each
, the corresponding point in the tangent space is given by Log
. By assumption, we have that
for all k. Since
and
,
since the set in the right-hand side of the display contains more than a fraction of points. We can now apply Theorem 3.7 of Afsari et al. [2013]. This follows from the fact that the update direction
is the gradient of the Frech´et mean function for a weighted combination of {Log
(since it lies in the convex hull of of these points). More formally, letting
, since
conv({Log
, there exist weights
such that
Therefore, choosing the step size as in Afsari et al. [2013] implies that , and further that
when
. In turn, this implies that
and, if
, then
as well (i.e., interior points cannot move to the boundary).
In the case of SO(3), Theorem 3.7 of Afsari et al. [2013] tells us that choosing suffices. The case of SO(D) for D > 3 is dealt with in a similar way using Theorem 4.2 of Afsari et al. [2013].
Part II: : We now must show that after sufficiently many iterations, the radius of B(t) strictly decreases. To this end, fix a time s. At this time, at least one normalization product
must lie on the boundary
. For convenience, define the index set J(s) of boundary rotations at time s by
We will show that there exists a such that at some future time
,
To this end, pick a direction w uniformly at random from SO(D) such that
. This vector separates
into two halfspaces, and thus partitions B(s) into two halves. One of these halves contains at most n/2 points Log
, and we will denote the corresponding halfspace of
SO(D) by H. Since the direction w is chosen uniformly at random, there are no points on the boundary of this halfspace with probability 1.
Let K(s) denote the set
that is, the set of indices corresponding to boundary normalization products at time s. At each time t = s+m for m > 0, if none of the normalization products , lie in
, then set
and we can apply Lemma 1 to yield (36).
Otherwise, by Assumption 2, there is at least one index such that
is in
and
which yields that and
. Thus, for sufficiently small
. Repeating this sequentially for all elements of K(s), there must exist a
such that
We are left in a situation where , and there exists a w such that
contains no boundary points. By appealing to Lemma 1, we know that
lies in a ball of radius smaller than r(B(s)), and therefore r(B(s)) has a strictly monotonic subsequence.
Part III: Strict monotonicity implies convergence: By Mizera [2002], as long as 1), the point to set mapping
is non-empty and outer semicontinuous with respect to the empirical measure on
Therefore, the associated algorithm (33) is upper semicontinuous in the sense of Theorem 3.1 of Meyer [1976], and we obtain convergence of R(t) to a fixed point of (33).
We finish by examining fixed points of the algorithm (33). Suppose that R is a fixed point of this sequence, such that as defined by (32) is zero. The fixed point is characterized by
for at least
measurements k. By
-well-connectedness, for all subsets J of size at most n/2, there is an index j such that
, because otherwise Lemma 3 would yield a nonzero
update direction. Thus, there is an index such that
. This implies that
for all
4.5 Discussion of Assumptions
The -well-connectedness condition in Assumption 2 bears some similarity to the notions of conductance and graph expansion. From the perspective of graph theoretical results, we note that a sufficient condition for Assumption 2 with
is for the conductance of the graph to be greater than or equal to 1/2. This follows from a simple pigeonhole argument. It is unclear if this condition holds for Erd¨os-R´enyi graphs. Indeed, if one uses Cheeger’s inequality, one would need the spectral gap to be greater than or equal to 1, but for Erd¨os-R´enyi graphs one only expects this gap to concentrate around 1 in practice [Hoffman et al., 2021].
A sufficient condition for (13) is for the conductance to be bounded below by , which can be achieved with high probability for any fixed
by Erd¨os-R´enyi graphs when
(see, for example, Hoff- man et al. [2021]), as well as expander graphs (see, for example, Friedman et al. [2003]).
Our result in Theorem 6 holds with high probability for the uniform corruption model discussed in Section 2.6 with and
. This means that DDS achieves the information theoretically optimal rate with respect to p in this model. Indeed, we can read (4) as
.
Assumption 5 requires that we initialize the DDS algorithm so that the normalization products lie in sufficiently small ball. This can be achieved in practice for cameras whose orientations lie close enough together. That is, suppose that all of the rotations lie in
for all j, for some S and
. Then, if the initial point for the DDS algorithm is chosen to be (I, . . . , I), then it is not hard to see that
which directly shows that Assumption 5 holds. Notice that is a large ball that essentially makes up half of the manifold SO(3), since the distance from any point to its cut-locus is
. For example, if one considers reconstruction of an object from many images taken from points on a sphere that surrounds this object, then our requirement would essentially boil down to needing all of the images being taken from a single hemisphere.
We conjecture that one can weaken this initialization condition, to only require that neighboring normalization products are close to each other, but we leave weakening of this assumption to future work.
The algorithms we compare with are MRA and L1-MRA [Hartley et al., 2013], IRLS after L1-MRA initialization [Chatterjee and Govindu, 2013], LTS [Huang et al., 2019], CEMP [Lerman and Shi, 2019], and MPLS [Shi and Lerman, 2020]. Default parameters of all methods are used. For CEMP, the method computes the corruptions levels first to determine which edges are most corrupted. Then, using these corruption levels, it finds a minimum spanning tree, from which one can fix and then propagate from this to find the other rotations along this tree. For LTS, we implement the truncation step with parameter
and run for 40 iterations. The approximate DDS algorithm is run for 40 epochs (or passes over the data, which means we take T = 40n) with a step size of
and number of depth vectors m = 20.
We compute the distance between the estimated rotations and
by first aligning them by solving
The error is then computed as
All algorithms take less than a minute to run on each individual dataset on a Macbook Air with a 1.6 GHz Dual-Core Intel Core i5 and 8 GB of RAM.
Figure 4 presents a first comparison of these algorithms on synthetic data. The model is the uniform corruption model, which is discussed in Section 2.6. The graph is Erd¨os-R´enyi on n = 100 nodes with varying parameter p, and each edge on this graph is corrupted with probability q. The underlying rotations, , are distributed uniformly on SO(D). The bad measurements,
, are also uniformly dis- tributed on SO(D). For each set of parameters (p = 0.1, 0.2, . . . , 0.5 and q = 0.05, 0.1, . . . , 0.3), 10 datasets are generated and the color represents the mean of the
-errors over these experiments. As we can see, the approximate DDS algorithm performs on par with the other most competitive methods (CEMP and MPLS).
Fig. 4 Rotation synchronization experiment with uniform outliers. Here, p is the parameter of the Erd¨os-R´enyi graph, and q is the percentage of corrupted edges. The underlying rotations are distributed uniformly in SO(D), and the corrupted measurements of group ratios are also uniform on SO(D). The color represents the mean of the -errors over the 10 generated datasets.
As a second experiment, Figure 5 presents a more challenging adversarial example on synthetic data. Here, the outliers form a consistent set of measurements themselves, and a similar corruption model is discussed in Section 7.3 of Lerman and Shi [2019], although here we extend this to SO(3) and use a different model for the underlying rotations.
The graph is an Erd¨os-R´enyi graph on n = 50 nodes with parameter p, and each edge on this graph is corrupted with probability q. The ground truth rotations approximately come from a geodesic on SO(3), and the outliers are self-consistent measurements that come from (approximately) another geodesic on SO(3). The ground truth rotations are
where v is a fixed vector drawn uniformly from the sphere, , and
. The outliers generated by pairwise measurements between another set of rotations
where again is a fixed vector drawn uniformly from the sphere,
, and
. As before, for each set of parameters p and q, 10 datasets are generated and the color represents the mean of the
-errors over these experiments. As we can see again, the approximate DDS algorithm performs well in this experiment, and in fact it performs on par with the most competitive rotation synchronization algorithms (CEMP and MPLS).
In both experiments, we note that the other competitive algorithms are CEMP and MPLS [Lerman and Shi, 2019, Shi and Lerman, 2020]. As mentioned earlier, CEMP, and thus also MPLS that uses ideas of CEMP, have higher memory cost than DDS.
Fig. 5 Rotation synchronization experiment with adversarial outliers. Here, p is the parameter of the Erd¨os-R´enyi graph, and q is the percentage of corrupted edges which are uniformly distributed across this graph. The underlying rotations follow the model in (40), and the corrupted measurements are pairwise measurements between rotations generated by the separate set (41). The color represents the mean of the -errors over the 10 generated datasets.
In this work, we developed the first adversarial robustness guarantees for a multiple rotation averaging algorithm. Our novel algorithm relies on finding descent directions using Tukey depth in the tangent space of SO(D). To our knowledge, this represents the first application of manifold Tukey depth in an applied setting. In the case of D = 2 and D = 3, which most frequently arise in practice, our recovery thresholds are 1/4 and 1/8, and the algorithm can be implemented efficiently. We also show how to speed up the algorithm with some approximations, and this approximate algorithm performs competitively on simple synthetic experiments. Future work should also examine if it is possible to extend the analysis to the more practical approximate DDS algorithm.
Another direction for future work is to examine the limits of our analysis. In particular, it would be interesting to know if tighter analyses can yield larger recovery thresholds. At least for the cases of SO(2) and SO(3), which arise in applications, the depth descent estimator discussed in this paper has significant recovery thresholds, while also being computationally tractable. It is not clear what the optimal bounds for recovery with adversarial corruption are in general. Furthermore, if one moves away from adversarial corruption and instead considers special models of data like the uniform corruption model, the bounds could be much better.
Two more concrete directions for future work would be to carry out further examination of the -well-connectedness condition in Assumption 2. In particular, it would be interesting to see if it can be relaxed at all, what its implications are, and when it actually holds.
Finally, perhaps the most important direction for future work is to give theoretically justified algorithms for a larger range of algorithms employed for SfM [ ¨Ozyes¸il et al., 2017, Bianco et al., 2018]. Indeed, such theoretical work can lead to new and improved algorithms and also to the development of novel state-of-the-art pipelines.
B. Afsari, R. Tron, and R. Vidal. On the convergence of gradient descent for finding the Riemannian center of mass. SIAM Journal on Control and Optimization, 51(3):2230–2260, 2013.
B. Afsari. Means and averaging on Riemannian manifolds. PhD thesis, University of Maryland, College Park, 2009.
M. Arie-Nachimson, S. Z. Kovalsky, I. Kemelmacher-Shlizerman, A. Singer, and R. Basri. Global motion estimation from point matches. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 81–88. IEEE, 2012.
R. Arora. On learning rotations. In Advances in neural information processing systems, pages 55–63, 2009.
S. Arora, R. Ge, T. Ma, and A. Moitra. Simple, efficient, and neural algorithms for sparse coding. arXiv preprint arXiv:1503.00778, 2015.
F. Arrigoni, B. Rossi, P. Fragneto, and A. Fusiello. Robust synchronization in SO(3) and SE(3) via low-rank and sparse matrix decomposition. Computer Vision and Image Understanding, 174:95–113, 2018.
A. S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, 163(1-2):145–167, 2017.
A. S. Bandeira. Random Laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345–379, 2018. doi: 10.1007/s10208-016-9341-9.
S. Bianco, G. Ciocca, and D. Marelli. Evaluating the performance of structure from motion pipelines. Journal of Imaging, 4(8):98, 2018.
N. Boumal. Nonconvex phase synchronization. SIAM Journal on Optimization, 26(4):2355–2377, 2016.
N. Boumal, V. Voroninski, and A. S. Bandeira. Deterministic guarantees for Burer–Monteiro factorizations of smooth semidefinite programs. arXiv preprint arXiv:1804.02008, 2018.
A. Chatterjee and V. M. Govindu. Robust relative rotation averaging. IEEE transactions on pattern analysis and machine intelligence, 40(4):958–972, 2017.
A. Chatterjee and V. M. Govindu. Efficient and robust large-scale rotation averaging. In Proceedings of the IEEE International Conference on Computer Vision, pages 521–528, 2013.
Y. Chen, C. Suh, and A. J. Goldsmith. Information recovery from pairwise measurements. IEEE Transactions on Information Theory, 62(10):5881–5905, 2016.
Y. Cherapanamjeri, P. Jain, and P. Netrapalli. Thresholding based outlier robust PCA. In COLT, pages 593–628, 2017.
Y. Chi, Y. Lu, and Y. Chen. Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019.
L. Danzer, B. Gr¨unbaum, and V. Klee. Helly’s theorem and its relatives. In Proc. Symp. Pure Math., volume 7, pages 101–180. Amer. Math. Soc., 1963.
Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in neural information processing systems, pages 2933–2941, 2014.
D. L. Donoho and M. Gasko. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, 20(4):1803–1827, 1992.
J. Friedman et al. Relative expanders or weakly relatively ramanujan graphs. Duke Mathematical Journal, 118(1):19–35, 2003.
C. Gao and A. Y. Zhang. Exact minimax estimation for phase synchronization. arXiv preprint arXiv:2010.04345, 2020.
C. Gao, J. Liu, Y. Yao, and W. Zhu. Robust estimation and generative adversarial nets. arXiv preprint arXiv:1810.02030, 2018.
T. Gao and Z. Zhao. Multi-frequency phase synchronization. arXiv preprint arXiv:1901.08235, 2019.
R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points—online stochastic gradient for tensor decomposition. In Proceedings of The 28th Conference on Learning Theory, pages 797–842, 2015.
R. Ge, J. D. Lee, and T. Ma. Matrix completion has no spurious local minimum. arXiv preprint arXiv:1605.07272, 2016.
V. M. Govindu. Combining two-view constraints for motion estimation. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 2, pages II–II. IEEE, 2001.
V. M. Govindu. Lie-algebraic averaging for globally consistent motion estimation. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages I–I. IEEE, 2004.
V. M. Govindu. Robustness in motion averaging. In Asian Conference on Computer Vision, pages 457–466. Springer, 2006.
H. L. Hammer, A. Yazidi, and H. Rue. Estimating tukey depth using incremental quantile estimators. arXiv preprint arXiv:2001.02393, 2020.
P. Hand, C. Lee, and V. Voroninski. Exact simultaneous recovery of locations and structure from known orientations and corrupted point correspondences. Discrete & Computational Geometry, 59(2):413–450, 2018.
M. Hardt. Understanding alternating minimization for matrix completion. In FOCS, pages 651–660. IEEE, 2014.
R. Hartley, J. Trumpf, Y. Dai, and H. Li. Rotation averaging. International journal of computer vision, 103 (3):267–305, 2013.
R. Hartley, K. Aftab, and J. Trumpf. L1 rotation averaging using the weiszfeld algorithm. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3041–3048. IEEE, 2011.
C. Hoffman, M. Kahle, and E. Paquette. Spectral gaps of random graphs and applications. International Mathematics Research Notices, 2021(11):8353–8404, 2021.
Q.-X. Huang and L. Guibas. Consistent shape maps via semidefinite programming. In Computer Graphics Forum, volume 32, pages 177–186. Wiley Online Library, 2013.
X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation synchronization via truncated least squares. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 1459–1468, 2017.
X. Huang, Z. Liang, X. Zhou, Y. Xie, L. J. Guibas, and Q. Huang. Learning transformation synchronization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8082– 8091, 2019.
P. Jain, A. Tewari, and P. Kar. On iterative hard thresholding methods for high-dimensional m-estimation. In Advances in Neural Information Processing Systems, pages 685–693, 2014.
H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on pure and applied mathematics, 30(5):509–541, 1977.
J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht. Gradient descent converges to minimizers. University of California, Berkeley, 1050:16, 2016.
G. Lerman and T. Maunu. Fast, robust and non-convex subspace recovery. Information and Inference: A Journal of the IMA, 7(2):277–336, 2017.
G. Lerman and Y. Shi. Robust group synchronization via cycle-edge message passing. arXiv preprint arXiv:1912.11347, 2019.
G. Lerman, Y. Shi, and T. Zhang. Exact camera location recovery by least unsquared deviations. SIAM Journal on Imaging Sciences, 11(4):2692–2721, 2018.
X. Li, S. Ling, T. Strohmer, and K. Wei. Rapid, robust, and reliable blind deconvolution via nonconvex optimization. Applied and computational harmonic analysis, 47(3):893–934, 2019.
H. Liu, M.-C. Yue, and A. So. A unified approach to synchronization problems over subgroups of the orthog- onal group. arXiv preprint arXiv:2009.07514, 2020.
X. Liu. Fast implementation of the tukey depth. Computational Statistics, 32(4):1395–1410, 2017.
X. Liu, K. Mosler, and P. Mozharovskyi. Fast computation of tukey trimmed regions and median in dimension p > 2. Journal of Computational and Graphical Statistics, 28(3):682–697, 2019.
J. Lu and S. Steinerberger. Synchronization of Kuramoto oscillators in dense networks. arXiv preprint arXiv:1911.12336, 2019.
C. Ma, K. Wang, Y. Chi, and Y. Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion. In PMLR, volume 80, pages 3345– 3354, 10–15 Jul 2018.
D. Martinec and T. Pajdla. Robust rotation and translation estimation in multiview reconstruction. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
T. Maunu and G. Lerman. Robust subspace recovery with adversarial outliers. arXiv preprint arXiv:1904.03275, 2019.
T. Maunu, T. Zhang, and G. Lerman. A well-tempered landscape for non-convex robust subspace recovery. Journal of Machine Learning Research, 20(37):1–59, 2019.
T. Maunu and G. Lerman. Depth descent synchronization in SO(D). arXiv preprint arXiv:2002.05299 v2, 2020.
S. Mei, Y. Bai, and A. Montanari. The landscape of empirical risk for nonconvex losses. The Annals of Statistics, 46(6A):2747–2774, 2018.
R. R. Meyer. Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J. Comput. System Sci., 12:108–121, 1976.
I. Mizera. On depth and deep points: a calculus. The Annals of Statistics, 30(6):1681–1736, 2002.
M. Moakher. Means and averaging in the group of rotations. SIAM journal on matrix analysis and applications, 24(1):1–16, 2002.
P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain. Non-convex robust pca. In Advances in Neural Information Processing Systems, pages 1107–1115, 2014.
O. Ozyesil, A. Singer, and R. Basri. Stable camera motion estimation using convex programming. SIAM Journal on Imaging Sciences, 8(2):1220–1262, 2015.
O. ¨Ozyes¸il, V. Voroninski, R. Basri, and A. Singer. A survey of structure from motion*. Acta Numerica, 26: 305–364, 2017.
A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Message-passing algorithms for synchronization prob- lems over compact groups. Communications on Pure and Applied Mathematics, 71(11):2275–2322, 2018.
P. Petersen. Riemannian geometry, volume 171. Springer, 3rd edition, 2016.
Q. Qu, Y. Zhang, Y. Eldar, and J. Wright. Convolutional phase retrieval via gradient descent. IEEE Transactions on Information Theory, 66(3):1785–1821, 2019.
Q. Qu, Z. Zhu, X. Li, M. Tsakiris, J. Wright, and R. Vidal. Finding the sparsest vectors in a subspace: Theory, algorithms, and applications. arXiv preprint arXiv:2001.06970, 2020.
R. Rado. A theorem on general measure. Journal of the London Mathematical Society, 1(4):291–300, 1946.
D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard. Se-sync: A certifiably correct algorithm for synchronization over the special euclidean group. The International Journal of Robotics Research, 38 (2-3):95–125, 2019.
Y. Shi and G. Lerman. Message passing least squares framework and its application to rotation synchroniza- tion. In International Conference on Machine Learning, pages 8796–8806. PMLR, 2020.
A. Singer. Angular synchronization by eigenvectors and semidefinite programming. Applied and computational harmonic analysis, 30(1):20–36, 2011.
J. Sun, Q. Qu, and J. Wright. Complete dictionary recovery over the sphere. In Sampling Theory and Applications (SampTA), 2015 International Conference on, pages 407–410, May 2015a. doi: 10.1109/ SAMPTA.2015.7148922.
J. Sun, Q. Qu, and J. Wright. When are nonconvex problems not scary? arXiv preprint arXiv:1510.06096, 2015b.
C. J. Taylor and D. J. Kriegman. Minimization on the Lie group SO(3) and related manifolds. Yale University, 16:155, 1994.
C. D. Toth, J. O’Rourke, and J. E. Goodman. Handbook of discrete and computational geometry. Chapman and Hall/CRC, 2017.
R. Tron and R. Vidal. Distributed image-based 3-d localization of camera sensor networks. In Decision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on, pages 901–908. IEEE, 2009.
R. Tron, X. Zhou, and K. Daniilidis. A survey on rotation optimization in structure from motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 77–85, 2016.
J. W. Tukey. T6: Order statistics, in mimeographed notes for statistics 411. Department of Statistics, Princeton University, 1974.
I. Waldspurger and A. Waters. Rank optimality for the Burer-Monteiro factorization. arXiv preprint arXiv:1812.03046, 2018.
L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference, 2013.
L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference, 2013. doi: 10.1093/imaiai/iat005.
X. Yi, D. Park, Y. Chen, and C. Caramanis. Fast algorithms for robust PCA via gradient descent. In NIPS, pages 4152–4160, 2016.
T. Zhang and Y. Yang. Robust PCA by manifold optimization. Journal of Machine Learning Research, 19 (80):1–39, 2018.
Y. Zhang, Q. Qu, and J. Wright. From symmetry to geometry: Tractable nonconvex problems. arXiv preprint arXiv:2007.06753, 2020.