Accurate reconstruction of 3D scenes from multiview stereo images is one of the primary goals of computer vision. Current techniques use point correspondences to estimate either the essential or fundamental matrices between pairs of images, and then use the estimated matrices to recover the camera matrices and structure. Notable success was achieved when sequential methods were introduced [1, 20]. These methods first recover camera matrices and structure from two images. Then, adding one image at a time, they apply bundle adjustment to estimate the camera matrix (and structure) of the new image. Recent work attempts to further improve recovery by considering simul-
Figure 1: Illustration of our rank constraint. Collections of fundamental matrices estimated for pairs of images (top) are arranged in a matrix
(bottom). This matrix should be equal (up to noise) to a matrix F or properly scaled fundamental matrix, which in turn forms the symmetric part of a rank 3 matrix A.
taneously subsets of multiple images and recovering camera matrices that are consistent over the entire subsets. Indeed a number of papers have focused on the consistent recovery of either camera orientation or location [2, 19, 18, 23, 24, 16].
This paper introduces new constraints to enable the consistent recovery of fundamental and essential matrices. This is potentially advantageous since those matrices capture simultaneously the location and orientation of the cameras, along (in the case of fundamental matrices) with their internal calibration parameters. For configurations of cameras that are not all collinear, our main result establishes that, when scaled properly, the matrix formed by appending all pairwise fundamental matrices in a multiview setting is of rank 6. More tightly, this matrix forms the symmetric part of a rank 3 matrix whose factors relate directly to the entries of the corresponding camera matrices. We further show that multiview settings of collinear cameras yield a rank 4 matrix.
We use this characterization to develop an optimization formulation for estimating consistent sets of fundamental matrices. Our formulation can accept sets of estimated fundamental matrices in which some are noisy, some are outliers, and some cannot be estimated at all from image pairs (i.e., missing data). In solving this optimization we seek a set of scaled fundamental matrices that satisfy our constraints and fit the estimated fundamental matrices. Our formulation uses an L1 cost function, which is optimized with Iterative Re-weighted Least Squares (IRLS) [12], to remove outliers, and uses Alternate Direction Method of Multipliers (ADMM) [4] to incorporate rank constraints.
Our work is related to a variety of approaches to structure from motion (SfM) that utilize rank constraints. Tomasi and Kanade showed that under an orthographic projection, and after centering, projected points form a rank 3 matrix. Sturm and Triggs [21, 22] extended this to perspective projection by showing that projected points, when scaled properly, form a rank 4 matrix. Unlike their work, which uses rank constraint on tracks of points in images, our work only considers fundamental matrices and so in multiview settings it gives rise to systems with many fewer variables, relying on potentially less noisy estimates. Our approach, which seeks to recover a consistent set of fundamental matrices, is analogous to rotation or translation averaging and to loop closure [10, 6, 7]. In fact, obtaining consistent fundamental matrices can be regarded as simultaneous averaging of rotation, translation and camera calibration and as a way to close all loops. Our experiments indicate that such joint averaging performs better than a separate averaging of rotation and translation.
A number of algorithms have recently been proposed for solving unconstrained, low rank systems with outliers and missing data (e.g., [5, 13, 17]) with remarkable success. Extending such techniques to incorporate SfM constraints is an important next step.
When thousands of images are available, existing methods that use pairwise epipolar constraints or tri-focal tensors can exploit highly over-determined systems to handle noise and outliers quite accurately. However, when fewer images are available the importance of rank constraints grows, and their introduction can potentially yield more accurate estimation of camera parameters. Indeed, we provide experiments that show that using our characterization, essential matrices can be estimated more accurately than with current state-of-the-art methods, and these in turn can be translated to better estimates of camera locations.
2.1. Background
We first introduce notations and give a short summary of the relevant concepts in multi-view geometry. An extensive discussion of this topic can be found in [11]. Let denote a collection of n images of a scene and let
and
denote the location and orientation of the i’th camera in a global coordinate system. Let the
denote the intrinsic camera calibration matrix for
is nonsingular and is typically specified in the form
where, and
respectively are the focal lengths in the x and y direction,
form the principal point and
represents the skew coefficient. Let
be a scene point in the global coordinate system. Its projection onto
(expressed in homogeneous coordinates) is given by
, where
. We therefore associate with
the
camera matrix
, where I is a
identity matrix and note that scaling
does not affect projection.
Next, we consider the relations between pairs of images, and
. We can express the camera rotation and translation relating two images by
and
. Clearly,
and
. Two images are further related by epipolar line constraints, which are expressed by
, where
denotes the fundamental matrix relating
to
can be estimated up to scale from point correspondences.
is related to the rotation and translation between
and
and to their respective calibration matrices by
, where
denotes the skew-symmetric matrix corresponding to cross-product with
. In cases in which the cameras are calibrated we set
and replace the fundamental matrix with the essential matrix
. Therefore,
.
To derive our rank constraint we will need to express the essential and fundamental matrices relative to a global coordinate system. [25] derived an expression in terms of the camera matrices and
. Here we will use the more recent derivation of [2] that, as we shall see below, is amenable to factorization:
where .
2.2. Low-rank Construction
We next introduce our main result, which includes a low rank characterization of the collection of fundamental matrices in multiview settings. For our result we will construct a matrix of size , denoted F, in which each of the
blocks includes a fundamental matrix
(see Figure 1), where we assume that each of the pairwise fundamental matrices in F is scaled properly. We further define
for all
, and note that this is consistent with (3). Likewise we define the
matrix E from the essential matrices
. We refer to F (resp. E) as the multiview matrix of fundamentals (essentials).
Claim 1: F (and likewise E) is symmetric and 6. Moreover,
1. If F is produced by n cameras whose centers are not all collinear then rank(F) = 6 and there exists a matrix A with rank(A) = 3 such that
.
2. If F is produced by n cameras whose centers are all collinear then and there exists a matrix A with
such that
.
Proof: To prove the claim we begin by defining the matrix A as follows. Let , and
.
, and
are
matrices. Observ- ing (3) and recalling that
is skew-symmetric we see that
.
Next we construct the matrices U and V as
and set . Clearly, by construction,
3. Moreover,
, and so F is symmetric and
. Case 1: We show next that unless the cameras are all collinear rank(A) = 3. Clearly rank(V ) = 3. Therefore we need to show that also rank(U) = 3. We prove this by contradiction. Assume rank(U) < 3. Then
,
, s.t. Ut = 0. This implies that
for all
. Thus, all the
’s are parallel to t, violating our assumption that not all camera locations are collinear. Consequently rank(U) = 3 and therefore also rank(A) = 3.
Next we show that when the cameras are not all collinear rank(F) = 6. We recall that where
and
are non-singular. We can therefore write F =
where the
matrix K is block diagonal with blocks formed by
. This implies that rank(F) = rank(E), and so we are left to show that rank(E) = 6.
We assume WLOG that the camera locations are centered at the origin, i.e., (since E is invariant to global translation of the cameras). We further argue that each column of U is orthogonal to each column of V . This is evident from the following identities
Let denote the matrix A where we substitute
(so that
.) Denote by
the SVD of
and
are
and
is
). Since
we have that
and
. Now we can decompose E as :
Since the columns of U are orthogonal to those of V , the matrixis column orthogonal. Thus, (5) is the SVD of E. And since
is rank 3,
is full rank. Consequently, rank(F) = rank(E) = 6.
Case 2: Suppose all camera centers are collinear. WLOG assume that the origin of the global coordinate system is also collinear with the n cameras (since F is unaffected by global translation), and so we can write for
where
and
. Let
, then clearly
. Define
(so
) and let the
matrix
be formed by stacking
on top of each other then
Since T is skew-symmetric its rank is at most 2 and so is rank(A). It follows that
2.3. Tightness of our constraints
Claim 1 provides two constraints on the matrix F.
• and rank(A) = 3.
• The diagonal block of F vanishes, i.e., .
We now investigate how tight these constraints are in producing fundamental matrices that are consistent with a set of camera parameters. We show that the number of degrees of freedom allowed by these constraints is equal to the number of degrees of freedom in the camera matrices. However, we find that there exist matrices that are allowed by these constraints, but do not produce valid fundamental matrices.
Counting arguments show that our constraints allow degrees of freedom (DOFs) in defining F. Specifi-cally, since A is rank 3 it can be written as
where U and V are
, so together they have 18n entries. The constraint
, however, gives rise to a 15 DOF ambiguity that should be subtracted from the number of entries of U and V , as we explain in the next paragraph. The constraint that
requires
to be skew symmetric, yielding 6n more constraints on the entries of U and V , yielding together
DOFs.
note that we can write F as , where J is a
permutation matrix defined as J =
(so ). With this notation the ambiguity in factorizing F is obtained by introducing a
matrix Q such that
so that
has 36 entries, but the constraints
reduces its degrees of freedom to 15. Denote
and
to be skew symmetric and the sum
, providing altogether 21 constraints on the 36 entries of Q, leaving 15 DOFs.
Coincidentally, the number of DOFs in factoring F is equal to the DOFs in defining n cameras. In general, the number of DOFs in defining n perspective cameras is 15. However, each camera matrix can be scaled arbitrarily and each choice of scale will (inversely) scale the respective row and column of F. In other words, n camera matrices,
, scaled arbitrarily by non zeros
, produce a collection of equivalent multiview fundamental matrices defined by
The freedom in choosing the entries of S accounts for the n missing DOFs.
We note however that although the DOFs in factoring F with our constraints are equal to the DOFs in defining n camera matrices there exist matrices that satisfy our constraints but cannot be realized with n cameras. Specifically, these constraints do not guarantee that all the pairwise fundamental matrices are rank deficient. The constraint
restricts
to be skew-symmetric, implying that either
or
is rank deficient. If all
’s (or equivalently all
’s) are chosen to be rank deficient then so are all the
. If however some of the
’s and some of the
’s are chosen to be full rank then they may produce
blocks that are rank 3 and so they are not legal fundamental matrices. Note that the skew-symmetry of
guarantees that no more than 1/4 of the
’s can be of full rank. Indeed, our experiments (in Section 4) often produce
’s that are near rank 2; in a typical run the average ratio of the third to second largest singular value
, presumably because the problem is so over-constrained.
In conclusion, while our constraints provide a neces-
sary but not sufficient conditions for consistency, counting considerations indicate that our constraints are nearly tight. Below we develop and optimization scheme that utilizes these constraints to infer the missing scale factors for collections of estimated pairwise fundamental matrices, to recover missing fundamentals and to correct noisy ones.
In this section we formulate an optimization problem that uses the constraints derived in section 2 to achieve a better recovery of pairwise fundamental matrices. Assume we are given a set of fundamental matrices , where
and
denotes the subset of image pairs for which fundamental matrices have been estimated. (We will further assume
.) We use these matrices to construct our measurement matrix
whose
block contains
if
and is zero otherwise. Note that in the absence of errors each non-zero block is related by an unknown scale factor
to the corresponding block in the sought multiview matrix of fundamentals F, where
depends on the distance between the i’th and j’th cameras. Recovering these scale factors is essential in order to apply our constraints. Our task therefore can be expressed as:
where F is constrained to fulfill the constraints in Claim 1. Here we have chosen to minimize over the sum of Frobenius norms of each block. Such mixed L1-L2 norm minimization is expected to be robust to outlier estimates of
’s.
We note that the formulation (6) is bilinear in F and the scale factors. We could avoid this bilinearity by minimizing instead . Such minimization, however, is subject to a zero trivial solution and so it requires an additional constraint such as
. Our experience with such formulation is that it is quite sensitive to errors.
Expressing (6) with the constraints results in the following problem:
where denotes each
sub-block of A. Our solution for F then is
.
(7) introduces a number of challenges, including the mixed L1-Frobenius norms, the bilinearity, and the rank constraint. This problem is non-convex due to the latter two challenges. Below we describe how we approach these challenges with IRLS and ADMM. Our algorithm is sum- marized in Algorithm 1.
3.1. Handling Outliers with IRLS
We begin by addressing the mixed L1-Frobenius norm in the cost function. We approach this with Iterative Reweighted Least Squares (IRLS) [12]. IRLS converts the problem to weighted least squares where the weights are updated from one iteration to the next. At each iteration t of the IRLS we replace the cost function in (7) with
where
is a regularization parameters (we use
).
To clarify presentation we simplify our notations as follows. Let W and be
matrices. Denoting their
sub-blocks by
and
, we set
and
, where 1 is a
matrix with all 1’s. We further use the subscript WF to denote the weighted Frobenius norm, i.e.,
and use
to denote element-wise product of matrices. Therefore, in each IRLS iteration we seek to solve
3.2. Optimization using ADMM
Next, we wish to solve the non-convex optimization problem in (9), including the bilinearity and the rank constraint. To this end we will use a scaled version of Alternate Direction Method of Multiplier (ADMM) [4, 9]. We maintain a second copy of A, which we denote as B, and form the augmented Lagrangian of (9) as:
The last term in this objective, denotes the Lagrangian penalty;
is a constant, and
is a matrix of Lagrange multipliers of the same size as A that is updated in the ADMM steps. We next describe the ADMM steps, which are applied iteratively.
Step 1: Solving for . In each iteration, k, we solve the following sub-problems:
Since (11) is non-convex we will solve it by alternative minimization of A and
Because of the form of (11) it is useful to separate A into its symmetric and anti-symmetric parts, and
, so that
with
and
. Let
and
respectively denote its symmetric and anti-symmetric part. We can write (11) in terms of
and
and separately solve for them as follows:
To solve (12) we take the derivative w.r.t. and equate to 0. Thus we update
according to
2. Optimize w.r.t. : We minimize the following subproblem
Step 2: Solving for B. This part of the ADMM deals with the rank constraint. It requires a solution to
This is solved by
where SV P(X, r) denotes the Singular Value Projection (SVP) of X into space the of rank-r matrices. To perform SV P(X, r) we compute the SVD of X and keep its top r singular values and the corresponding singular vectors.
Step 3: Update of . The matrix
contains Lagrange multipliers that are used in the saddle-point formulation (10) to enforce the equality constraint A = B. The following update is a gradient ascent step that acts to maximize the augmented Lagrangian (10) for
For details, see [4, 9].
Empirically we observe monotonic convergence of the cost function defined in Equation 7 with each iteration of IRLS on a sample problem as shown in Figure 2. For every iteration of the IRLS we run ADMM till convergence to optimize Equation 10.
Figure 2: Convergence of our optimization algorithm
Figure 3: SfM pipelines for LUD (left) and our method (right).
To demonstrate the utility of our method we tested it in the problem of estimating essential matrices and camera locations from multiple images. Current iterative and global approaches to Structure from Motion (SfM) are often tested on large datasets when many pairwise essential matrices can be estimated, achieving outstanding performance. We argue that imposing rank constraints can be useful particularly when the number of images is relatively small. To demonstrate this we run our method on subsets of images of different sizes showing improved performance relative to the existing methods particularly with smaller subsets.
In many common SfM pipelines the intrinsic calibration parameters are recovered separately. Therefore, while our method can be applied when the calibration parameters are unknown, here we assume that the cameras are calibrated and so we apply our optimization algorithm to essential ma-
trices. Note that our derivations in Sections 2 and 3 hold also for essential matrices by setting . We next describe the tested methods:
• LUD [18]: Figure 3 shows the pipeline used by LUD to estimate camera locations and orientations from pairs of images. Starting from pairwise essential matrices estimated with SIFT [15] and RANSAC [3], this method first solves for camera orientations, denoted by in Figure 3, by iteratively applying [2] while rejecting outliers. Using camera orientations it then returns to the image keypoints to estimate pairwise camera directions, denoted by
. Using these pairwise directions it applies IRLS to solve for camera locations (
), which we compare to our method. In addition, we use the estimated camera locations and orientations to reconstruct the pairwise essential matrices
.
• ShapeKick [8]: For this method we use the same pipeline as used with LUD, except that we replace the translation recovery part of LUD with ShapeKick. ShapeKick formulates the location recovery problem as a convex optimization and solves it with ADMM. They achieved comparable performance to LUD on the dataset of [24].
• 1DSfM [24]: This method uses a pre-processing technique, based on projection in many random directions, to remove outliers in the original pairwise direction measurements. In our experiments we use their software, which uses the pipeline described in [24] and only provides camera locations.
• Our method: Figure 3 shows the pipeline used by our method. From the pair-wise essential matrices we minimize (7) using the IRLS-ADMM summarized in Algorithm 1. Since our method is not convex it requires a good initialization. We initialize it with essential matrices produced by the LUD method of Ozyesil et al. [18], denoted . Specifically
is used to initialize
and A in Algorithm 1. Our algorithm improves these essential matrix estimates, producing a collection of new pairwise estimates in E, denoted
. To further produce camera locations we first use
and the rotations obtained by the LUD pipeline,
, to solve for the pairwise camera directions
. Then we apply translation solver of LUD to the
with
to produce camera locations
. As is shown below, our improved estimates of essential matrices lead in turn to improved estimates of camera locations compared to the LUD pipeline.
We tested these methods on real image collections from [24], which comes with ‘ground truth’ estimates of camera locations and essential matrices produced with a sequential method similar to [20]. (These ground truth estimates are used also in [24, 18, 8].) For our experiments we used 14 different scenes from the dataset. For each scene we randomly selected 5 different sub-samples of N images from the dataset. We used N = 50, 100, and 150 images, resulting in 70 different trials for each N. In each trial we compared the quality of the essential matrix recovered by our method to that recovered by LUD and ShapeKick. Likewise, we compared the quality of our recovered camera locations to those obtained by the three competing methods.
Figure 4: These graphs show a comparison of the recovery error of essential matrices achieved with our method compared to LUD (in blue) and ShapeKick (in yellow), for collections of 50, 100, and 150 images from [24], The graphs on the left show the amount of relative improvement and the ones on the right show the fraction of improved trials.
Figures 4-5 show our results. Each graph summarizes the results of 70 trials with each value of N. Figure 4 shows the quality of our essential matrix estimates compared to those obtained with LUD and ShapeKick, and Figure 5 shows the quality of our camera location estimates compared to those achieved by the three competing algorithms. We measure these as follows. In each experiment k we consider the collection of pairwise essential matrices produced by our method. We first normalize each matrix and measure its error to the respective (normalized) ground truth matrix. We then take the mean (or median) of this error over all essential matrices. Denote this error by . We then produce similar error measures for each competing algorithm, denoted
. We then report:
• Relative Improvement (in %): Here we report for each N and competing algorithm the average of
Figure 5: A comparison of the recovery error of camera locations achieved with our method compared to LUD (in blue) and ShapeKick (in yellow), and 1DSfM (in red) for collections of 50, 100, and 150 images from [24], The graphs on the left show the amount of relative improvement and the ones on the right show the fraction of improved trials.
• Percent of Improved Trials: This provides the percentage of trials in which our algorithm achieved more accurate results than a competing algorithm, i.e., , where I(.) is the indicator function and K denotes the total number of trials.
We provide similar measures to assess the quality of our camera locations estimates. In Figure 6 we further show the median error of camera location estimates for all methods in all trials for N = 50.
It can be seen overall that our method leads to improved estimation of essential matrices and of camera locations. With 50 images, compared to, e.g., LUD, our algorithm improves the median essential matrix estimates by 17.69%. With 150 images a smaller overall improvement of 6.68% is achieved. This suggests that our constraints are more effective when smaller numbers of images are used. Interestingly, however, despite this reduction the fraction of trials in which our method achieved more accurate estimates compared to LUD in fact increased slightly from 87% with 50 images to 98% with 150 images, indicating that our method remains effective also with larger number of images (albeit yielding smaller improvement). Similar results are observed for camera location estimation. With 50 and 150 images our algorithms improves the median camera location error by 19.73% and 8.77% respectively, while the fraction of trials in which our method achieved more accurate estimates than LUD increased slightly from 84% with 50 images to 90% with 150 images.
In our previous experiments we applied our optimization algorithm to essential matrices, assuming calibration is given. Below we further apply our algorithm to fundamental matrices in an uncalibrated setting. Since not all the entries of a fundamental matrix are of same orders of magnitude, we normalize each of the input pairwise fundamental matrices by centering all the images and scaling them uniformly to within the [1, 1] square and then compute a normalized fundamental matrix. This does not affect our rank constraint and can be inverted at the end of the process. We tested our method on 5 subsamples of 50 images for 14 different scenes and compared it to LUD. To evaluate the quality of the recovered fundamental matrices we convert them to essential matrices by applying the known calibration matrices and further use these to recover camera locations. The results can be seen in Figure 6. Using our method to recover fundamentals (in blue) yielded comparable accuracies to our results for essential matrix recovery (yellow) and both our approaches improve significantly (10-20%) over LUD as shown in Figure 7.
We further performed bundle adjustment (using [14]) initialized by the camera parameters obtained with our method and LUD. After bundle adjustment compared to LUD our method improved camera location estimates on average by 11.52%, 3.13% and 5.43%, improving in 70.59%, 64.29% and 63.77% of all trials for 50, 100 and 150 images respectively in terms of median translation error. These results indicate that our method maintains improved accuracies over LUD also after bundle adjustment.
With 50 images the recovery of essential matrices with our method requires roughly 20 iterations of IRLS and 1000 iterations of ADMM. These take overall about 2 minute on a 2.7 GHz Intel Core i5 computer.
To conclude, these experiments indicate that our characterization of essential matrices in multiview settings can be used to improve essential matrix and cameral location estimates. The advantage of these constraints appear to be particularly pronounced when fewer images are available.
We have introduced in this paper novel rank constraints on fundamental matrices in multiview settings. We have shown in particular that with non-collinear cameras the matrix that depicts the pairwise fundamentals is of rank 6 and forms the symmetric part of a rank 3 matrix whose factors are related directly to the entries of the respective camera matrices. We have used these constraints to develop an optimization framework to efficiently recover fundamental matrices for all pairs of images and to estimate their proper
Scenes0 10 20 30 40 50 60 70 0
Figure 6: Median camera location error obtained by the four algorithms for 5 subsets of 50 images for 14 different scenes (‘Notre Dame’, ‘Montreal Notre Dame’, ‘Alamo’, ‘Piazza del Popolo’, ‘Piccadilly’, ‘NYC Library’, ‘Yorkminster’, ‘Union Square’, ‘Madrid Metropolis’, ‘Tower of London’, ‘Vienna Cathedral’, ‘Roman Forum’ and ‘Ellis Island’, ‘Gendarmenmarkt’). For clarity we terminate the median T error axis at 30.
Figure 7: Improvement of our method over LUD using fundamental matrix (in blue) and essential matrix (yellow) for 50 images.
scale factors. Our experiments indicate that our method is able to provide improved estimates of essential matrices and camera locations in global SfM settings. Moreover, these experiments suggest that our constraints are particularly useful when fewer images are available.
Our plans for the future include improving the runtime of our optimization method. We intend to explore different ways to initialize the algorithm, possibly through convex relaxations. We would further want to explore the use of our method in related applications, e.g., in camera autocalibration.
[1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building rome in a day. In 2009 IEEE 12th international conference on computer vision, pages 72–79. IEEE, 2009. 1
[2] M. Arie-Nachimson, S. Z. Kovalsky, I. Kemelmacher- Shlizerman, A. Singer, and R. Basri. Global motion estimation from point matches. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 81–88. IEEE, 2012. 1, 2, 7
[3] R. C. Bolles and M. A. Fischler. A ransac-based ap- proach to model fitting and its application to finding cylinders in range data. In IJCAI, volume 1981, pages 637–643, 1981. 7
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eck- stein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends Rin Machine Learning, 3(1):1–122, 2011. 2, 5, 6
[5] E. J. Cand`es and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational mathematics, 9(6):717–772, 2009. 2
[6] A. Chatterjee and V. Madhav Govindu. Efficient and robust large-scale rotation averaging. In Proceedings of the IEEE International Conference on Computer Vision, pages 521–528, 2013. 2
[7] Z. Cui and P. Tan. Global structure-from-motion by similarity averaging. In Proceedings of the IEEE International Conference on Computer Vision, pages 864–872, 2015. 2
[8] T. Goldstein, P. Hand, C. Lee, V. Voroninski, and S. Soatto. Shapefit and shapekick for robust, scalable structure from motion. In European Conference on Computer Vision, pages 289–304. Springer, 2016. 7
[9] T. Goldstein, B. O’Donoghue, S. Setzer, and R. Bara- niuk. Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences, 7(3):1588–1623, 2014. 5, 6
[10] R. Hartley, J. Trumpf, Y. Dai, and H. Li. Rotation averaging. International journal of computer vision, 103(3):267–305, 2013. 2
[11] R. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge university press, 2003. 2
[12] P. W. Holland and R. E. Welsch. Robust regression us- ing iteratively reweighted least-squares. Communications in Statistics-theory and Methods, 6(9):813–827, 1977. 2, 5
[13] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He. Fast and ac- curate matrix completion via truncated nuclear norm regularization. IEEE transactions on pattern analysis and machine intelligence, 35(9):2117–2130, 2013. 2
[14] M. I. Lourakis and A. A. Argyros. Sba: A software package for generic sparse bundle adjustment. ACM Transactions on Mathematical Software (TOMS), 36(1):2, 2009. 8
[15] D. G. Lowe. Distinctive image features from scale- invariant keypoints. International journal of computer vision, 60(2):91–110, 2004. 7
[16] D. Martinec and T. Pajdla. Robust rotation and trans- lation estimation in multiview reconstruction. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007. 1
[17] T. Okatani and K. Deguchi. On the wiberg algorithm for matrix factorization in the presence of missing components. International Journal of Computer Vision, 72(3):329–337, 2007. 2
[18] O. Ozyesil and A. Singer. Robust camera location es- timation by convex programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2674–2683, 2015. 1, 7
[19] O. Ozyesil, A. Singer, and R. Basri. Stable camera motion estimation using convex programming. SIAM Journal on Imaging Sciences, 8(2):1220–1262, 2015. 1
[20] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3d. In ACM transactions on graphics (TOG), volume 25, pages 835–846. ACM, 2006. 1, 7
[21] P. Sturm and B. Triggs. A factorization based algo- rithm for multi-image projective structure and motion. In European conference on computer vision, pages 709–720. Springer, 1996. 2
[22] B. Triggs. Factorization methods for projective struc- ture and motion. In Computer Vision and Pattern Recognition, 1996. Proceedings CVPR’96, 1996 IEEE Computer Society Conference on, pages 845–851. IEEE, 1996. 2
[23] R. Tron and R. Vidal. Distributed 3-d localization of camera sensor networks from 2-d image measurements. IEEE Transactions on Automatic Control, 59(12):3325–3340, 2014. 1
[24] K. Wilson and N. Snavely. Robust global translations with 1dsfm. In European Conference on Computer Vision, pages 61–75. Springer, 2014. 1, 7, 8
[25] Z. Zhang and G. Xu. A general expression of the fun- damental matrix for both perspective and affine cameras. In Proceedings of the Fifteenth international joint conference on Artifical intelligence-Volume 2, pages 1502–1507. Morgan Kaufmann Publishers Inc., 1997. 2
6.1. Results on camera location error
In Table 1 we compare Our, LUD, ShapeKick and 1DSfM on 14 different scenes for N = 50, 100 and 150 images. For each scene and each choice of N we report the average of the median camera location error for 5 different trials. In Table 3-16 , each table compares four competing algorithms on 5 different trials for 50, 100 and 150 images.
6.2. Results on essential matrix error
In Table 2 we compare Our, LUD, ShapeKick and 1DSfM on 14 different scenes for N = 50, 100 and 150 images. For each scene and each choice of N we report the average of the median essential error for 5 different trials. Essential matrix error between two cameras is computed as the norm of their difference after they are normalized to unit norm. We multiply the essential matrix error by 100 and report it for convenience. In Table 17-30 , each table compares four competing algorithms on 5 different trials for 50, 100 and 150 images.
Table 1: Average of median camera location error for 5 trials for each scene and each choice of N
Table 2: Average of median essential matrix error for 5 trials for each scene and each choice of N
Table 3: Median translation error on ‘Alamo’
Table 4: Median translation error on ‘Ellis Island’
Table 5: Median translation error on ‘Madrid Metropolis’
Table 6: Median translation error on ‘Montreal Notre Dame’
Table 7: Median translation error on ‘Notre Dame’
Table 8: Median translation error on ‘NYC Library’
Table 9: Piazza Del Popolo
Table 10: Median translation error on ‘Piccadilly’
Table 11: Median translation error on ‘Roman Forum’
Table 12: Median translation error on ‘Tower of London
Table 13: Median translation error on ‘Union Square’
Table 14: Median translation error on ‘Vienna Cathedral’
Table 15: Median translation error on ‘Yorkminster’
Table 16: Median translation error on ‘Gendarmenmarkt’
Table 17: Median essential matrix error on ‘Alamo’
Table 18: Median essential matrix error on ‘Ellis Island’
Table 19: Median essential matrix error on ‘Madrid Metropolis’
Table 20: Median essential matrix error on ‘Montreal Notre Dame’
Table 21: Median essential matrix error on ‘Notre Dame’
Table 22: Median essential matrix error on ‘NYC Library’
Table 23: Median essential matrix error on ‘Piazza Del Popolo’
Table 24: Median essential matrix error on ‘Piccadilly’
Table 25: Median essential matrix error on ‘Roman Forum’
Table 26: Median essential matrix error on ‘Tower of London’
Table 27: Median essential matrix error on ‘Union Square’
Table 28: Median essential matrix error on ‘Vienna Cathedral’
Table 29: Median essential matrix error on ‘Yorkminster’
Table 30: Median essential matrix error on ‘Gendermenmarkt’