Incremental 3D Line Segment Extraction from Semi-dense SLAM

2017·arXiv

Abstract

I. INTRODUCTION

Reconstructing surface from a stream of images is an important yet challenging topic in computer vision. The 3D surface of the scene is essential for a variety of applications, such as photogrammetry, robot navigation, augmented reality, etc.

Given the expanding point cloud from a SLAM system, incremental surface reconstruction algorithms can be used to reconstruct a 3D surface representing the scene [1], [2]. With the recent development in SLAM, some monocular SLAM systems can produce dense or semi-dense point cloud which represents the scene more completely than the sparse point cloud produced by traditional SLAMs [3], [4]. Although the surface reconstruction algorithms can run in real-time on sparse point clouds, the statement does not hold true when the number of points is increased significantly. In this paper, we propose to represent the scene using line segments in order to simplify the result of semi-dense SLAM and achieve real-time performance in the task of surface reconstruction.

Line segments efficiently preserve more structural information of a scene than point clouds. Many line segment based 3D reconstruction algorithms have been proposed [5]–[10]. Most of these methods rely on efficient line segment detection and inter-keyframe matching, which are difficult for certain scenes. However, with the dense or semi-dense point clouds available, one can estimate 3D line segments without explicit matching and triangulation.

In this paper, we develop a novel method to incrementally extract 3D line segments from semi-dense SLAM. Those detected line segments present structural relations among points and prove to be a highly efficient form of 3D representation. Our method focuses on the accuracy of reconstructed line segments instead of the improvement to camera localization in other works [7], [10]. Specifically, we propose a novel edge aided 3D line segment fitting algorithm. We directly detect 3D line segments by taking both 2D locations and corresponding depth into consideration in the line segment detection procedure. Our method produces accurate 3D line segments with few outliers. We utilize the 3D line segments extracted by our method in incremental surface reconstruction and improve the quality of reconstructed surface with respect to a feature point based baseline.

The rest of this paper is organized as follows. In Section 2, we review some related works about 3D line segment reconstruction. In Section 3, we describe our proposed algorithm in details. We present our experimental results in Section 4 and conclude in Section 5.

II. RELATED WORKS

Existing 3D line segment detection methods can be categorized into three main classes:

1) 3D line reconstruction from matching 2D lines: Bartoli and Sturm proposed a line segment based Structure from Motion (SFM) pipeline [5]. Micusik and Wilder used relaxed endpoint constraints for line matching and developed a SLAMlike line segment based SFM system [8]. Hofer et al. matched 2D line segments detected from images of different views by pure geometric constraints [6]. In [7], Pumarola et al. proposed PLSLAM, which detect line segments using LSD [11] and match them based on LBD descriptor [12].

2) 3D line fitting directly from 3D points: Roberts proposed a compact representation for a line in [13]. Using this representation, Snow and Schaffrin developed an algorithm for solving the Total Least-Squares problem of 3D line fitting [9]. However, this kind of methods are sensitive to noise and outliers. Random Sample Consensus (RANSAC) is relatively robust to small number of outliers [14]. However, RANSAC is time consuming in 3D space and the optimal line fitting is not guaranteed in the presence of a large amount of outliers.

3) 3D line extraction aided by 2D cues: Woo et al. [15] detected 2D line segments from 2D aerial images first, and then used their corresponding 3D points on buildings’ Digital Elevation Model (DEM) to fit 3D lines. Given RGB-D sensor, Nakayama et al. [10] transformed 2D points on detected 2D line segments directly to 3D using corresponding depth image. Then, 3D line segments are fitted by RANSAC in 3D space.

Fig. 1. Workflow of our method. The input of our method is a video (image sequence) which is captured by a calibrated moving camera. The output is a line segment based 3D model of the scene.

Since SLAM systems can output semi-dense point clouds as their mapping results, we prefer to make full use of these results and extract 3D line segments using both image and point cloud information. Our method belongs to the third class because we extract line segments from the semi-dense point cloud with the help of detected 2D edges.

III. METHOD

Instead of detecting 2D line segments and finding their corresponding 3D points for fitting 3D line segments afterwards, we fit the line segment in 3D by iteratively fitting its projections in two different planes.

Specifically, in order to extract 3D line segments from semi-dense point cloud, our method performs the following steps on each new keyframe (shown in Figure 1): (1) Compute semi-dense depth map; (2) Edge aided 3D line segment fitting; (3) 3D line segment clustering and filtering.

A. Keyframe and depth map generation

Our method is based on ORB-SLAM [16] with semi-dense module [3]. ORB-SLAM is a feature based SLAM system which takes the image sequence from a moving camera and computes the camera poses in real time. Mur-Artal and Tard´os [3] presented a semi-dense module which is able to compute a semi-dense depth map for each keyframe. In principle, other keyframe based dense or semi-dense SLAM system could be used to generate the semi-dense depth maps, such as LSDSLAM [4].

B. Edge aided 3D line segment fitting

Fitting 3D line segments directly from point clouds can be difficult and time consuming. In this paper, we propose to use 2D image information on keyframes to help 3D line segment fitting from semi-dense point clouds.

Fig. 2. Line segment fitting related coordinates. C-XYZ is the camera coordinates. The X-axis and Y-axis are parallel to the image coordinates. The Z-axis is the depth direction. represent a detected 2D image line segment. Line segment pixels and their corresponding real-world points are all located on the same plane . The x-axis is defined by vector

while z-axis is parallel to the Z-axis. For each 3D line segment, we fit two 2D lines in image coordinate plane and coordinate plane.

We first extract edge segments from keyframes using Edge Drawing [17]. Edge Drawing is a linear time edge detector which can produce a set of accurate, contiguous and clean edge segments represented by one-pixel-wide pixel chains. Now the detected edge segments of the a keyframe can be expressed as where is an edge segment formulated as a pixel chain. represents a pixel which is a vector of (x, y, Z) where x and y are the image coordinates and Z is its corresponding depth in the semi-dense depth map. The number of edge segments in a keyframe and number of pixels in an edge segment are denoted by n and m respectively. It is worth noting that image pixels with high intensity gradient are more likely to be selected for computing depth value in the semi-dense SLAM system. Edge segments are detected based on pixel intensity gradients as well. Thus, the detected edge pixels are very likely to have depth values. The pixels which have no depth values will be considered as outliers in the line fitting process.

3D line segments of a keyframe are extracted from those detected image edge segments by Algorithm 1. The main idea of this algorithm is to reduce a 3D line fitting problem to two 2D line fitting problems. The coordinate frames are defined as shown in Figure 2. For each edge segment, the algorithm initially takes its first L pixels to fit two 2D lines (and ) in the image coordinate frame and the -xz coordinate frame using total least square method. The line is fitted based on the pixels’ (x, y) values while is fitted based on (D, Z). Z is the pixel’s depth and D is the distance from to the pixel’s projection on the x-axis. Total least square 2D line fitting is performed by solving Singular Value Decomposition (SVD) [18]. It is worth noting that the plane -xz is not always the same with plane , which is determined by C, and . The plane -xz is orthogonal to the image plane where is fitted. The two planes, -xz and , are the same plane only if the image center lies on the line segment . Although the 3D points may not be located on the plane - xz, it is much easier to perform line fitting on -xz rather than . Given the image plane coordinate of the points, actual

3D positions can be easily recovered using the depth of points in -xz plane.

After obtain an initial line segment, we compute the distances of the next pixel in pixel chain to and in their corresponding coordinate frames. Note that D and Z have different units. To have the same unit as D, Z is multiplied by the focal length f before distance computation. If both distances are smaller than certain thresholds, and respectively, we will add the pixel to the fitted pixel set to extend the line. Otherwise the pixel will be considered as an outlier. If L consecutive pixels are outliers, we stop the current line segment search and start a new line segment search. Another pair of total least square fittings on the two planes are performed to obtain the final 3D line for each line segment. The 3D line segments are represented by their endpoints, which are estimated by projecting the points corresponding to its first and last pixel on to the final 3D line. After traversing all of the edge segments of the keyframes, we can aggregate one 3D line segment set for each keyframe.

C. 3D line segment clustering and filtering

To obtain a consistent reconstruction of the environment, 3D line segments extracted from different keyframes are first registered to the same world coordinate system. The registered 3D line segments are denoted as . Here w denotes the total number of 3D line segments from all keyframes. Directly registering all 3D line segments will produce redundant and slightly

Fig. 3. Clustering by angle and distance. is fitted from an existing 3D line segment cluster, while is an unclustered 3D line segment.

misaligned result. We address this problem by proposing a simple incremental merging method.

The main idea of our merging method is clustering closely located 3D line segments and fitting those cluster sets with new 3D line segments. As illustrated in Figure 3, the angle and distance measures are used for clustering. The angle measure is defined as:

The distance measure d is computed as:

Specifically, we take the first 3D line segment as the initial cluster . Then, we compute the angle and distance measure between the initial cluster (single line segment) and the next 3D line segment . If the angle and distance d are smaller than certain thresholds (and respectively), we add to the cluster . Otherwise, we create a new cluster . For each cluster, if it contains more than one 3D line segments, we will fit a new 3D line segment to represent the cluster. The direction of the new line segment is determined by performing SVD on the matrix consisting of all points in , where denotes the set containing all the endpoints of line segments in this cluster. A new 3D infinite line is then determined by the direction together with the centroid of . In order to obtain a 3D line segment from this infinite line, we project endpoints in onto the generated infinite 3D line and compute the furthest projections with respect to the centroid in both directions. The 3D line segment between these two furthest projection points is taken as the fitted line segment of the cluster. This process is repeated until all the line segments in ls are clustered. Clusters with small size () are filtered out in the end. In this way, we can merge a large number of line segments into fewer clusters and generate new 3D line segments with higher quality.

IV. EXPERIMENTS

In this section, we present the results of our 3D line segment extraction method on image sequences from the TUM RGB-D dataset [19] and the EuRoC MAV dataset [20].

Fig. 4. Experimental results. Top to Bottom (four sequences): EuRoC MAV Vicon Room 101 (VR101), EuRoC MAV Machine Hall 01 (MH01), TUM RGBD fr3-large-cabinet (fr3-lc), TUM RGBD fr1-room (fr1-r).

A. Implementation

The experiments in this section are performed on a desktop computer with a quad-core Intel i7-6700k CPU. We use the open source ORB-SLAM2 [21] as our base system. We implement the semi-dense module in C++ as described in [3]. The parameters in ORB-SLAM are kept as default, and the parameters for semi-dense module are set as presented in [3]. Parameters in Algorithm 1 and incremental line segment clustering are set as follows in all our experiments: , , where w and h denotes the width and height of images respectively.

B. Qualitative Comparison

The results of our 3D line segment extraction method on some test sequences are illustrated in Figure 4e and Figure 4f. Our results accurately fit the semi-dense point clouds shown in Figure 4b. They still capture the major structures of the scene while reducing the number of 3D elements greatly.

1) Line3D++: We first compare our results with those from Line3D++ [6]. The results of Line3D++ on the test sequences are shown in Figure 4c. In our experiments, Line3D++ uses line segments detected by EDLines [22] together with the keyframe images and camera poses from ORB-SLAM to construct 3D line segments. It utilizes geometric constraint and mutual support of line segments to match them across frames. In some cases, such as complex indoor environment with repetitive textures (e.g. sequence VR101 in Figure 4), Line3D++ can produce a large number of outliers due to the ambiguity of geometric line matching. On the contrary, our method is not affected by such ambiguity since it utilizes the semi-dense depth maps.

In contrast to Line3D++, semi-dense points can cover regions with large image gradient, such as boundaries and contours, where straight lines may be absent. Since our method takes both intensity and depth information into consideration, it is robust to outliers and it can extract shorter line segments than EDLines. Thus, our results fit curves better and capture finer details than Line3D++, which is clearly shown in the result of sequence MH01 in Figure 4.

Moreover, our method is more robust to camera pose errors. Line3D++ can fail to correctly match line segments due to errors in the estimated camera poses, while our method is still able to reconstruct meaningful 3D line segments. It can be easily observed in the two TUM RGB-D sequences (fr3-lc and fr1-r) shown in Figure 4.

2) Decoupled 3D line segment fitting: To further demonstrate the capability of our method, we compare it to a decoupled 3D line segment fitting method using 2D line segment given by EDLines [22]. Given detected line segments and the depth information on some of the pixels along the line segments, we can easily estimate the 3D line segment position by performing a single 2D line fitting. In this case, there are a fixed number of pixels on the line segment since we do not

Fig. 5. Comparison of results of sequence TUM RGB-D fr3-structure-texture- near.

need to iteratively search along pixel chains and extend line segments. Therefore, we can efficiently perform RANSAC in 2D to remove outliers before the line fitting process. With the fitted line, we compute the 3D location of the endpoints and reconstruct the 3D line segment. Note the result of this method is equivalent to directly performing a RANSAC in 3D to fit all 3D points on the line segment. However, fitting a line in 2D is faster because fewer parameters are required to represent the line and the search space is much smaller.

The results of decoupled line segment fitting are presented in Figure 4d. Compared to the edge aided 3D line fitting which tries to utilize pixel position and depth simultaneously, the decoupled fitting essentially fits lines in the image plane and depth plane in two steps. The error from line fitting in the image plane will be propagated to the error of 3D line segment position, which result in an inaccurate reconstruction compared to our method, as shown in Figure 5. It is worth mentioning that the decoupled fitting tends to generate much longer segments since only the pixel position is considered in the image plane line fitting process. Longer segments will make the propagated error even worse because the total error of line segments in image space might be larger. Another source of error is that EDLines may detect long line segments which are not continuous lines in 3D space. Trying to fit a single 3D line segment onto the 2D segment in this case will result in a large error. On the other hand, in our method, if either of the two errors of line fitting grows higher than the threshold, we will stop the line fitting and start a new line fitting process. In this way, the errors accumulated from image plane and depth plane are bounded, and therefore prevent the line segments from being far away from the 3D points.

C. Quantitative Comparison

1) Distance to surface: To demonstrate the accuracy of our method, we compute the average distance of line segment endpoints to the ground truth surface in two EuRoC MAV sequences, as shown in Table I. We take the provided precise

TABLE I AVERAGE DISTANCE OF VERTICES TO GROUND TRUTH SURFACE

3D scanning of environment as ground truth. Since the output of ORB-SLAM have coordinates different from the ground truth surface data, we estimate the global Euclidean transform and scale change by performing ICP to align the semi-dense point cloud to the ground truth point cloud. The same Euclidean transform and scale change are applied to all the output line segments, so that all the distances are calculated in the coordinates of the ground truth data. It can be seen in Table I that the results of our method fit to the surfaces better than other methods.

2) Compactness: For easy handling and manipulation, it is desired to have fewer 3D elements while they can still represent most of the environment. In the surface reconstruction pipeline, a smaller number of vertices will also greatly reduce the running time. As shown in Table II, the point clouds are greatly simplified with our edge aided 3D line segment fitting algorithm. The results are simplified further using our 3D line segments clustering process. Note that although Line3D++ produces the fewest number of vertices in the reconstruction, the completeness of reconstruction is generally worse than our method as shown in Figure 4.

3) Running Time: The average running time of our edge-aided 3D line segments fitting on the sequences shown in Figure 4 is 7.40ms per keyframe, while the decoupled 3D fitting takes 10.42ms per keyframe. Our line segment fitting method is run-time efficient while utilizing large amount of depth information. Compared to the running time of edge aided 3D fitting, decoupled 3D fitting requires additional computation time for performing RANSAC. Because the segments are usually much longer in decoupled 3D line segments fitting, RANSAC is necessary in order to obtain a good fit for the larger pixel sets on line segments. Our 3D line segment fitting algorithm is linear in the number of line segments and is fast enough to be real-time. However, our clustering process is relatively slower. The complexity of clustering a single line segment is O(M), where M is the number of existing clusters. In the worst case, M can be the same as the number of line segments N. Therefore, the complexity of the clustering process in a sequence is .

D. Surface Reconstruction

The resulting line segments of our method can be used to improve the quality of surface reconstruction. We integrate our method to the incremental space carving surface reconstruction algorithm presented in [1]. The algorithm incrementally reconstructs the surface by marking discretized 3D space as

TABLE II NUMBER OF VERTICES IN THE RECONSTRUCTION

Fig. 6. Reconstructed surface of sequence EuRoC MAV Vicon Room 101 in different views.

free or occupied using the visibility information of interest points. We compare the reconstructed surface using the line segment endpoints from our proposed method versus using the map points of ORB-SLAM. The results running on EuRoC Vicon Room 101 sequence are shown in Figure 6. Thanks to the structural information and fewer outliers provided by our method, major structures in the room are much more obvious.

V. CONCLUSION

In this paper, we present an incremental 3D line segment based method that uses underlying structural information to simplify the semi-dense point cloud output by keyframe-based SLAM system. The main contribution lies in the novel edge aided 3D line segment extraction algorithm which solely relies on the image and the semi-dense depth map of individual keyframes. We show that the result of our method is accurate and can be used in incremental surface reconstruction to improve the quality of 3D surfaces.

REFERENCES

[1] D. I. Lovi, “Incremental free-space carving for real-time 3d reconstruc- tion,” Master’s thesis, University of Alberta, 2011.

[2] C. Hoppe, M. Klopschitz, M. Donoser, and H. Bischof, “Incremental surface extraction from sparse structure-from-motion point clouds,” in British Machine Vision Conference, BMVC 2013, Bristol, UK, September 9-13, 2013, 2013.

[3] R. Mur-Artal and J. D. Tard´os, “Probabilistic semi-dense mapping from highly accurate feature-based monocular SLAM,” in Robotics: Science and Systems XI, Sapienza University of Rome, Rome, Italy, July 13-17, 2015, 2015.

[4] J. Engel, T. Sch¨ops, and D. Cremers, “LSD-SLAM: large-scale direct monocular SLAM,” in Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part II, 2014, pp. 834–849.

[5] A. Bartoli and P. F. Sturm, “Structure-from-motion using lines: Repre- sentation, triangulation, and bundle adjustment,” Computer Vision and Image Understanding, vol. 100, no. 3, pp. 416–441, 2005.

[6] M. Hofer, M. Maurer, and H. Bischof, “Efficient 3d scene abstraction using line segments,” Computer Vision and Image Understanding, vol. 157, pp. 167–178, 2017.

[7] A. Pumarola, A. Vakhitov, A. Agudo, A. Sanfeliu, and F. Moreno- Noguer, “PL-SLAM: real-time monocular visual SLAM with points and lines,” in 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, Singapore, May 29 - June 3, 2017, 2017, pp. 4503–4508.

[8] B. Micus´ık and H. Wildenauer, “Structure from motion with line segments under relaxed endpoint constraints,” International Journal of Computer Vision, vol. 124, no. 1, pp. 65–79, 2017.

[9] K. Snow and B. Schaffrin, “Line fitting in euclidean 3d space,” Studia Geophysica et Geodaetica, vol. 60, no. 2, pp. 210–227, 2016.

[10] Y. Nakayama, H. Saito, M. Shimizu, and N. Yamaguchi, “3d line segment based model generation by RGB-D camera for camera pose estimation,” in Computer Vision - ACCV 2014 Workshops - Singapore, Singapore, November 1-2, 2014, Revised Selected Papers, Part III, 2014, pp. 459–472.

[11] R. G. von Gioi, J. Jakubowicz, J. Morel, and G. Randall, “LSD: A fast line segment detector with a false detection control,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 4, pp. 722–732, 2010.

[12] L. Zhang and R. Koch, “An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency,” J. Visual Communication and Image Representation, vol. 24, no. 7, pp. 794–805, 2013.

[13] K. S. Roberts, “A new representation for a line,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 1988, 5-9 June, 1988, Ann Arbor, Michigan, USA., 1988, pp. 635–640.

[14] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.

[15] D. Woo, S. S. Han, Y. Jung, and K. Lee, “Generation of 3d building model using 3d line detection scheme based on line fitting of elevation data,” in Advances in Multimedia Information Processing - PCM 2005, 6th Pacific-Rim Conference on Multimedia, Jeju Island, Korea, November 13-16, 2005, Proceedings, Part I, 2005, pp. 559–569.

[16] R. Mur-Artal, J. M. M. Montiel, and J. D. Tard´os, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Trans. Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.

[17] C. Topal and C. Akinlar, “Edge drawing: A combined real-time edge and segment detector,” J. Visual Communication and Image Representation, vol. 23, no. 6, pp. 862–872, 2012.

[18] G. H. Golub and C. F. van Loan, “An analysis of the total least squares problem,” SIAM Journal on Numerical Analysis, vol. 17, no. 6, pp. 883– 893, 1980.

[19] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012, Vilamoura, Algarve, Portugal, October 7-12, 2012, 2012, pp. 573–580.

[20] M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart, “The euroc micro aerial vehicle datasets,” I. J. Robotics Res., vol. 35, no. 10, pp. 1157–1163, 2016.

[21] R. Mur-Artal and J. D. Tard´os, “ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras,” IEEE Trans. Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.

[22] C. Akinlar and C. Topal, “Edlines: A real-time line segment detector with a false detection control,” Pattern Recognition Letters, vol. 32, no. 13, pp. 1633–1642, 2011.

Designed for Accessibility and to further Open Science