TGGLines: A Robust Topological Graph Guided Line Segment Detector for Low Quality Binary Images

2020·Arxiv

Abstract

Abstract

Line segment detection is an essential task in computer vision and image analysis, as it is the critical foundation for advanced tasks such as shape modeling and road lane line detection for autonomous driving. We present a robust topological graph guided approach for line segment detection in low quality binary images (hence, we call it TGGLines). Due to the graph-guided approach, TGGLines not only detects line segments, but also organizes the segments with a line segment connectivity graph, which means the topological relationships (e.g., intersection, an isolated line segment) of the detected line segments are captured and stored; whereas other line detectors only retain a collection of loose line segments. Our empirical results show that the TGGLines detector visually and quantitatively outperforms state-of-the-art line segment detection methods. In addition, our TGGLines approach has the following two competitive advantages: (1) our method only requires one parameter and it is adaptive, whereas almost all other line segment detection methods require multiple (non-adaptive) parameters, and (2) the line segments detected by TGGLines are organized by a line segment connectivity graph.

1. Introduction

Line segment detection has been studied for decades in computer vision and plays a fundamental and important role for advanced vision problems, such as indoor mapping [3], vanishing point estimation [15, 23], and road lane line detection for autonomous driving [5]. Many line segment detection methods are developed for natural images [4, 16, 22, 2, 7]. However, those methods do not work well for technical diagram images, such as patent images – see Figure 1 for an example. Clearly the line segment detection challenge remains in technical images, especially in low quality images affected by the local zigzag “noise” introduced by the scanning process.

We present a robust topological graph guided approach for line segment detection in low quality binary (diagram) images. Specifically, our approach combines the power of topological graphs and skeletons to generate a skeleton graph representation to tackle the line segment detection challenges. A skeleton is a central line (1 pixel wide) representation of an object present in an image obtained via thinning [13, 25]. The skeleton emphasizes topological and geometrical properties of shapes [25]. In our approach, the skeleton serves as the essential bridge from pixel image representation to topological graph representation.

We compare our TGGLines approach with four mainstream and state-of-the-art line segment detection methods. The empirical results demonstrate that our TGGLines approach visually and quantitatively outperforms other methods. In addition, our method has two advantages. (1) While the parameters for most other methods are not adaptive, our method is robust as it only requires one parameter, and this parameter is adaptive. (2) As TGGLines detects line segments it orga-

Figure 1. An example of how challenging it is for machines to see line segments present in diagram images (e.g., patent images), especially those with low quality introduced by the scanning process. (a) Input image is a binary pixel-raster image. (b) Line segments (perceived) and annotated in green by humans. (c) Line segments machines see in the input image (using state-of-art line segment detection method linelet [7]).

nizes the segments into a topological graph. We call this graph the line segment connectivity graph (LSCG); which stores the topological relations (e.g., turning and junction) among detected line segments. The LSCG can be used by advanced computer vision tasks, such as shape analysis and line segment-based image retrieval.

Line detection performance is often simply evaluated by visualization, which is a non-quantitative evaluation. In this paper, we evaluate our results both qualitatively (Section 5.3) and quantitatively (Section 5.4). One of the most difficult problems for quantitative evaluation is annotating line segments in an image accurately because sometimes even humans will annotate lines differently due to the zigzag “noise” introduced by the scanning process (see the annotation examples provided in Figure 5). Typically for a scientific diagram image, there can be several hundreds of lines that must be annotated (see images #6 and #10 in Table 2 for examples). We provide a simple interface for line segment annotation as well as a quantifiable metric for line detection performance as measured with respect to inherently inexact annotations.

Here, we provide a road map to the rest of the paper. Section 2 covers related work, including existing line segment detection methods, and the topological-based image representations that our method is built on. The core of our paper is Section 3 focusing on our TGGLines approach and Section 4 focusing on algorithms. In Section 5, we present our experiments with qualitative and quantitative evaluations. The paper concludes in Section 6 with a mention of potential applications.

For readability, we provide a list of abbreviations in Appendix A.1. Background on graph theory and computational geometry, is provided in Appendix A.2. These appendices are provided in the supplementary materials.

2. Related work

Existing line segment detection methods are either edge-based or local gradient-based, whereas our TGGLines method does not rely on edge detection or local gradients. Edge-based line detectors include the Hough transform (HT) [4] which often uses the Canny edge detector [6] as pre-processing. The main drawback of HT is that it is computationally expensive and it only outputs the parameters of a line equation, not the endpoints of line segments. The progressive probabilistic Hough transform (PPHT) [16] is an optimization of the standard HT; it does not take all the points into consideration, instead taking only a random subset of points and that is sufficient for line detection. Also, PPHT is able to detect the two endpoints of each line, so it can be used to detect line segments present in images.

Local gradient-based line detectors are successful on natural images, but not diagrams (See Section 5.2). The line segment detector (LSD) [22] is a local gradient-based method that has been tested on a wide set of natural images. EDLines [2] speeds up LSD but according to our experiments and evaluation (Section 5.2), EDLines’ performance is much worse than LSD on the diagram image dataset. The linelet [7] method represents intrinsic properties of line segments in images using an undirected graph which performs well on an urban scene dataset, but does not work well for diagrams. A recent review about line segment detection methods can be found in [9, 19].

Our TGGLines method builds on topological-based

methods. TGGLines uses the skeleton graph image rep- resentation proposed in [24]. The topological graph is generated automatically from an image skeleton, which can capture the topological and geometrical properties of shapes of objects present in an image [12]. We use the well-known and robust Zhang-Suen thinning algorithm [27] to extract skeletons from images and simplify the geometries in the graph representation using the Douglas-Peucker algorithm [8] because of its simplicity and robustness. We have also tested the Visvalingam’s [21] on our dataset; however, it does not work as well as the Douglas-Peucker algorithm.

3. TGGLines approach

In this section, we elaborate the proposed TGGLines method, including image representation (Section 3.1), concepts (Section 3.2), and workflow illustrated using a simple example (Section 3.3).

3.1. TGGLines image representation

The image representation used in our TGGLines is the skeleton graph, which is a topological graph generated from a image skeleton [24]. We use Zhang-Suen thinning algorithm [27] for image skeleton extraction, as it is well-known and robust.

See Figure 2 below for an illustration of the image representation we use in our TGGLines approach (the handwritten digit image used here is taken from the MNIST dataset[14]). In the skeleton graph, each node represents a pixel in the image skeleton, and each edge indicates that the two pixels it connects are neighbors.

3.2. TGGLines concepts

Skeleton graph: A skeleton graph is an embedded graph generated from an image skeleton, where each node represents a pixel in the image skeleton, and an edge betwee two pixel nodes indicate the two pixels are neighbors.

Path: Each path is an embedded graph that is a subgraph of the skeleton graph, which consists of non-salient nodes segmented by salient nodes (e.g., junction nodes and end nodes, defined in Section 3.2.1) and edges connecting the nodes.

Line segment connectivity graph (LSCG): It is an embedded graph generated from the skeleton graph, where N is a set of nodes each representing a path, and E is a set of edge representing salient nodes (e.g., junction nodes or end nodes). each node will have an attribute that pointing to its

corresponding path it represents. LSCG is used to organized segmented paths.

Simplified LSCG: It is the LSCG that the paths it organize are simplified.

3.2.1 Node type in skeleton graph

There are three types of salient nodes in our TGGLines. We will use them to segment a skeleton graph to multiple paths for simplification.

• End node: A (pixel) node that has only 1 neighbor.

• Junction node: A (pixel) node that has n neighbors where n > 2

• Turning node: A (pixel) node that has two neighbors.

3.3. TGGLines workflow

The TGGLines workflow is presented in Figure 7 below visually illustrated by a simple and straightforward example.

4. TGGLines algorithms

In this section, we provide the algorithms we developed for the TGGLines introduced in Section 3 above.

The overview pseducode for the TGGLines algorithm is provided in Algorithm 1.

We first extract skeleton from an input diagram image. then a skeleton graph is generated from the skeleton, after that, salient points (defined in Section 3.2.1) are detected by counting the incident edges for each pixel node in the skeleton graph. Then the skeleton graph is segmented to multiple paths using the detected salient nodes; meanwhile a line segment connected graph is generated while segmenting skeleton graph to paths. Then we simplyfing paths oragzed in LSCG using the Douglas-Peucker algorithm [8], detailed in Algorithm 2.

The LSCG for the illustrative example shown in the workflow Figure 7 is given in Figure 4.

5. Experiments and evaluation

In this section, we provide details about dataset (Section 5.1), experiments and results (Section 5.2), qualitative (Section 5.3) and quantitative (Section 5.4) evaluation.

Figure 2. An example of skeleton graph image representation . Figure 2 (a) shows the input image. Figure 2 (b) shows the image skeleton extracted from the input image. Figure 2 (c) provides the skeleton graph corresponding to the skeleton present in (b).

5.1. Dataset

To evaluate TGGLines, and compare with state-of-the-art methods, we develop a simple interface to manually annotate 10 diagram images taken from the 2000 Binary Patent Image Database developed by Multimedia Knowledge and Social Media Analytics Laboratory (MKLab) [1]. The database contains images from patents maintained by the European Patent Office.

For readability, we take partial images from 10 se-

lected samples (cropped without changing the resolution of the original images), the file size ranges from 53 KB (212x171, #3 in Table 1) to 307 KB (524x566, #7 in Table 2). Our dataset is small yet representative. Samples are carefully selected from the 2000 patent images, by considering this selection criteria: (1) different complexity of line directions (e.g., vertical, horizontal, other arbitrary angles), (2) spacing between line segments (roughly equal spacing, sparse, dense), (3) topological relations (e.g., single line segment, intersection,

Figure 3. TGGLines workflow illustrated with a simple and straightforward example. Note that in (b) it is inverted for image skeleton extraction. In (c), the red node indicates junction nodes and blue nodes indicate turning node (definition of salient point types can be found in Section 3.2.1). in (g) LSCG represents line segment connectivity graph. Each node in LSCG represents a path, for example in (g) there are three different paths.

turning, circular). Figure 5 shows two annotated examples. The images in the experiment, image #01 to #10, are the partial images taken from the following images from the patent image dataset [1]: 01779, 01126, 00501, 00780, 00971, 00575, 00429, 01267, 00811, and 01140.

5.2. Experiments and results

We implement TGGLines in Python using OpenCV, Scikit-image [20], SymPy [17], and NetworkX [10]. We compare TGGLines with state-of-the-art methods: PPHT [16], LSD [22], EDLines [2] and Linelet [7]. Results can be found in the sets of figures organized in

Figure 4. Line segment connectivity graph (LSCG) for the example shown in the workflow Figure 7 above. each light blue node represents a path, if two paths are connected by a junction node or turning node, there is an edge between the nodes.

Figure 5. Annotation examples: Line segments of (a) varying angles and spacing intervals; and (b) even interval. Examples have different resolution.

Tables 1 and 2. Parameter settings (Table 3) for each method are provided in Appendix A.3, and the computing environment details can be found in Appendix A.4.

5.3. Qualitative comparison

We qualitatively compare the line segment detection methods as presented in the results Tables 1 and 2. Clearly, PPHT detects many tiny line segments and double lines for each true line segment, as it is an edge-based method. LSD and EDLines detect double lines for each line segment in the ground truth; this is not surprising, as they are local gradient-based methods.

Visually, EDLines performs the worst for our experiment dataset, many line segments cannot be detected by EDLines (especially for image #3 and #6 in Table 1 and 2, respectively). LSD detect many tiny lines, similar to PPHT, especially in image #7 and #8 in Table 2.

The performance of Linelet is not consistent. For some images, double lines are detected for each line in the ground truth (see image #7 and #9 in Table 2), whereas for other images, single lines are detected (e.g., image #6 and #8 in Table 2).

Among all methods, visually, our TGGLines performs the best. Another advantage of TGGLines is seen in Table 2 image #10 where multiple crossed lines form a circle intersection. As most methods detect lines based on edge maps, the circles from the original images leave an open circle shape in the results. While TGGLines avoids the open circle, and invariant to line width because we detect the lines based on image skeletons.

5.4. Quantitative comparison

We quantitatively compare the line segments from our TGGLines method and other four methods we benchmark against ground truth. Automatically quantifying the performance of detected line segment results is difficult, because the errors among the methods vary in nature. For example, some methods (like PPHT and LSD) detect most lines as double lines, but sometimes as single lines.

We use line detection accuracy as a simple evaluation metric, by manually counting the true positive line segments detected to calculate the accuracy. True positives are defined using the following criteria (examples of criteria are visually illustrated in Appendix A.5): (1) for double line cases (e.g., PPHT, LSD), we count a line segment as correct if (a) both line segments are correctly detected compared with the corresponding line segment in the ground truth, so we assign weight 0.5 for each line segment; or (b) if more than half is detected for one segment, we give it weight 0.5 * 0.5 = 0.25; (2) for single line cases (e.g., TGGLines), we count a line segment as correct if (a) it is fully detected correctly, we assign weight 1; or (b) if more than half is correctly detected, we give weight 0.5; (3) if many tiny line segments are detected for a line segment in the ground truth, we view it incorrect, and assign it weight 0.

The accuracy calculation is based on the manually annotated line segments in ground truth. Specifically, , where is the number of correctly detected line segments (true positive), and is the total number of line segments in ground truth. For methods that (e.g., LSD and EDLines) detect double lines for each line in the ground truth, we only count it detected correctly if it detect both line segments. The accuracy of the methods on line segment detection is provided in

Table 1. Line segment detection results on images #01 — #05 of the dataset.

Figure 6. We can see that the accuracy results match up with the visual comparison. EDLines performs the worst among the images in our dataset, and TGGLines performs the best, while Linelet is not consistent.

While manually counting line detection results, we note that TGGLines is the only method that avoids detecting double lines, and it detects lines with the least break points.

Figure 6. Line segment detection accuracy for different methods.

6. Conclusion

We have introduced a robust topological graph guided line segment detection method, TGGLines, for low quality binary diagram images, using a skeleton graph image representation. Compared with state-of-the-art line segment detection methods (specifically, PPHT, LSD, EDLines, and Linelet), on diagram images taken from a patent image database; empirical results show that our TGGLines approach outperforms other methods, both visually and quantitatively.

Beyond the accuracy of TGGLines compared with other methods, TGGLines has two competitive advantages: (1) it is robust, as TGGLines only requires one parameter and most importantly, this parameter is adaptive; and (2) the line segments detected using TGGLines are organized in a topological graph – we name it line segment connectivity graph (LSCG). Thus the topological relations among the line segments are captured and stored while detecting line segments which can be used for further topological-based image analysis.

The effectiveness and robustness of our method, especially the topological relations among detected line segments, provide an important foundation for many applications, such as text recognition in historical documents and OCR-based applications, converting rasterized line drawings to vectorized images, road lane line detection for autonomous driving, as in real world road scene images often contain similar zigzag “noise” as

Table 2. Line segment detection results on images #06 — #10 of the dataset.

06

07

08

09

10

scanned document images have, because of scenarios such as worn road markings, falling leaves and/or dirt over roads.

References

[1] Multimedia knowledge and social media analytics laboratory (MKLab). 2000 binary patent images database, 2010. Available online: https://mklab.iti.gr/results/patent-image-databases/ (accessed on October 10, 2019). 4, 5

[2] Cuneyt Akinlar and Cihan Topal. EDLines: A real-time line segment detector with a false detection control. Pattern Recognition Letters, 32(13):1633–1642, 2011. 1, 2, 5

[3] Su-Yong An, Jeong-Gwan Kang, Lae-Kyoung Lee, and Se-Young Oh. Line segment-based indoor mapping with salient line feature extraction. Advanced Robotics, 26(5-6):437–460, 2012. 1

[4] Dana H Ballard. Generalizing the hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111– 122, 1981. 1, 2

[5] Michael Beyeler, Florian Mirus, and Alexander Verl. Vision-based robust road lane detection in urban environments. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 4920–4925. IEEE, 2014. 1

[6] John Canny. A computational approach to edge detec- tion. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):679–698, 1986. 2

[7] Nam-Gyu Cho, Alan Yuille, and Seong-Whan Lee. A novel linelet-based representation for line segment detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(5):1195–1208, 2017. 1, 2, 5, A - 2

[8] David H Douglas and Thomas K Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: the International Journal for Geographic Information and Geovisualization, 10(2):112–122, 1973. 3

[9] Rui FC Guerreiro and Pedro MQ Aguiar. Connectivity- enforcing hough transform for the robust extraction of line segments. IEEE Transactions on Image Processing, 21(12):4819–4829, 2012. 2

[10] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. In In Proceedings of the 7th Python in Science Conference (SciPy), pages 11–15, 2008. 5

[11] John Michael Harris, Jeffry L Hirst, and Michael J Moss- inghoff. Combinatorics and graph theory, volume 2. Springer, 2008. A - 1

[12] J Komala Lakshmi and M Punithavalli. A survey on skeletons in digital image processing. In International Conference on Digital Image Processing, pages 260– 269. IEEE, 2009. 3

[13] Louisa Lam, Seong-Whan Lee, and Ching Y Suen. Thinning methodologies-a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(9):869–885, 1992. 1

[14] Yann LeCun, L´eon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. 3

[15] Jos´e Lezama, Rafael Grompone von Gioi, Gregory Ran- dall, and Jean-Michel Morel. Finding vanishing points via point alignments in image primal and dual domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 509–515, 2014. 1

[16] Jiri Matas, Charles Galambos, and Josef Kittler. Robust detection of lines using the progressive probabilistic hough transform. Computer Vision and Image Understanding, 78(1):119–137, 2000. 1, 2, 5

[17] Aaron Meurer, Christopher P. Smith, Mateusz Paprocki, Ondˇrej ˇCert´ık, Sergey B Kirpichev, Matthew Rocklin, AMiT Kumar, Sergiu Ivanov, Jason K Moore, Sartaj Singh, et al. SymPy: symbolic computing in Python. PeerJ Computer Science, 3:e103, 2017. 5

[18] Mark Needham and Amy E Hodler. Graph Algorithms: Practical Examples in Apache Spark and Neo4j. O’Reilly Media, 2019. A - 1

[19] Payam S Rahmdel, Richard Comley, Daming Shi, and Siobhan McElduff. A review of hough transform and line segment detection approaches. In VISAPP (1), pages 411–418, 2015. 2

[20] Stefan Van der Walt, Johannes L Sch¨onberger, Juan Nunez-Iglesias, Franc¸ois Boulogne, Joshua D Warner, Neil Yager, Emmanuelle Gouillart, and Tony Yu. Scikitimage: image processing in python. PeerJ, 2:e453, 2014. 5

[21] Maheswari Visvalingam and James D Whyatt. Line gen- eralisation by repeated elimination of points. The Cartographic Journal, 30(1):46–51, 1993. 3

[22] Rafael Grompone Von Gioi, Jeremie Jakubowicz, Jean- Michel Morel, and Gregory Randall. LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):722–732, 2010. 1, 2, 5

[23] Yiliang Xu, Sangmin Oh, and Anthony Hoogs. A mini- mum error vanishing point detection approach for uncalibrated monocular images of man-made environments. In

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1376–1383, 2013. 1

[24] Liping Yang, Diane Oyen, and Brendt Wohlberg. Im- age classification using topological features automatically extracted from graph representation of images. In Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG), 2019. 3, 5, A - 1

[25] Liping Yang, Diane Oyen, and Brendt Wohlberg. A novel algorithm for skeleton extraction from images using topological graph analysis. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019. 1, A - 1

[26] Liping Yang and Michael Worboys. Generation of navi- gation graphs for indoor space. International Journal of Geographical Information Science, 29(10):1737–1756, 2015. A - 1

[27] TY Zhang and Ching Y Suen. A fast parallel algorithm for thinning digital patterns. Communications of the ACM, 27(3):236–239, 1984. 3

A. Appendix

A.1. Abbreviations

In this appendix, we provide the abbreviations (ordered alphabetically) of terms we used in the paper. EDLines Edge drawing lines HT Hough transform LSCG Line segment connectivity graph LSD Line segment detector OCR Optical character recognition PPHT Progressive probabilistic hough transform TGGLines Topological graph guided lines

A.2. Definition of terms used

In this appendix, we provide the definitions to some concepts (ordered alphabetically; referenced [25, 26, 24, 11, 18]) in graph theory and computational geometry we used in our TGGLines.

Convex hull

The convex hull of a finite set of points S is the intersection of all half-spaces that contain S . A half space in two dimensions is the set of points on or to one side of a line. Computing the convex hull of S is one of the fundamental problems of computational geometry.

Embedded graph

An embedded graph is a graph that each node has (planar) coordinates so the graph can be drawn on a plane uniquely without any edge intersection or crossing.

Graph

A graph consists of a collection of nodes (also called vertices or points) and a collection of edges that connect the nodes.

Skeleton

The skeleton of a binary image is a central line extraction (1 pixel wide) of objects present in the image through thinning.

Skeleton graph

The skeleton graph of a binary image B is an embedded graph generated from the skeleton S of B, where each node in represents a white (i.e., non-zero) pixel in S and each edge e in indicates the two pixels represented by the end nodes of e are neighbors.

Path

nodes connected by edges. The length of a path is the number of edges traversed. node v is reachable from node u if there is a path from u to v. A graph is connected, if there is a path between any two nodes.

A.3. Parameter settings and computational time

This appendix provides the parameter settings for each method in Table 1 and 2.

Table 3 provides the parameters used for each method we compare. Computing environment for the experiments is provided in Appendix A.4 below.

A.4. Computing environment

In this appendix, we provide the computing environment that we ran our experiments. We ran all the experiments on a Windows desktop machine (Windows 10) with Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores and 12 logical processors; 32.0 GB RAM).

A.5. Visually illustrated criteria for quantitative evaluation

In this appendix, we provide the criteria for judging the correctness of line segments (Table 4) along with a visual example showing the criteria applied to detected line segments for each method that we compare during our quantitative evaluation (Section 5.4).

Using the criteria provided in Table 4, we judge and calculate how many lines are correctly detected against the ground truth. Then the accuracy is calculated. See Figure 7 for an example of the criteria used to judge the lines detected for each method and how the accuracy is calculated for image #01 from Table 1.

Table 3. Parameter settings for different line segment detection methods in Table 1 and Table 2. For other parameters not listed specifically here, default parameters provided in the tools are used.

Table 4. Judging criteria for evaluation. GT refers to ground truth. The line segments in the example in Figure 7 are color-coded by weights as shown here. For double-line cases (see Figure 7 (c) for an example), we only count the segment correct if both lines are correctly detected against the GT. Thus, our scoring process is as follows: each one of the double lines, if they are correctly detected, the weight for each line is assigned 0.5 and thus the score for the score is 0.5 * 0.5 = 0.25.

(c) LSD: # of lines detected correctly = 27 * 0.5 + 23 * 0.25 = 19.25. Accuracy = 19.25 / 25 * 100% = 77%

(d) EDLines: # of lines detected correctly = 1 * 0.5 + 16 * 0.25 = 4.5. Accuracy = 4.5 / 25 * 100% = 18%

(e) Linelet: # of lines detected correctly = 5 * 0.5 + 25 * 0.25 = 8.75. Accuracy = 8.75 / 25 * 100% = 35%

(f) TGGLines: # of lines detected correctly = 17 * 1 + 8 * 0.5 = 21. Accuracy = 21 / 25 * 100% = 84%

Figure 7. Judging criteria illustrated visually for the evaluation of different line segment detection methods for image #01 in Table 1.