Fast Projective Image Rectification for Planar Objects with Manhattan Structure

This paper presents a method for metric rectification of planar objects that preserves angles and length ratios. An inner structure of an object is assumed to follow the laws of Manhattan World i.e. the majority of line segments are aligned with two orthogonal directions of the object. For that purpose we introduce the method that estimates the position of two vanishing points corresponding to the main object directions. It is based on an original optimization function of segments that estimates a vanishing point position. For calculation of the rectification homography with two vanishing points we propose a new method based on estimation of the camera rotation so that the camera axis is perpendicular to the object plane.

The proposed method can be applied for rectification of various objects such as documents or building facades. Also since the camera rotation is estimated the method can be employed for estimation of object orientation (for example, during a surgery with radiograph of osteosynthesis implants). The method was evaluated on the MIDV-500 dataset containing projectively distorted images of documents with complex background. According to the experimental results an accuracy of the proposed method is better or equal to the-state-of-the-art if the background occupies no more than half of the image. Runtime of the method is around 3ms on core i7 3610qm CPU.

Keywords: Vanishing point, metric rectification, planar target, Manhattan world, line segments.

Increase of computational power allows solving many difficult computer vision problems on mobile devices. Amongst others there is a task of projectively distorted image analysis. Most commonly this task includes object localization and rectification step. It is important for rectification to be metric so the angles and length ratios of the original object are preserved. Otherwise a refined object image would be corrupted with some nonprojective distortions that may result errors after further analysis, for example text recognition.

If an object template is known and contains many static elements the object can be located and rectified simultaneously with local features [1, 2]. In case of a rectangular object with a poor or unknown template we can detect its borders [3, 4, 5] and then calculate homogrpahy from the found quadrangle to a rectangular with known aspect ratio. However, this approach tends to misclassify inner segments as borders so rectified image may be stretched.

The localization problem in undistorted images is well-known [6, 7] so we can isolate the problem of rectification. We assume that object’s inner structure belongs to the Manhattan World i.e. the majority of line segments has two dominant orthogonal directions. In this case common approach contains two steps: estimation of two vanishing points corresponding to the object directions and calculation of a rectification homography. A vanishing point might be found as an intersection of lines in an image. But due to data noise we have to construct a function of lines which has extrema in a vanishing point. Papers use many different approaches to select lines and to construct the function.

For lines to be detected Hough Transform [8] is widely used [9, 10, 11, 12]. In [9] vanishing points are estimated robustly as sine waves in Hough space. Paper [10] presents a method which chooses the point with the highest number of peaks of a skew projective histogram. Authors of [11, 12] present a neural network architecture based on a new FHT layer. It should be noted that in many cases when HT lines are employed the optimization function is calculated not on the image plane. But the image plane is usually preferred because the noise originates in the image [13].

Papers [14, 15] consider documents rectification based on text lines and vertical strokes of characters. They are extended to be lines and an optimization function is defined as a sum of squared distances from the lines to the sought vanishing point. Here one calculates consistency of a line and a vanishing point on the image plane. However, this approach ignores the fact that the farther the lines intersection is from the image center the higher noise it may have due to line noise. So the consistency value depends on that distance.

This problem can be avoided if one uses a model of data noise while constructing the optimization function. As there is no convenient model of line noise we can employ segments with Gaussian noise of their endpoints. This idea is used in [16, 17] where the consistency of a segment and a vanishing point is calculated as a squared distance from its endpoint to a line through the vanishing point and the segment middle point. But there is no proof of optimality of such an optimization function.

There are several different methods for rectification homography calculation with two vanishing points. In papers [9, 12] authors find four intersection points of lines through each vanishing point and 2 opposite image corners and transform them to the image corners. In [10, 15] separate homographies are constructed for each vanishing point to be transformed to infinity. Both methods don’t preserve length ratios. In papers [18, 14] the camera model is used to rectify an image. In [14] the method for metric rectification is presented but a rectified image may be flipped. Authors of [18] estimate rectification based on the camera rotation for object to be perpendicular to the camera axis. However, they assume that the camera focal length equals 1 so the method also does not preserve length ratios.

In this work we propose an algorithm for vanishing point detection based on an optimization function proved to be optimal with the segment noise model described above. For image metric rectification we introduce an original algorithm that uses estimation of camera rotation and avoids the refined image flip. Also we discuss how to perform it with the unknown camera focal length.

The proposed method can be used for rectification of document or facade images without any information about objects locations or their templates. Also it can be applied while increase the spatial resolution in computer tomography. For that purpose in [19] authors place asymmetrically reflecting crystals behind the object. But the presence of these optical elements in the optical path leads to projective distortion of the recorded projections, which must be adjusted before sending the measured projections for reconstruction. If the object under investigation has Manhattan structure the proposed method is an adequate tool for the necessary correction.

Moreover, while we estimate camera rotation it can be interpreted as the object orientation. It can be useful in many computer vision tasks of object analysis, for example camera calibration or estimation of an osteosynthesis implant pose with a radiograph during a surgery.

The projective distortions in images can be described using the pinhole camera model (Fig. 1). Let C be the camera center and CXYZ be the camera coordinate system, where the axis CZ is the camera optical axis. Then the image plane can be expressed as Z f , and f is the focal length. On the image plane we define a homogeneous coordinate system Oxy so that the axes Ox and Oy are codirectional to CX and CY respectively. We assume that the principal point p in which the optical axis and the image plane intersect is known and pixels are square. Then we can define the camera calibration matrix K [13] that transforms points in the camera system to points on the image plane Oxy :


Figure 1. The camera and image coordinate systems, parallel line in 3D world, their projections and the corresponding vanishing point


Let us consider n parallel lines in the camera system. Their projections on the image plane will have a common point v called a vanishing point. It can be infinite only if the lines are parallel to the image plane. A Manhattan object contains segments of two main directions hence two vanishing points should exist on its image.

To find the vanishing point in the image we may detect the corresponding line segments. But all detectors can provide only noisy data. Thus we cannot simply intersect the lines passing through the segments. Then the sought point should be obtained by optimizing a function of the noisy segments. In this section we derive this function with Maximum likelihood estimation method.

3.1 Probabilistic model of a noisy segments set converging to a vanishing point

For the optimization function to be constructed we need to define the model of a noisy segments set on the image

plane. Let us denote the sought vanishing point by

endpoints referred to as


and the ideal segments are unknown the distribution should depend on them. We denote all the parameters as


3.2 Optimization problem for a vanishing point estimation

Given a realization


The optimization problem is ˆ = arg min ( )l


parameters and substituting it in the function. Similar procedure is performed for the parameters { }jitand the function



through the v point. To estimate the directional vectors ie of the optimal lines il we can use the Principal component

analysis. Then the resulting optimization problem for a vanishing point estiamation can be expressed as:


The problem cannot be solved explicitly so in the next Section we propose a numerical method for vanishing point estimation based on the derived optimization function.

3.3 Vanishing points detection on the image plane

We estimate the position of a vanishing point for the noisy set of line segment numerically. Thus the consistency

measure ( , )D v s


set does not imply the presence of outliers. Then let us say that if the consistency measure exceeds a predefined threshold

DT then the segment is assumed to be an outlier for the point v


The resulting output of the algorithm should be a pair of orthogonal vanishing point. Thus in this Section we construct the set of probable candidate points from which a pair could be chosen. First of all we obtain the set E of rough candidates. It contains intersections of the segments which are long enough. The length threshold LT depends on the mean length of the segments. For each rough candidate c we obtain the inliers set. Then the candidate set is inspected for repeated points. If two points have similar inliers the point with fewer of them is discarded from E .

Each rough candidate e Eis used as an initial point to evaluate numerically an optimal solution

The optimization is conducted using the conjugate gradient method for the inlier segments of e . For each accurate


points which are located too close to the principal point p (in which the optical camera axis and the image plane intersect). It allows eliminating wrong candidates corresponding to too strong projectivity. After this discarding

procedure we obtain the resulting set

3.4 Selection of two orthogonal vanishing points

First of all we should derive the requirements for two candidate vanishing points to be orthogonal. If the positions of two vanishing points 1v  and 2v are given absolutely accurate we can check their orthogonality explicitly. In case of the

known camera focal length f the calibration matrix K defined in Section 2 is also determined. Let v be a vanishing

point on the image plane defined in the homogeneous coordinate system Oxy . A vector

system is a directional vector of the parallel lines corresponding to the vanishing point. Thus for vanishing points 1v and


But often the focal length f is undefined. If the vanishing points are orthogonal we can calculate f from the above:


However, the vanishing points are estimated approximately. Due to inaccuracy the angle α on nondistorted images may be slightly less than the right angle. So the resulting constrain may be formulated as 190º  T   .

Wider angles α between vanishing points lead to higher level of projective distortion. Images with very strong projectivity are hardly readable and thus not used for object analysis so we can constrain the vanishing point pairs to

have angles less than a threshold

Having the restrictions on vanishing point pairs we should choose one pair from the set of vanishing point candidates *E. For that purpose we construct all possible point pairs * * *{{ , , 1..| |, }}|i jW e e i j E i j  . Pairs which don’t

satisfy the angle constrains are discarded. Then the pair with maximum sum of inliers lengths is chosen as the resulting pair of vanishing points he  and ve .

3.5 Algorithm for detection of a vanishing point pair

The Algorithm 1 for vanishing points detection can be summarized as follows:


1. Construct the set of rough candidates ( ) , ( ),{ }| ,ij i ji j i j L Llength T length TE p S s ss s s s    

2. For each e Eevaluate an inliers set ( , ){ | }i ie DD TeInl s S s  

3. Sort E by a number of inliers 4. Discard repeated candidates ie E if, 0 : , ( )\


5. Construct the set of accurate candidates


6. For each


7. Discard repeated candidates from

8. Discard candidates


10. Discard pairs


In this paper we detect only one pair of vanishing points. However, the algorithm can be easily extended to obtain several alternative pairs. It may be useful if we seek for the rectifications for several objects on a single image (two book pages) or if an image background also contains vanishing points with many segments.

Since we have the pair of the orthogonal vanishing points

rectifies the source image. For that purpose we construct a new rectifying camera coordinate system ' ' 'CX Y Z . Let us denote representations of vanishing points in the homogeneous coordinate system Oxy  by ,h vv v. As shown in Section

3.4 the corresponding vectors ,h vV V are the 3D directional vectors of source horizontal and vertical lines on the object.

Then we rotate the camera such as these vectors become collinear to the camera axes. If we determine the new camera xand y-axes as codirectional to the vectors ,h vV V the corresponding optical axis may have a reversed direction. It leads to rectified but flipped image. In this case we just invert the axes 'CY and 'CZ .

To calculate the vectors ,h vV V we should know the camera focal length. If it is not predefined we can calculate it as shown in Section 3.4. However, this problem is ill-conditioned, which results in high inaccuracy of the rectification process. Then we can use an empirical approximation of the focal length: diagonal length of the source image. In this case the vectors ,h vV V may be not orthogonal. So we calculate a new directional vector of y-axis orthogonal to the obtained x- and z-axes. Then the new camera system is Cartesian but leads to an affine skew with known angle. We can rectify it with an affine transform A . Thus the sought metric rectification homography H that transforms the image

plane Oxy to a new image plane of the rotated camera can be calculated as

matrix from old camera system to the new one. The algorithm for estimation of the rectification homography can be summarized as follow.

Algorithm 2




5. Return

Given the set of segments in the image we can obtain the metric rectification homography following the Algorithm 1 for vanishing point detection and Algorithm 2 for homography estimation. It should be noted that the Algorithm 1 does not determine which of two vanishing points is horizontal or vertical. So rectification is estimated up to rotation of a refined image on 90º . In the next section we evaluate the speed and accuracy of the proposed method on real images.

While there are many different papers proposing algorithms for rectification images of Manhattan structured objects only a few of them use open datasets to evaluate their methods. For comparison we chose the recent paper [12] that uses neural network (NN) approach and shows the high accuracy of rectification on the open MIDV-500 dataset [20].

To obtain the initial set of segments in the image any method can be used, for example LSD [21]. However, in the experiments we applied the method [22] where 3 types of segments are detected. There were edges which could describe the document borders, ridges which correspond to fill lines, table borders, etc, and text baselines.

5.1 Dataset

The MIDV-500 dataset contains 15000 projectively distorted images of planar documents (ID cards) captured using a mobile camera. Many document templates have no inner vertical segments while our algorithm implies the presence of many vertical segments on the object for vertical vanishing point to be found. So we used only following document types: 2, 3, 12 - 17, 19, 23, 26, 30, 31, 36, 38, 39, 49. Also we removed images in which at least one document corners was out of the image borders. It was done to ensure that vertical sides are visible.

The dataset images typically have complex backgrounds such as keyboards, another document sheets etc that may contain many converging segments sets. In our experiments we wanted to evaluate a maximum relative background area (RBA) with which our algorithm still works similar to the NN method. RBA of the dataset images varies from 70% to 90% so we augmented the data to achieve lesser values: 30-60%. For that purpose we cropped images corresponding to the given value γof RBA. We calculated a cropping rectangle as follows. We obtained a rectangle R with sides parallel to the image borders circumscribed around the document quadrangle. This rectangle was being extended equally to the right and left until the value γis reached. If one side of R ran into the image border only the other side proceeded to be extended. If the width of R became equal to the image width but a current value of RBA was still less than  γ, wesimilarly extended the rectangle to the up and down.


Figure 2. Angles and side lengths explanation for inaccuracy measures rectd , rotd and ard

5.2 Measure of rectification inaccuracy

Given a document quadrangle Q in a source image, an aspect ratio t of its template and an estimated homography H we calculate three distances for a rectified quadrangle 'Q HQ :


from left top corner we know which sides are horizontal even if image is rotated on 90º . This value can show how much the 'Q width differs from the template one if their heights are equal.

5.3 Experimental results

With the experiments we wanted to estimate the maximum RBA that allowed our algorithm to reach good rectification results. For that purpose the mean distances were calculated separately for images with RBA from 30% to 60%. For the NN method only angle distances were measured. The authors proposed the algorithm for vanishing points detection and used the simplest method of rectification that did not preserve the aspect ratios. Thus the values of the arddistance could be random for such rectified images.

Table 1 presents the obtained results of inaccuracy measurements. It can be seen that for the NN method the higher RBA the lower the angle distances values. Probably the reason is that the train dataset contained only real images from MIDV-500 dataset without any augmentation for other RBA values. Because of that we also should take into account the distances values presented in the paper. For the test part of the dataset 1.71rectd  ,1.13rotd  .


the RBA=50% the results could be seen as roughly the same while higher RBA leads our method to be more inaccurate. Thus we assume that 50% is the maximum relative area of complex background that allowed our method to perform well.

Table1. Measurements of rectification inaccuracy



Figure 3. Left: original document image (RBA = 50%), right: the result of document image rectification

From Table 1 it can also be seen that our method stretched rectified document quadrangles by only 4-5%. Such distortion should not result in text rectification errors.

For images with RBA < 50% the rectification inaccuracy of our method is lower than one of the NN method. With the RBA=50% the results could be seen as roughly the same while higher RBA leads our method to be more inaccurate. Thus we assume that 50% is the maximum relative area of complex background that allowed our method to perform well. From Table 1 it can also be seen that our method stretched rectified document quadrangles by only 4-5%. Such distortion should not result in text rectification errors.

Also we measured the mean runtime for both algorithms on core i7 3610qm CPU. For that purpose the proposed method was implemented in C++. It should be noted that the NN method input is an image itself whereas our method employs segments. Therefore we also calculated the mean runtime of the applied segment detection algorithm to compare the time of the whole rectification processes. The runtime of the NN method equals 1.48s, while the time of our method is only 2.23 ms and the time of segment detection is 20.5ms. So the overall process of image rectification is 65 times faster. The example of document rectification with our method is shown in Fig 3.

In this paper we introduced the new method for rectification of projectively distorted Manhattan objects images with complex background which occupies no more than half of the image area. For that purpose we derived the new optimization function of segments for vanishing point position to be estimated. Based on the obtained function the robust method that allows detecting two orthogonal vanishing points was presented. Also we proposed the original method that estimates camera rotation and corresponding rectification homography with vanishing points.

According to the experiments the proposed method shows rectification quality as good as the state-of-the-art method on the distorted document images dataset MIDV-500. The inaccuracy in length ratio preservation is around 5%. Experimental results also show that the method runtime is near 3ms on core i7 3610qm.

This work is partially supported by Russian Foundation for Basic Research (projects 18-29-26035 and 17-29-03236).

[1] N.S. Skoryukina, D.P. Nikolaev, V.V. Arlazarov, "2D Art recognition in uncontrolled conditions using one-shot learning,"Proc. SPIE 11041, Eleventh International Conference on Machine Vision (ICMV 2018), 110412E, (2019). DOI: 10.1117/12.2523017.

[2] N. Skoryukina, J. Shemyakina, V. L. Arlazarov and I. Faradzhev, “Document localization algorithms based on feature points and straight lines,” Proc. SPIE 10696, Tenth International Conference on Machine Vision (ICMV 2017), 106961H, 1-8, (2018). DOI: 10.1117/12.2311478.

[3] A. Zhukovsky et al., “Segments Graph-Based Approach for Document Capture in a Smartphone Video Stream,” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 337-342, (2018). DOI: 10.1109/ICDAR.2017.63.

[4] N. Skoryukina, D.P. Nikolaev, A. Sheshkus, D. Polevoy, "Real Time Rectangular Document Detection on Mobile Devices," Proc. SPIE 9445, Seventh International Conference on Machine Vision (ICMV 2014), 94452A, (2015). DOI: 10.1117/12.2181377.

[5] Z. Zhang, L.-W. He, “Whiteboard scanning and image enhancement,” Digit. Signal Process., Vol. 17(2), 414–432, (2007). DOI: 10.1016/j.dsp.2006.05.006.

[6] D. Matalov, S. Usilin, V.V. Arlazarov, "Modification of the Viola-Jones approach for the detection of the government seal stamp of the Russian Federation," Proc. SPIE 11041, Eleventh International Conference on Machine Vision (ICMV 2018), 110411Y (2019). DOI: 10.1117/12.2522793.

[7] S. Gladilin, A. Kotov, D.P. Nikolaev, S. Usilin, "Postroenie ustoychivyh priznakov detektsii i klassifikatsii ob'ektov, ne obladayushchih kharakternymi yarkostnymi kontrastami," ["Construction of robust features for detection and classification of objects without characteristic brightness contrasts"], Informatsionnyye tekhnologii i vychislitel'nyye sistemy [Information Technologies and computing systems], Vol. 1, 53-60, (2014).

[8] R.O. Duda, P.E. Hart, “Use of the hough transformation to detect lines and curves in pictures,” Commun. ACM 15(1), 11–15 (1972).

[9] Y. Takezawa, M. Hasegawa, S. Tabbone, "Robust perspective rectification of camera-captured document images," 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 6, 27–32, (2017). DOI: 10.1109/ICDAR.2017.345.

[10] C. Simon, Williem, I.K, Park, "Correcting geometric and photometric distortion of document images on a smartphone," Journal of Electronic Imaging, Vol. 24(1), 013038, (2015). DOI: 10.1117/1.JEI.24.1.013038.

[11] A. Sheshkus, A. Ingacheva, and D. Nikolaev, “Vanishing points detection using combination of fast hough transform and deep learning,” Proc. SPIE 10696, Tenth International Conference on Machine Vision (ICMV 2017), 106960H, (2018). DOI: 10.1117/12.2310170.

[12] A. Sheshkus, A. Ingacheva, V. Arlazarov, D. Nikolaev, "HoughNet: neural network architecture for vanishing points detection," arXiv preprint arXiv:1909.03812, (2019).

[13] R. Hartley, A. Zisserman. Multiple View Geometry in Computer Vision (2nd Edition), Cambridge University Press, New York (2004).

[14] J. Liang, D. DeMenthon, D. Doermann, "Geometric Rectification of Camera-Captured Document Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30(4), 591-605, (2008). DOI: 10.1109/TPAMI.2007.70724.

[15] M. Pilu, "Extraction of illusory linear clues in perspectively skewed documents," Proc. of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), (2003). DOI: 10.1109/CVPR.2001.990498.

[16] J.-P. Tardif, "Non-Iterative Approach for Fast and Accurate Vanishing Point Detection," 2009 IEEE 12th International Conference on Computer Vision, (2010). DOI: 10.1109/ICCV.2009.5459328.

[17] H. Wildenauer, M. Vincze, "Vanishing Point Detection in Complex Man-made Worlds," 14th International Conference on Image Analysis and Processing (ICIAP 2007). DOI: 10.1109/ICIAP.2007.4362845.

[18] D. Santana-Cedrés et al., "Automatic correction of perspective and optical distortions," Computer Vision and Image Understanding, Vol. 161, 1-10, (2017). DOI: 10.1016/j.cviu.2017.05.016.

[19] A.V. Andreev et al., "Two-dimensional image magnification in an asymmetric-reflection X-ray microscope," Journal of Experimental and Theoretical Physics Letters, Vol. 85(1), 98–100, (2007). DOI: 10.1134/S0021364007010213.

[20] V.V. Arlazarov, K. Bulatov, T. Chernov, and V.L. Arlazarov, “Midv-500: A dataset for identity documents analysis and recognition on mobile devices in video stream,” arXiv preprint arXiv:1807.05786, (2018).

[21] R.G. von Gioi, J. Jakubowicz, J.-M. Morel, G. Randall, "LSD: a Line Segment Detector," Image Processing On Line, Vol. 2, 35–55, (2012). DOI: 10.5201/ipol.2012.gjmr-lsd.

[22] D. Tropin, J. Shemiakina, I. Konovalenko, I. Faradjev, "O lokalizacii ploskih ob"ektov na izobrazheniyah so slozhnoi strukturoi proektivnyh iskazhenii," ["Localization of planar objects on the images with complex structure of projective distortion"], Informacionnie processi [Information Processes], Vol. 19(2), 208-229, (2019).