The amount of publicly available mapping information in web services such as Google Maps and OpenStreetMap (OSM) is large, covering a large part of the existing human settlements in the world. Although mapping information on buildings and several other man-made structures is widely available for urban areas, a significant number of rural buildings are not mapped in any of the aforementioned systems. Rural building mapping information is important to assist demographic studies and to help Non-Governmental Organizations plan actions in response to crises. There is therefore a need for creating (or at least updating) urban footprint vector databases in rural areas.
Several works in the literature have approached this problem as one of detecting buildings in remote sensing images using knowledge-based features of shape, color, edge, and texture [1, 2]. More recently, Convolutional Neural Networks (CNNs, for a review in remote sensing see [3]) in combination with other image processing methods have been used to detect and delineate buildings in urban areas with successful results [4, 5, 6]. Most commonly, the pixel (or region) level detections are merged into vectorial shapes in a post-processing step. In [7], a CNN model was proposed to avoid this post-processing step: vector footprints of buildings are learned directly, by formulating the building outline as an active contour model whose parameters are learned with a CNN. The investigation of building detection using deep learning is a field of growing interest, also supported by recent data processing competitions in this direction, e.g. DeepGlobe [8].
Irrespective of the strategy chosen, the main drawback of using CNN methods in remote sensing is the need for a large amount of labeled data for training. In recent research, OSM annotations have been used as repositories of large labeled data collections. In GIScience, this source of data has proven to be very powerful, and several works have proposed methods to automatically predict attributes of OSM objects. For example, in [9] the authors proposed a methodology for automatic prediction of street labels (e.g., motorway and residential). In [10], the authors proposed a method using geometrical properties of the OSM annotation polygons to predict the types of buildings (e.g., residential, industrial and commercial). In [11], OSM data was used to improve robot navigation for autonomous driving, and in [12] OSM data was used for 3D building modeling, allowing visualization of indoor and outdoor environments in 3D maps. The authors in [13] use Google Street View pictures to predict the land use of the footprints, with OSM annotations as labels to train a deep learning model. Within the remote sensing building segmentation field, OSM annotations of urban areas have recently been used in [14] and [15] as label information to perform semantic segmentation of buildings and roads. The INRIA building detection challenge uses corrected OSM footprints as labels [16].
Despite the appeal of using OSM data for training deep learning models, the quality of these data is uneven. CNNs trained with this type of reference data can usually learn to predict the location of an object but not its exact extent [4]. Several works have proposed methods that can be useful to improve the quality of the OSM data, both for attribute classification and for positional inaccuracies. The authors in [17] detect errors in OSM annotations of roads using patterns extracted from GPS tracking data. For instance, indoor corridors wrongly classified as tunnels can be detected using tracked trajectories of cars and pedestrians. In [18], distance, directional, and topological relationships between OSM objects are used to detect inconsistencies.
OSM has gathered and made publicly available large amounts of building annotation data. However, while the quality of OSM data has been judged sufficient for urban areas [19], the same does not hold in rural areas, especially because of the lower update rate and the drop in the number of volunteers outside cities. By analyzing available OSM data in rural areas, we observed that the annotations performed by the volunteers suffer from three main issues, mostly due to infrequent imagery updates and incomplete/inaccurate volunteer annotations [17, 20]:
- misaligned annotations, whose location does not match the buildings visible in the imagery;
- annotations of buildings that no longer exist in the updated imagery;
- missing annotations of buildings that appear for the first time in the updated imagery.
In order to deal with inaccurate reference building data, the authors in [4] propose a loss function to reduce the effect of this problem, while the authors in [22] use a Recurrent Neural Network to improve the classification maps with a small set of perfectly and manually annotated data. However, as mentioned above, for rural buildings the problem of inaccurate annotations is more severe, since buildings are smaller and scarcer than urban buildings in OSM [23]. As one can see in Figure 1, there is considerable overlap between urban buildings and their misaligned OSM annotations, while some rural buildings in the image do not overlap with their OSM annotations at all.
In this work, we propose a methodology to correct OSM rural building annotations. We tackle the three problems above simultaneously, with a three-stage strategy based on the predictions of a fully convolutional deep learning model that estimates the likelihood of presence of buildings.
Figure 1: Misaligned OSM building annotations (in orange) superimposed on the imagery obtained from Bing maps: a) For urban building misaligned annotations, there is a considerable overlap with the object in the imagery; b) For the case of rural building misaligned annotations, some buildings in the imagery and their corresponding annotations do not overlap.
In Section 2 we present the proposed methodology to correct OSM rural building annotations. Section 3 shows the dataset and the setup of our experiments and Section 4 compares the results of our proposed method with other baseline methods. Section 5 concludes the paper.
Figure 2: Proposed methodology to correct OSM rural building annotations: a) predict a building probability map from an aerial image using a CNN trained for per-pixel classification; b) correct alignment errors in the OSM annotations using an MRF-based method and a building probability map; c) remove OSM annotations based on the aligned annotations, a building probability map, and a thresholding method; d) add new annotations selected from a set of candidates obtained by a CNN that predicts rural buildings with predefined shapes.
Our methodology to correct OSM annotations of rural buildings requires a fully convolutional neural network (CNN) model trained to generate a building probability map for the overhead image (Figure 2a); this model is detailed in Section 2.1. Once this classifier is trained, the building correction module consists of three main tasks, as described in Section 1. Figures 2b-d illustrate them, from top to bottom. In Sections 2.2 to 2.4 we detail these methods.
Figure 3: Neighboring system of the proposed MRF method. Groups of rural buildings are used as nodes of the MRF graph.
2.1. Computing building probability maps
In order to correct OSM rural building annotations, we use a building probability map obtained by a CNN model that performs pixel classification. In this work we use a CNN model based on [29] that is trained on a small set of manually verified/corrected rural building OSM annotations. The CNN model consists of four convolutional blocks (convolution followed by spatial pooling, nonlinear activation and batch normalization operations) but, unlike [29], which uses deconvolutions to upsample the feature map, we apply the concept of hypercolumns [30] to perform pixel classification. We modified the original hypercolumn model in the same way as for the baselines of [31]: the hypercolumns are obtained by upsampling the outputs of previous convolutions to the size of the input image using bilinear interpolation. This makes training the CNN more efficient while retaining similar performance. These activations are then stacked into a single tensor, which is used to train a Multi-layer Perceptron classifier to perform pixel classification. The architecture of the described CNN is presented in Figure 2a, while the details of the specific architecture are presented in Section 3.
2.2. Aligning OSM rural building annotations
The building registration problem is considered as the problem of aligning the vector shapes from OSM to the predictions of the CNN (Figure 2b). Such alignment is performed by estimating alignment vectors, basically shifting every OSM polygon to an area of high building probability in the CNN map.
In order to compute these alignment vectors, we need to measure how well a given shift performs. To this end, we use the correlation between the aligned annotations and the building probability map obtained previously from the image on which the annotations need to be registered. Assuming that rural buildings are gathered in small groups in which every building has the same misalignment error, we align groups of buildings instead of individual buildings. This greatly reduces the computational load and is numerically more effective (see the results in Section 4). Moreover, using groups of buildings instead of single ones makes the results less dependent on the quality of the building probability map.
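As a minimal sketch of how a candidate shift for one group of buildings could be scored, the snippet below takes the score to be the mean building probability under the shifted footprint mask and searches a small window of integer shifts. The function names, the boolean-mask representation of a group, the treatment of out-of-image pixels, and the search radius are assumptions made for illustration only.

```python
import numpy as np

def shift_score(prob_map, mask, dy, dx):
    """Mean building probability under `mask` shifted by (dy, dx) pixels."""
    ys, xs = np.nonzero(mask)
    ys, xs = ys + dy, xs + dx
    h, w = prob_map.shape
    inside = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    if not inside.any():
        return 0.0
    # Assumption: pixels shifted outside the image count as probability zero.
    return float(prob_map[ys[inside], xs[inside]].sum() / len(ys))

def best_shift(prob_map, mask, radius=5):
    """Exhaustive search for the integer shift with the highest score."""
    shifts = [(dy, dx)
              for dy in range(-radius, radius + 1)
              for dx in range(-radius, radius + 1)]
    scores = [shift_score(prob_map, mask, dy, dx) for dy, dx in shifts]
    return shifts[int(np.argmax(scores))]
```

Here `mask` would hold the rasterized footprints of a whole group, so that a single shift is estimated for all buildings in the group.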
Additionally, we observed that nearby groups of buildings have similar registration errors. Based on this observation, we build our building registration module on an MRF model that uses this prior together with the evidence provided by the building probability map. Our method aims at finding the alignment vectors that need to be applied to the annotation locations x based on the probability map y. Groups of buildings, or sites, are used as nodes of the MRF graph (see Figure 3), where sites i and j are neighbors in the graph if they are spatially close (see Section 3.2 for more details on the MRF graph definition).
In our MRF formulation, the unary term is obtained from the normalized correlation between the annotation after alignment and the building probability map y. This correlation is equal to the average of the predicted probability values of the pixels contained in the aligned annotation. The pairwise term is defined by the dissimilarity (vector norm of the difference of two vectors) between the alignment vector of annotation i and the alignment vectors of its neighboring annotations [26]. The optimal set of alignment vectors for the annotations is the one that minimizes the energy function U, which combines the unary and pairwise terms over all sites. Here, D is the set of all m possible alignment vectors, a spatial regularization parameter weights the pairwise term, and Z is a normalization factor, defined as the maximum possible distance between two alignment vectors in D. To minimize U, we use the Iterated Conditional Modes (ICM) algorithm [24], initialized with the alignment vectors that maximize the correlation. As this initialization is already a good heuristic (see Section 4), the ICM algorithm reaches a good solution in a few iterations. Including a distance-based weight in the pairwise term does not lead to better performance, so it is omitted for clarity. We presented preliminary results of our proposed method for alignment of OSM annotations in the conference paper [32]. Algorithm 1 summarizes the proposed method for aligning OSM annotations.
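The ICM optimization can be illustrated with the following sketch. It assumes a local energy for site i of the form "minus the correlation score of a candidate vector, plus beta times the sum of distances to the neighbors' current vectors, divided by Z"; the negated-score form of the unary term, the name `beta`, and the data layout are assumptions for illustration, not the exact formulation of the method.

```python
import numpy as np

def icm_align(scores, shifts, neighbors, beta=1.0, n_iter=10):
    """Pick one candidate alignment vector per site with ICM.

    scores:    (n_sites, m) array; scores[i, k] is the correlation of
               site i when shifted by shifts[k].
    shifts:    (m, 2) array of candidate alignment vectors (the set D).
    neighbors: neighbors[i] lists the sites adjacent to site i.
    Returns an array with the index of the chosen vector for each site.
    """
    scores = np.asarray(scores, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    # Distances between all pairs of candidate vectors; Z is the largest.
    dist = np.linalg.norm(shifts[:, None, :] - shifts[None, :, :], axis=-1)
    z = dist.max() if dist.max() > 0 else 1.0
    # Initialization: the best-correlated vector of every site.
    labels = scores.argmax(axis=1)
    for _ in range(n_iter):
        changed = False
        for i in range(len(labels)):
            # Energy of each candidate for site i with its neighbors fixed.
            pairwise = sum(dist[:, labels[j]] for j in neighbors[i])
            energy = -scores[i] + beta * pairwise / z
            k = int(energy.argmin())
            if k != labels[i]:
                labels[i] = k
                changed = True
        if not changed:  # converged to a local minimum
            break
    return labels
```

With `beta=0` the result is the correlation-only initialization; larger values pull each group towards the vectors chosen by its neighbors.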
2.3. Removing incorrect building annotations
In order to remove OSM annotations of buildings that no longer exist in the updated imagery (Figure 2c), we compute the mean building probability value of the pixels contained in the aligned annotations. We observe that the histogram of these average probability values roughly follows a bimodal distribution. The group of annotations close to the first local maximum corresponds to the few annotations that have average probability values close to zero (showing strong evidence that there is no longer a building at that location of the imagery), while the other group of annotations, gathered around the second and most prominent local maximum, corresponds to the majority of the aligned annotations, which have higher average probability values. Since Otsu’s thresholding method [33] is known not to perform well for unbalanced distributions [34], we use the Minimum threshold method [28]. This method iteratively smooths the histogram until only two local maxima are found. After that, the minimum value between the two local maxima is selected as the threshold. We then remove the annotations that have an average probability value below this threshold.
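The thresholding step can be sketched as follows. This is a minimal illustration of the idea (smooth the histogram until two local maxima remain, then take the lowest bin between them); the bin count, the 3-bin moving-average smoothing, and the function names are assumptions, not settings taken from this work.

```python
import numpy as np

def _peaks(hist):
    """Indices where the histogram stops rising and starts falling."""
    padded = np.append(hist, 0.0)  # lets a peak in the last bin be found
    peaks, rising = [], True
    for i in range(len(hist)):
        if rising and padded[i + 1] < padded[i]:
            peaks.append(i)
            rising = False
        elif not rising and padded[i + 1] > padded[i]:
            rising = True
    return peaks

def minimum_threshold(values, bins=64, max_iter=10000):
    """Threshold at the valley between the two modes of `values` in [0, 1]."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    hist = hist.astype(float)
    kernel = np.ones(3) / 3.0
    peaks = _peaks(hist)
    n = 0
    while len(peaks) > 2 and n < max_iter:
        # One pass of 3-bin moving-average smoothing.
        hist = np.convolve(hist, kernel, mode="same")
        peaks = _peaks(hist)
        n += 1
    if len(peaks) != 2:
        raise ValueError("histogram could not be reduced to two modes")
    lo, hi = peaks
    valley = lo + int(np.argmin(hist[lo:hi + 1]))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[valley]
```

Annotations whose mean probability falls below the returned value would then be discarded, e.g. `keep = mean_probs >= minimum_threshold(mean_probs)`, where `mean_probs` is a hypothetical array with one value per aligned annotation.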
2.4. Adding new building annotations
Figure 4: CNN model for adding new annotations of buildings that appear for the first time in the updated imagery.
The last task is the addition of new building footprints (Figure 2d). We observed that rural buildings appear with very few different shapes in the imagery (e.g., circles and rectangles), as compared to urban buildings. Therefore, we make the hypothesis that a restricted number of shapes is sufficient to represent most buildings in rural areas. Inspired by this, we compile a set of 18 commonly appearing shapes and propose a CNN model that predicts whether a building with one of these predefined shapes is present at a particular location of the imagery (see Figure 4). Based on our observations, we select 6 basic geometrical shapes: a circle of radius 3.3 meters, a square of side 4.8 meters, a rectangle of sides 3.6 and 6 meters, and the same rectangle rotated by 45°, 90° and 135°. Furthermore, for each base shape we generate two more scaled versions, by approximately increasing its area by a factor of 2 and 4, resulting in a total of 18 considered shapes (see Figure 4).
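The set of 18 shape templates described above can be enumerated with a short sketch. The tuple layout and the function name are hypothetical, and the scaled versions are obtained here by multiplying every dimension by exactly the square root of the area factor, whereas the text only states that the area is increased approximately.

```python
import math

def shape_bank():
    """Return 18 templates as (kind, dimensions in meters, rotation in degrees)."""
    base = [
        ("circle", (3.3,), 0),        # dimension: radius
        ("square", (4.8, 4.8), 0),    # dimensions: side lengths
    ]
    # The 3.6 m x 6 m rectangle at four orientations.
    base += [("rectangle", (3.6, 6.0), angle) for angle in (0, 45, 90, 135)]
    bank = []
    for kind, dims, angle in base:
        for area_factor in (1, 2, 4):
            scale = math.sqrt(area_factor)  # linear scale for this area factor
            scaled = tuple(round(d * scale, 2) for d in dims)
            bank.append((kind, scaled, angle))
    return bank
```

Calling `shape_bank()` yields six base shapes at three sizes each, i.e. one template per shape classifier.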
The architecture of the proposed CNN model is depicted in Figure 4: we apply two convolutional blocks followed by one convolutional layer to the input image of size 256 × 256, leading to a 61 × 61 feature map with 512 activations per location (details of the specific architecture are presented in Section 3). Afterwards, we apply a 1 × 1 convolutional layer that outputs a matrix of size 61 × 61 with 36 bands. This operation is performed to compute scores for the two classes of interest (presence or absence of a building) for each of the 18 different shapes at each location of the 61 × 61 grid. This means that we have a different classifier for every building shape. Every pixel in the 61 × 61 grid corresponds to one location in the original 256 × 256 input image. Therefore, the location of our building predictions will have an additional approximation error of less than four pixels.
For training the CNN model, we use a cross entropy loss on the sum of the binary shape classification problems. We consider as positive samples of a given building shape the rural buildings with an Intersection over Union (IoU) value of more than 0.75 with the shape mask. The rural buildings with an IoU value of less than 0.30 with the shape mask are considered as negative samples for that particular building shape. The threshold values are chosen empirically, based on the object detection method presented in [35]. Note that if we chose a higher value for the positive threshold, we might ignore some buildings whose shape is very similar to the desired one, and if we used a lower value, we would risk including buildings whose shape does not fit the desired building shape.
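The labeling rule for one shape template can be sketched as below, using boolean masks for the building and for the template. The function names are hypothetical, and treating buildings whose IoU falls between the two thresholds as ignored during training is an assumption borrowed from common object detection practice; the text does not state how such samples are handled.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union of two boolean masks of equal shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def shape_label(building_mask, shape_mask, pos_thr=0.75, neg_thr=0.30):
    """Return 1 (positive), 0 (negative) or -1 (assumed: ignored)."""
    value = iou(building_mask, shape_mask)
    if value > pos_thr:
        return 1
    if value < neg_thr:
        return 0
    return -1  # between thresholds: neither positive nor negative
```

Applying `shape_label` once per template gives the 18 binary targets of a training location.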
The output of this CNN model is a set of rural building candidates that have predefined shapes. We select a subset of these candidates based on the building probability map and the aligned building annotations, obtained after the annotation removal process. We filter out all the candidates that have average probability values (as obtained by the CNN model that performs per-pixel classification) and detection probability values (obtained by the CNN model described in this section) lower than a certain threshold t. In case of overlapping candidates, we select the one with the highest sum of average probability and detection probability values.
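A possible reading of this selection step is sketched below. It assumes that a candidate is kept only when both its average probability and its detection probability reach the threshold t, that overlap is tested on axis-aligned bounding boxes, and that overlaps are resolved greedily; these choices, like the tuple layout, are illustrative assumptions. The check against the existing aligned annotations is left out for brevity.

```python
def select_candidates(cands, t=0.5):
    """Filter and de-duplicate building candidates.

    cands: list of (box, avg_prob, det_prob) with box = (x0, y0, x1, y1).
    Returns the retained candidates, best combined score first.
    """
    def overlap(a, b):
        # True when the two boxes share a region of non-zero area.
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    # Assumption: both scores must reach the threshold.
    kept = [c for c in cands if c[1] >= t and c[2] >= t]
    # Highest sum of average and detection probability first.
    kept.sort(key=lambda c: c[1] + c[2], reverse=True)
    selected = []
    for cand in kept:
        if not any(overlap(cand[0], s[0]) for s in selected):
            selected.append(cand)
    return selected
```

Because candidates are visited in decreasing order of combined score, the first candidate accepted in any overlapping cluster is the one with the highest sum.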
3.1. Datasets
We evaluate our method with OSM rural building data from two different countries, namely the United Republic of Tanzania and the Republic of Zimbabwe. The evaluation data collected from these two countries have different characteristics: while the Tanzania evaluation region contains severely misaligned and incomplete annotations, the evaluation region in Zimbabwe contains more accurate annotations. The Bing imagery utilized for the two datasets was acquired between 2004 and 2014, while the annotations obtained from OSM were made by volunteers between 2013 and 2018. Bing maps provides an API to obtain aerial imagery (red, green and blue channels) at different spatial resolutions (e.g., 119 cm, 60 cm, 30 cm). In this work, for the training and testing datasets, we use Bing maps imagery of 30 cm spatial resolution, since we wanted to obtain accurate building classification maps with the CNN. The lower the spatial resolution, the higher the chances of obtaining inaccurate building classification maps, with missing buildings and false positives. Therefore, we recommend the use of imagery with 60 cm or higher spatial resolution, which can be obtained from pansharpened images of satellites such as QuickBird, GeoEye, Pléiades, WorldView-2, WorldView-3, and WorldView-4.
In order to train the CNN model that predicts the building probability maps (Section 2.1), we use 3134 OSM rural building annotations. These OSM annotations were manually verified/corrected on a set of Bing aerial images that cover 23.75 km², acquired over the Geita, Singida, Mara, Mtwara, and Manyara regions of Tanzania. In order to obtain the building probability maps for the Zimbabwe dataset, we fine-tune the CNN model trained on the Tanzania annotations with a small dataset of 559 building annotations obtained from the region of Matabeleland North in Zimbabwe.
In order to evaluate our methodology, we create validation datasets spatially disconnected from the training regions. The first one is composed of 1094 manually corrected misaligned building annotations located close to the city of Mugumu in Tanzania, where we found OSM annotations with different misalignment orientations. The second dataset is composed of 811 manually corrected misaligned annotations located in the region of Midlands in Zimbabwe. The validation dataset from Tanzania consists of three rural areas, for which we obtained Bing images of sizes (in pixels) 7936 × 8192, 8192 × 8192 and 7168 × 3840, respectively. The validation dataset from Zimbabwe consists of four rural areas that were covered by Bing images of sizes 4096 × 3328, 4096 × 3584, 5120 × 4352 and 5120 × 4352 pixels, respectively.
3.2. Model setup and evaluation procedures
- Building probability CNN. For the CNN model that obtains the build-
- MRF graph. As mentioned in Section 2.2, we use groups of buildings
- Alignment with MRF. The alignment vectors
- Building generation by CNN. For the CNN model that detects build-
We evaluated the performance of the proposed method using the Precision, Recall and F-score metrics with a pixel-level evaluation of the predictions.
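For reference, the pixel-level metrics can be computed from two binary masks as in the sketch below. It assumes that the F-score is the balanced F1 measure (harmonic mean of precision and recall), which the text does not state explicitly; the function name is hypothetical.

```python
import numpy as np

def pixel_scores(pred, truth):
    """Pixel-level precision, recall and F1 for two binary building masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # building in both
    fp = np.logical_and(pred, ~truth).sum()   # predicted but not in ground truth
    fn = np.logical_and(~pred, truth).sum()   # in ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score
```

In this setting, `pred` would be the rasterized corrected annotations and `truth` the rasterized manually verified footprints.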
We compare the proposed method for alignment of OSM annotations (MRFGroups) with the original annotations (‘without alignment’) and the following competitors from the literature:
- , a deformable registration method trained using an un-
- , the fully convolutional CNN-based segmen-
In addition to the competitors from the literature, we report results obtained by our model in varying conditions:
- . When selecting the alignment vectors that maximize
- . When obtaining the alignment vectors that maximize the
- . When performing the alignment with the proposed
- . When obtaining the alignment vectors that minimize
- . When obtaining the alignment vectors that maximize the
4.1. Numerical results
Tables 1 and 2 present the performances and processing times of several alignment methods for the Tanzania and Zimbabwe evaluation datasets, respectively. For the Tanzania dataset (Table 1), we can observe that the original misaligned annotations poorly match the actual building footprints visible in the image. All the alignment methods drastically improve the performance of the misaligned annotations. MRF-based methods show better performances than methods based only on correlation. This shows that adding the prior knowledge of smoothness of the alignment vectors helps to improve the results. We can also observe that the alignment methods based on groups of buildings are more effective and efficient than the ones based on individual buildings. For the case of the Zimbabwe dataset (Table 2), the performances of the original misaligned annotations are considerably better than the ones of the Tanzania dataset. As in the Tanzania dataset, all the alignment methods considerably improve the performances of the misaligned annotations and the proposed method based on MRF spatial logic applied on groups of buildings outperforms the other baseline alignment methods, as well as the state-of-the-art semantic segmentation approach, in terms of precision and recall.
Table 1: Pixel-based performance of alignment correction methods for the Tanzania evaluation dataset.
Table 2: Pixel-based performance of alignment correction methods for the Zimbabwe evaluation dataset.
Table 3: Pixel-based and object-based performance of the removal and building addition methods for the Tanzania evaluation dataset.
Table 4: Pixel-based and object-based performance of the removal and building addition methods for the Zimbabwe evaluation dataset.
Tables 3 and 4 show the performance of the proposed methods for the removal of incorrect annotations and the addition of new annotations in the two datasets. As a starting point, they use the proposed MRFGroups. In order to evaluate the performance of the methods at the object level we consider that a building is detected if its IoU (Intersection over Union) with the ground truth is greater than 0.5. This value corresponds to a misalignment of 2 pixels (60 cm) in both axes when considering the smallest shape (circle) in our dataset.
In the Tanzania dataset, the removal of incorrect annotations considerably improves the precision of the method while maintaining the recall. When the method that uses shape priors for adding new building annotations is applied, the recall increases considerably. This comes at the cost of a slight decrease in precision because of some false positive predictions. However, the gain in recall is larger in the pixel-level evaluation, which is reflected in the improvement of the F-score. In the Zimbabwe dataset, the results of the aligned polygons and the results of removing and adding polygons after alignment are overall equivalent. This happens because most of the buildings in the imagery are already well detected and considerably well delineated by the aligned annotations. Thus, few candidates are removed, and the new building candidates predicted by the proposed CNN mostly coincide with locations that are already annotated. Therefore, very few new candidate buildings are added.
4.2. Analysis of shape priors
In Tables 3 and 4 we also compare our proposed methods with the fully convolutional semantic segmentation approach proposed in [5]. As can be observed, in both datasets the proposed methods achieve better performances than this baseline. Alternatively, one could also use a semantic segmentation method (e.g. [5]) to add new building footprints after running MRFGroups and removing incorrect footprints: this result is reported in the last line of both tables. In this case, we observe numerical performances similar to our proposed method in terms of F-score. Our proposed method is more precise, while this baseline obtains higher recall values (possibly related to oversegmentation). However, our method has the advantage of returning an output that can be easily converted into vectorial data. As can be observed in the visual comparisons in Section 4.3.2 (Figure 9), our method obtains building predictions with shapes that fit the ground truth better and does not oversegment. Also, in cases of objects with shared or very close boundaries, the building outlines are easily disentangled, while they cannot be recovered from the semantic segmentation results, since both objects are included in a single blob.
We also evaluate how accurate our method based on shape priors is in differentiating building shapes. To do so, we consider all the newly added buildings showing a considerable overlap (IoU > 0.3) with a building in the ground-truth map. Considering as classes the six basic primitive shapes, the predicted shapes obtain an accuracy of 90.0%. If we consider as classes the 18 shapes (therefore both the shape and the size of the object), an accuracy of 38.3% is reached. The most common errors are cases where the correct shape is predicted, but not the correct size.
For the evaluation of the geometrical accuracy of the new buildings, we use the average symmetric surface distance (ASSD) metric. This metric computes the average distance from each pixel on the boundary of the predicted object to the closest pixel on the boundary of the ground-truth object. A perfect building prediction obtains an ASSD value of 0 (the lower the value, the better). We have computed this metric for all the building predictions that have some overlap with the ground truth. The average ASSD value for the predictions of the proposed method is 2.54 in the Tanzania dataset, while the method that adds buildings based on semantic segmentation obtains an average ASSD value of 2.56.
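A brute-force sketch of the ASSD between two binary masks is given below. It assumes the usual symmetric definition (boundary distances are averaged in both directions), a 4-neighborhood boundary, and distances measured in pixels; these details and the function names are assumptions, as the text does not specify them. Both masks must be non-empty.

```python
import numpy as np

def boundary(mask):
    """Coordinates of foreground pixels with at least one background 4-neighbor."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when its four direct neighbors are all foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def assd(pred, truth):
    """Average symmetric surface distance between two non-empty masks."""
    a = boundary(pred).astype(float)
    b = boundary(truth).astype(float)
    # Pairwise Euclidean distances between the two boundary point sets.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    total = d.min(axis=1).sum() + d.min(axis=0).sum()
    return total / (len(a) + len(b))
```

The pairwise distance matrix keeps the example short; for large objects a distance transform would be a more economical choice.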
4.3. Visual comparisons
4.3.1. Alignment of footprints
Figure 5 presents five examples of groups of rural buildings from the Tanzania dataset. For each example, we show the image, the building probability maps obtained by the hypercolumn model, the original OSM annotations (in yellow) and the aligned annotations obtained by different methods (in other colors). For the proposed method, MRFGroups, only the alignment is performed and no removal / addition component is considered in the figure.
- Example 1 (first row). Figure 5c shows in green circles the aligned anno-
- Example 2 (second row). Figure 5g shows the alignment results obtained
- Example 3 (third row). Figure 5k presents the results obtained by the
- Example 4 (fourth row). Figures 5o and 5p present the results obtained by
- Example 5 (fifth row). Figure 5t presents the result obtained by DeformableReg
Figure 5: Examples of alignment results (the original misaligned annotations are presented in yellow) from the Tanzania dataset.
Figure 6: Examples of alignment results in the Zimbabwe dataset using MRFGroups.
Although the proposed MRF based method is more robust to inaccurate building probability maps than the other alignment methods, the quality of the building probability map remains the main factor to compute accurate alignment vectors.
Figure 6 illustrates the alignment results of the proposed MRFGroups in three examples. In the first case, no alignment is necessary, and the MRFGroups result is equivalent to the original labels. In the two other cases, MRFGroups aligns the buildings correctly, and the removal and addition of footprints are not necessary. This is in line with expectations for this dataset, as we observe that the Zimbabwe dataset has better quality OSM annotations, which only require geometric alignment. Missing building annotations or incorrect annotations after alignment are rare. This is also reflected in Table 2, in which the alignment of the original annotations considerably improved the performance, but the removal and addition of building annotations did not improve the final performance.
4.3.2. Including footprint removals and additions
Figure 7 presents results of the methods for alignment (orange), removal of incorrect annotations (green) and addition of new annotations (blue) in the Tanzania and Zimbabwe datasets. For the Tanzania dataset example, on the top row, an incomplete set of annotations (Figure 7b) is first geometrically aligned so that the large buildings correspond to structures in the image (Figure 7c); then, the small structure at the bottom is removed, since there is no evidence that a small building would be located there (Figure 7d). One could argue that the removed building corresponds to a small structure at the bottom, but given the relative configuration of the annotations, this is against the image evidence learned by the CNN model. Finally, the second CNN, which adds new footprints, succeeds in adding the two missing large buildings on the right side (Figure 7e). For the example from the Zimbabwe dataset (Figure 7f), the original OSM annotations (Figure 7g) are already well aligned. As a consequence, the alignment correction (Figure 7h) and the removal of incorrect annotations (Figure 7i) do not change the location of the original annotations. However, two new footprints of missing buildings are correctly added using the second CNN (Figure 7j).
Figure 8 compares the results obtained by our proposed method (MRFGroups followed by the removal and addition of building annotations) with the result of a CNN-based method trained for building segmentation [5]. We can observe that, despite detecting most buildings, the prediction of the CNN segmentation model is not precise, containing several false positive pixels, while our proposed method obtains a better result, more coherent with the shapes of the buildings to be detected.
Figure 9 shows three examples comparing the results of adding buildings using a semantic segmentation method [5] and using our proposed method for adding building annotations, based on shape priors. The shape of the output of the semantic segmentation method can be very irregular, while our proposed method obtains predictions that fit the ground truth better (see examples 1 and 2). In some cases, the prediction of the semantic segmentation method can obtain higher values of IoU with the ground truth than our proposed method, since it tends to predict more pixels as buildings (oversegmentation). However, it can also produce some undesirable results, as in Figure 9e. Overall, the proposed method leads to a more precise outlining of buildings that is easily exportable to vector footprints, and it can also effectively disambiguate polygons with very close boundaries.
Figure 7: Results of our method (the original misaligned annotations are presented in yellow) for the Tanzania (top row, panels a-e) and Zimbabwe (bottom row, panels f-j) datasets: (a, f) image; (b, g) original annotations; (c, h) MRFGroups; (d, i) MRFGroups + removal; (e, j) MRFGroups + removal + addition.
Figure 8: Results of our method compared with semantic segmentation [5]: a) imagery of groups of buildings; b) original OSM annotations (yellow circles); c) results obtained by using a CNN model trained for building segmentation (orange circles); and d) annotations, in blue circles, obtained using the proposed method (MRF alignment followed by removal and addition).
Figure 9: Visual comparison of two methods for adding new building annotations, after the alignment and removal of annotations. 1) Add new buildings using the semantic segmentation method proposed in [5] and 2) the proposed method based on shape priors.
We presented a methodology for correcting rural building annotations in OpenStreetMap. Our methodology consists of three steps: alignment of the original annotations, removal of incorrect annotations, and addition of new annotations of buildings that appear for the first time in the updated imagery. In order to solve the problem of misaligned OSM annotations, we proposed an MRF-based method that encodes the dependency of the alignment vectors of neighboring buildings and maximizes the correlation between the aligned annotations and a building probability map learned by a fully convolutional neural network. We used the evidence provided by the building probability map to remove annotations of buildings that no longer exist in the updated imagery. In order to add new building annotations, we train a second CNN model that predicts building candidates with predefined shapes. We evaluated our methodology in a region of Tanzania that contains misaligned and incomplete/inaccurate annotations and in a region of Zimbabwe that contains mostly misaligned annotations. We observed that the alignment process drastically improves the accuracy of the annotations in the two evaluated datasets. We also observed, especially in the Tanzania dataset, that the proposed method for the removal of annotations improves the precision of the annotations and that the proposed method for the addition of new annotations considerably improves their recall. The proposed methodology will help reduce the large human effort required to correct existing rural building OSM annotations. As future work, we plan to improve the building delineation results by combining building probability maps learned by CNNs, graph-based segmentation methods, and shape priors.
This research was funded by FAPESP (grants 2016/14760-5, 2017/10086-0 and 2014/12236-1), the CNPq (grant 302970/2014-2) and the Swiss National Science Foundation (grant PP00P2-150593).
[10] H. Fan, A. Zipf, Q. Fu, Estimation of Building Types on OpenStreetMap
[11] P. Fleischmann, T. Pfister, M. Oswald, K. Berns, Using OpenStreetMap for
[12] Z. Wang, A. Zipf, Using OpenStreetMap Data to Generate Building Models
[13] S. Srivastava, S. Lobry, D. Tuia, J. E. Vargas-Muñoz, Land-use characteri-
[14] N. Audebert, B. L. Saux, S. Lefèvre, Joint learning from Earth Observation
[15] P. Kaiser, J. D. Wegner, A. Lucchi, M. Jaggi, T. Hofmann, K. Schindler,
[16] E. Maggiori, Y. Tarabalka, G. Charpiat, P. Alliez, Can semantic labeling
[17] A. Basiri, M. Jackson, P. Amirian, A. Pourabdollah, M. Sester, A. Winstan-
[18] P. Hashemi, R. A. Abbaspour, Assessment of Logical Consistency in Open-
[19] J. Estima, M. Painho, Exploratory analysis of OpenStreetMap for land use
[20] C. Barron, P. Neis, A. Zipf, A comprehensive framework for intrinsic Open-
[21] C. C. Fonte, L. Bastin, L. See, G. Foody, F. Lupia, Usability of VGI for
[22] E. Maggiori, G. Charpiat, Y. Tarabalka, P. Alliez, Recurrent neural net-
[23] J. Chen, A. Zipf, DeepVGI: Deep Learning with Volunteered Geographic
[24] J. Besag, On the statistical analysis of dirty pictures, Journal of the Royal
[25] B. Glocker, A. Sotiras, N. Komodakis, N. Paragios, Deformable medical
[26] D. Marcos, R. Hamid, D. Tuia, Geospatial correspondences for multimodal
[27] M. Vakalopoulou, K. Karantzalos, N. Komodakis, N. Paragios, Graph-
[28] C. A. Glasbey, An analysis of histogram-based thresholding algorithms,
[29] M. Volpi, D. Tuia, Dense semantic labeling of subdecimeter resolution im-
[30] B. Hariharan, P. Arbeláez, R. Girshick, J. Malik, Hypercolumns for object
[31] D. Marcos, M. Volpi, B. Kellenberger, D. Tuia, Land cover mapping at
[32] J. E. Vargas-Muñoz, D. Marcos, S. Lobry, J. A. dos Santos, A. X.
[33] N. Otsu, A threshold selection method from gray-level histograms, IEEE
[34] S. U. Lee, S. Y. Chung, R. H. Park, A comparative performance study of
[35] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object
[36] B. D. de Vos, F. F. Berendsen, M. A. Viergever, M. Staring, I. Išgum, End-