The primary challenge for any autonomous system operating in realistic, rather unconstrained scenarios is to manage the complexity and uncertainty of the real world. In robotics this holds, as soon as the robots leave the carefully engineered production environments in which they have been so successful in the past decades.
The typically high degree of uncertainty in real-world environments, that makes a robot’s life so hard, comes from the following sources: the limited measurement accuracy and other limitations of the system’s sensors, modeling errors and purposefully made simplifications in the system’s internal representations, unobserved environment dynamics and random effects in action execution. While it is unclear how exactly humans and other higher animals master these problems, it seems evident, that abstraction plays an important role. The
use of abstract concepts allows to define the system behavior on higher levels and independently of the exact setting of the environment and the exact sensor readings.
In this study we address the first two of the problems mentioned above, in that we provide the system with a limited capability of abstraction allowing for a higher-level understanding of its environment. In addition, we directly address the uncertainty related issues by strictly following a probabilistic approach that explicitly models and keeps track of the uncertainty associated with any variables of the problem.
As a by-product, the system’s capability to use prede-fined concepts will ease cooperation in mixed humanrobot tasks, since a common language used by both the human and the robot is a precondition for efficient exchange of information between both parties. This is however not addressed in this paper.
To illustrate the general idea, we use an example from an indoor navigation scenario, namely the semantic analysis of the commonly used occupancy grid maps. The objective of the presented method is to provide an abstracted, semantically annotated but still probabilistic map of the indoor environment. For this purpose, we
Figure 1: A typical occupancy grid map of an indoor environment, obtained from the Robotics Data Set Repository (Radish) [3].
first use a robot – equipped with a 2D laser scanner – to build an occupancy grid of the environment using a standard SLAM method [1] and then employ the procedure described in the reminder of this document to extract the semantic information. To do this, we use a Markov Chain Monte Carlo (MCMC) based sampling technique [2] to generate samples from the probability density function capturing the distribution of probable worlds the robot could encounter. The maximum posterior solution could then be used as an estimate of what the world semantically looks like.
Most of todays’ mapping approaches aim to construct a globally consistent, metric map of the robot’s operating environments. See Fig. 1 for a typical result. Such maps enable the robot to localize itself with respect to the environment and thus determine its global pose in an assumed flat world with an accuracy of typically a few centimeters in translation and below one degree in rotation. Based on this capability, the robot can also plan a path and navigate towards a goal, that will also be specified by its metric position in the global map reference frame. However, the robot does not understand its environment in terms of typical semantic concepts like rooms, corridors or functionally enriched concepts like a kitchen or living room. Furthermore, the robot does not understand relations like adjacency, connectivity via doors, or properties like rectangularity that – if known to be relevant to the given environment – could help to build the maps in the first place.
Our work aims at extracting such semantic models of the environment from the more or less raw sensor data. In the context of this paper, we assume, that a map, like the one depicted in Fig. 1, was already constructed using one of the proven methods available for this purpose [1].
Assigning semantics to spatial maps in robotics has not been looked at as intensely as the metric or topological mapping. Still, several important contributions to the field have already been made. They can be clustered into two major groups. The first group consists of methods based on place labeling, some notable examples are [4, 5, 6, 7, 8, 9, 10, 11]. These methods assign semantic labels to places or regions of the accessible work space of the robot. They are very much in the tradition of [12] or [13].
A second group is formed by approaches assigning semantic labels to parts or objects of the perceived structure of the environment, like traversable terrain, trees or similar structures in outdoor environments or walls, ceilings, and doors in indoor settings [14, 15, 16, 17, 18, 19, 20, 21, 22].
In addition to the two groups mentioned above, there are also other approaches. The approach of [23] semantically models places via objects. In [24], a method is proposed, which explores the environment in a room-by-room style and fits the explored map part into polygons. Tapus and Siegwart [25] build a map of the environment based on so called fingerprints of explored places. Lim et. al. [26] introduce an ontology-based method that integrates low-level data with high-level constraints to represent the knowledge as a semantic network.
Different from those methods mentioned above, we aim to construct a probabilistic generative model of the world around the robot, that is essentially based on abstract semantic concepts but at the same time allows to predict the continuous percepts that the robot obtains via its noisy sensors. This abstract model has a form similar to a scene graph, a structure which is widely used in computer graphics. The scene graph (see Fig. 2 c) in our case consists of rooms and doorways connecting the rooms and can be visualized as a classical floor plan(see Fig. 2 b).
The scene graph and thus also the semantically annotated world state is denoted by a vector of hidden parameters W specifying the world state, that generated the occupancy map M we are currently looking at. In the Bayesian framework we can use a maximum posterior approach to infer the most probable state Wfrom the space of probable worlds
given the map M.
with
Figure 2: a) A simplified occupancy grid map: Unexplained area is drawn in grey, free space is drawn in white. Occupied area is drawn in black. b) A possible floor plan represented as a scene graph (W): The world is divided into four rooms and the corresponding unexplained area. The connectivity is given by the wall types (dwall: a wall that has one or more doors on it; nwall: a wall that separates two rooms but does not contain a door on it; bwall: a wall that just serves as boundary). A partially dotted line in light-gray indicates a dwall, where the dotted part is the door, and the solid line part is the rest of the wall. A light-gray line (without dots) shows one nwall, and black stands for a bwall. c) The semantic description of the world in form of the scene graph: Directed links connect nodes. The dashed lines represent connectivity. Like room 4, each room has three child nodes: walls, free space, and doors. Note that the lowest level of node in the tree structure is the grid cell that belongs to walls, free space and doors.
Here p(W|M) is the posterior distribution of W given the known map M and p(W) is the prior specifying, which worlds W are possible at all. p(M|W) is the likelihood function describing how probable the observed map M is, given the different probable worlds represented by a parameter vector W. The actual semantic model is represented in the structure of the parameter vector W, while semantically relevant constraints go into the prior p(W).
In our case W contains the scene graph, i.e. the parameters of a floor plan: number of rooms, their dimensions and connectivity, the location of doorways. Here, rooms are defined as rectangular space that is enclosed by four walls, and walls are line segments defined by two end points. Connectivity indicates the spatial relationship of different rooms, and it is expressed in the types of walls: wall with door (dwall), neighbor wall (nwall) and boundary wall (bwall). A dwall is a wall that has one or more doors on it. The term door acutally refers to a door opening in a wall, which is a line segment made up by free space that can be passed so as to enter another room. A nwall separates two rooms but does not contain a door on it, whereas a bwall is a wall that just serves as the boundary of the world.
Certain a-priori assumptions about some properties of the structured world are made based on context knowledge as follows:
1) a room has four walls and possesses a rectangular shape. 2) a room has at least one door, and a door is placed
on a wall. 3) each cell in the map should only belong to one room.
These a-priori constraints are enforced by means of the prior p(W) in our generative model (2). The prior penalizes worlds that are not fully compliant with the above assumptions:
where and
are the corresponding penalization terms for the point 1), 2) and 3) of the prior information respectively, and they are defined as follows:
where and
are penalization terms with
(0, 1).
is the number of pairs of adjacent walls whose included angle is not 90 degree (
tolerance). c(x, y) denotes one grid cell in the map M.
(c(x, y)) indicates the number of rooms, to which c(x, y) belongs.
is a cell-wise penalization of the overlap between different rooms, i.e. if there is no overlap in one cell c(x, y), then
(c(x, y)) is equal to 0 or 1, in which case
(c(x, y)) is 0 (no penalization in cell c(x, y)). Otherwise, if
(c(x, y)) is bigger than 1, which means the cell c(x, y) belongs to more than one room, then
(c(x, y)) is bigger than 0 (penalization in cell c(x, y)). For the experiments shown in Section 5, the penalization terms are set as follows:
0
6.
For our generative model, we need to specify the likelihood function p(M|W) additionally. Since M is represented by an occupancy grid with statistically independent grid cells c M, we only need to come up with a model p(c|W) for all cells at their locations (x, y) in the map M:
For our model p(c(x, y)|W), we first discretize the cell state M(x, y) by classifying the occupancy values into three classes “occupied=2”, “unexplained=1” and “free=0” so as to generate the classified map CM(x, y) according to:
where ho and hu are the intensity thresholds for occupied and unexplained grid cells. Based on our world model W we can also predict expected cell states CW(x, y) accordingly:
where S w, S u and S f are the set of all the wall cells, unknown cells and free space cells in the world W respectively. p(c(x, y)|W) can then be represented in the form of a lookup-table.
In principle the likelihood p(c(x, y)|W) plays the role of a sensor model. In our case it captures the quality of the original mapping algorithm producing the grid map (including the sensor models for the sensors used during the SLAM process), and could be learned from labeled training data. However, for the experiments described in section 5 we used the empirically determined values given in Table 1.
For solving equation (1) we need to efficiently search the large and complexly structured solution space . Here we adopt the approach of [2], who propose a data driven Markov chain Monte Carlo (MCMC) technique for this purpose. The basic idea is to construct a Markov Chain that generates samples Wi from the solution space
according to the distribution p(W|M) after some initial burn-in time. One popular approach to construct such a Markov chain is the Metropolis-Hastings (MH) algorithm [27, 28]. In MCMC techniques the Markov chain is constructed by sequentially executing state transitions (in our case from a given world state W to another state W
) according to a transition distribution
(W
W) of the sub-kernels. An example of
(W
W) is given in Table 2. In order for the chain to converge to a given distribution, it has to be reversible and ergodic [27]. The MH algorithm achieves this by generating new samples in three steps. First a transition is proposed according to
(W
W), subsequently a new sam-
Table 1: The lookup table for p(c(x, y)|W).
ple Wis generated by a proposal distribution Q(W
W), and then it is accepted with the following probability:
The resulting Markov chain can be shown to converge to p(W|M). However the selection of the proposal distribution is crucial for the convergence rate. Here, we follow the approach of [2] to propose state transitions for the Markov chain using discriminative methods for the bottom-up detection of relevant environmental features (e.g. walls, doorways) and constructing the proposals based on these detection results. [2] created the term data driven MCMC for this procedure.
4.1. MCMC Kernels
In order to design the Markov chain in form of the Metropolis-Hastings algorithm, the kernels that modify the structure of the world are arranged as reversible pairs. Currently, we use four pairs of kernels, and these include:
• Kernel pair 1: ADD or REMOVE one room.
– ADD: draw one new room from certain candidates, then try to add this room to the world.
– REMOVE: try to cancel one existing room from the world.
• Kernel pair 2: SPLIT one room or MERGE two rooms.
– SPLIT: try to decompose one existing room into two rooms.
– MERGE: try to combine two existing rooms, and generate one new room out of them.
• Kernel pair 3: SHRINK or DILATE one room.
– SHRINK: try to move one wall of one room along certain orientation, so that the room becomes smaller.
– DILATE: similarly to SHRINK, move one wall of one room, so that the room becomes bigger.
• Kernel pair 4: ALLOCATE or DELETE one door
– ALLOCATE: draw one door from the door candidates that are provided by door detector, then try to assign it to two existing rooms.
– DELETE: cancel one assigned door.
Fig. 3 shows an example of the four reversible MCMC kernel pairs, using the same simplified occupancy grid map as in Fig. 2. The world W can transit to W, W
, W
and W
by applying the sub-kernel REMOVE, MERGE, SHRINK and DELETE, respectively. By contrast, the world W
, W
, W
and W
can also transit back to W using corresponding reverse sub-kernels.
4.2. Discriminative Generation of Chain Transition Proposals
As mentioned above, we use discriminative methods to propose candidates for MCMC kernels, so as to accelerate the convergence of the Markov chain. In the following, we define and explain the discriminative methods used with respect to corresponding MCMC kernels.
4.2.1. ADD
Currently, two room detectors are adopted to propose room candidates for ADD: a wall based room (WBR) detector and a free space room (FSR) detector. In the context of topological navigation several approached for the online detection of rooms have been proposed [29, 30]. Here, we detect the rooms offline based on the input map.
WBR detector. This room detector makes use of the well known Hough Transform [31] for edge based wall detection and generates room candidates using the detected line segments according to the following procedure: We extend the detected line segments so as to divide the map into sub-areas, then these sub-areas are
Figure 3: Reversible MCMC kernel pairs: ADD/REMOVE, SPLIT/MERGE, SHRINK/DILATE and ALLOCATE/DELETE.
recombined to give new rooms which obey our prior information. One example of the WBR room detector is demonstrated in Fig. 4. Here, the map is divided into six sub-areas: a1, a2, a3, a4, a5 and a6, and these sub-areas are recombined to generate 18 room candidates, which are: a1, a2, a3, a4, a5, a6, a1a2, a3a4, a5a6, a1a3, a3a5, a2a4, a4a6, a1a3a5, a2a4a6, a1a2a3a4, a3a4a5a6 and a1a2a3a4a5a6. We sample from the set of all candidate rooms in a resampling style [32]. Each of the generated room candidates are weighted according to how well their walls match the observations provided by the occupancy grid map. The weight of a room is defined as the lowest wall weight
among its four walls, where j
r1, r2, r3, r4}, indexes the wall, with ri
, indicating the ith wall of room r:
The wall weight is calculated as:
where l(wj) indicates the length of wall wj and can be computed from the coordinates of its two end points (xw jyw j
(xw j
yw j
):
The term n(wj) counts the number of wall cells that match with the map:
Having obtained the weights of all the room candidates, we implement Q(WW) by sampling from the candidates according to their cumulative weights Ar. First, we normalize their weights
:
where B indicates the set of all the room candidates generated by the WBR detector. Then, we calculate the cumulative weights Ar for room r:
Finally, we can draw a room candidate n out of B, by generating a random number k[0, 1),
FSR detector. Sometimes rooms will be missed by the edge-based procedure mentioned above. This is often the case for rooms only partially explored during grid map construction or when walls have been obstructed by furniture and thus not have been perceived by the laser scanner. We therefore use an alternative method for room detection. This detector works on the basis of connected-components analysis and is referred to as free space room (FSR) detector. If there are still regions of the map which are not explained by the world after many MCMC steps (4000 steps in the experiments), then we try to find them using the FSR detector as follows: 1) make a copy of the classified map, which we denote as C; 2) cancel the regions that are already ex- plained by the current semantic world W from C
, we denote the rest of C
as C
; 3) detect unexplained re- gions in C
based on connected-components analysis [33], and generate room candidates out of them. Fig. 5 shows one example of the FSR detector. Here, the world W does not cover the shaded area in the map, then we detect it using the FSR detector and generate room candidates out of it. Having generated the room candidates, we use the same sampling technique as done with the WBR detector to propose room candidates ((11) to (18)). The number of MCMC steps after which the FSR detector is activated should be big enough so that each of the unexplained regions is relatively small and can be easily used to generate new room candidates.
4.2.2. SPLIT
For SPLIT we again use the Hough transformation based line detection to propose splitting options. Hough line detection is applied within rooms that already exist in the world W (member rooms of W). First, a room r is randomly chosen according to a uniform distribution, which means every room contained in the current world has the same chance to be chosen. Then, the Hough line detector is applied within the room r to detect possible room splits. Let Er denote the set of the detected line segments within room r. Each detected line segment eEr is weighted, using its length l(e):
where the length l(e) is similarly calculated as done in (13). Then we normalize the weights and build the cumulative distribution of Er. Furthermore, we draw one line segment out of Er, as done with the WBR detector ((16) to (18)).
Once a line segment is chosen, it is extended, so that it intersects with the walls of the room. Currently, only the case is accepted that two opposite walls of the room are intersected. The case that two neighbor walls are intersected is neglected. With the extended line segment, we propose to split the room into two rooms. The MH algorithm then decides whether this action is accepted. A typical example of the SPLIT sub-kernel is shown in Fig. 6.
4.2.3. MERGE
The sub-kernel MERGE is the inverse of SPLIT. It tries to combine two member rooms of the current world W, so as to generate a new room, then the MH algorithm decides whether the proposed new room is accepted. To do this, the first room r is drawn from the set of all member rooms RW of world W according to a uniform distribution, which means that each member room has the same possibility to be chosen. Additionally, a second room s needs to be selected from the rest of the member rooms, s RW \ r. For sampling s, we define a new weight ar(s), which is the reciprocal of the distance d(cr, cs) between the center point cr of room r and the center point cs of room s:
where (cr.x, cr.y) and (cs.x, cs.y) are the grid cell index of the two center points. The weight ar(s) is calculated as follows:
Subsequently, we normalize the weights ar(s), calculate the cumulative probability and draw the second
Figure 4: The WBR detector. a) A simplified occupancy grid map. b) The line segments detected by the Hough line detection. c) Divide the map into sub-areas a1, a2, a3, a4, a5 and a6, by extending the detected lines. Generate room candidates out of these sub-areas. The detected line segments are shown in light-gray.
Figure 5: FSR room detector. a) A simplified occupancy grid map. b) The current world W: the shaded area is not explained. c) C: cancel the already explained regions from C
and detect unexplained regions based on connected-component analysis, then generate room candidates out of these detected regions. In this example, a room candidate is generated from the shaded area.
room, as done in (16) to (18). Once the two rooms are obtained, we try to combine them into one room. The underlying idea for using ar(s) in the sampling is that the closer two rooms are spatially located, the more likely they can be combined. Fig. 7 illuminates an example of MERGE.
4.2.4. SHRINK and DILATE
The kernel pair SHRINK/DILATE tries to move a wall wj of a member room r in the current world W, so that this room can better match the map. Here, j{r1, r2, r3, r4}, indexes the wall, with ri
, indicating the ith wall of room r. For selecting the room r from the set of all member rooms RW, we define a new weight br:
where is the room weight defined in (11). hb is a predefined threshold for the weight. Using br, a room is drawn according to (16) to (18). The reason for this is that the worse a room matches the map, the more likely it should be changed by SHRINK/DILATE.
Once the room is selected, one wall w j needs to be drawn from its four walls. Following the same idea, we define a new weight vw j for sampling the wall:
where is the wall weight defined in (12), and hv is a predefined threshold. Again, the wall is drawn according to vw j, as done in (16) to (18). After the wall is selected, we try to shift it parallel to its original orientation using a bias that is drawn from a zero-mean Gaussian distribution. In principle, the algebraic sign of the selected bias decides whether a SHRINK or a DILATE is proposed, e.g. if a positive sign proposes a SHRINK, then a negative sign will propose a DILATE. In general, SHRINK and DILATE sub-kernel have both 50% chance to be proposed. An example of the SHRINK/DILATE kernel pair is shown in Fig. 8.
Figure 6: SPLIT sub-kernel. a) The current world W. b) The room 1 (light-gray dashed rectangle) is chosen for SPLIT. Two line segments are detected (black thick lines) as splitting possibilities. One of them is selected and extended to split the room (the black thin line). c) After the accepted SPLIT action, the new world Wis created.
Figure 7: MERGE sub-kernel. a) The current world W. b) The room 6 is selected as the first room, the room 7 is selected as the second room. Here, room 7 and room 5 have better chance to be chosen as the second room than room 1, 2, 3 and 4, because room 5 and 7 are closer to room 6. The four vertices of the new room are surrounded by gray circles. c) After the accepted MERGE action, the world Wis generated.
4.2.5. ALLOCATE
A door detector which is based on connected-components analysis [33] proposes door candidates for the sub-kernel ALLOCATE. We draw one door candidate from the set of all candidates according to their weights. Here, the weight of a door g is similar to the weight of walls
that is defined in (12):
where l(g) is calculated the same as in (13), and n(g) is computed as follows:
where
Using the weight , one door candidate is drawn from the set of all detected candidates, as done in (16) to (18). Then, the MH algorithm decides whether this door will be accepted. In the following, we detail how to detect door candidates.
According to the indisputable fact that doors must be located on walls in the real world, we search doors along walls in the structured world in the following steps:
1) Expand each wall in its perpendicular direction, so that a rectangular area is created out of each wall. Note that each wall should be shortened before the expansion, so that the extended area does not overlap each other within one room.
2) Detect the overlap of these rectangles using connected-components analysis, because the overlap area indicates on which wall the potential door candidates can be found. Localize the wall part that has caused the overlap.
3) Divide the localized wall part averagely into several small segments, so that each segment is
Figure 8: SHRINK/DILATE kernel pair. a) The current world W. b) The room 5 is selected. Here, the two gray walls are more likely to be chosen, because they match the map worse than the other two black walls. We assume that the left gray wall is selected for SHRINK/DILATE. The light-gray arrow points to one DILATE possibility, whereas the black arrow points to a SHRINK possibility. c) After the accepted DILATE action, the world Wis generated.
equally long. Because each of these segments could be a part of a door, we weight them according to (24) and try to combine the verified segments to build a door (we define a segment as a verified segment, if its weight is bigger than certain weight threshold).
4) Combine the verified segments on each wall, if the distance between them is lower than certain distance threshold. Find the corresponding part of the combined segments on both walls and use them as door candidates.
The above process is demonstrated in Fig. 9.
After a successful ALLOCATE action, one door is assigned to two rooms, and each assigned door contains the following information: ID of the two rooms, ID of the walls on which the door is located, grid cell indices of the door.
4.2.6. REMOVE and DELETE
The sub-kernel REMOVE and DELETE have similar functionality, which is to cancel one existing member room and one assigned door respectively. There are no special discriminative methods used for these two sub-kernels. They just draw one member from the corresponding set (existing rooms or assigned doors) and propose to cancel this member, then the MH algorithm decides whether this proposal is accepted. Following the idea that the worse a member matches the map, the more likely it should be canceled, we use the weight br defined in (22) for room sampling. Similarly, we define a new weight zg for door sampling:
where is the door weight defined in (24), and hg is a predefined threshold.
4.3. Proposal Probability Q(W′|W) and Q(W|W′)
The proposal probability Q(WW) describes how probable it is that the world W transits to the world W
, and by contrast, Q(W
) is the probability for transiting back to the world W, given the world W
. Intuitively, Q(W
W) is the product of the normalized weight of the selected elements (room candidate, splitting line, wall etc.) in the corresponding MC sub-kernel defined in the previous section. For instance, in the ADD or REMOVE sub-kernel, Q(W
W) is equal to the corresponding normalized weight of the selected room candidate or that of the selected member room. Q(W
W) in ALLOCATE and DELETE can be calculated similarly to that in ADD and REMOVE respectively. In SPLIT, Q(W
W) is the product of the corresponding normalized weight of the selected member room and that of the selected splitting line. In SHRINK/DILATE, Q(W
W) is product of three terms: the corresponding normalized weight of the member room, that of the selected wall and that of the generated Gaussian bias. Similarly, Q(W
W) of MERGE is calculated as the product of the corresponding normalized weight of the first room and that of the second room.
Compared with Q(WW), the calculation of Q(W
) is less intuitive, because the back transition is virtual and must be defined. For ADD, Q(W
) should perform the same function as the sub-kernel REMOVE, namely, the world W
transits back to the world W by canceling the room that is added in the transition from W to W
, thus Q(W
) of ADD should be the normalized weight of the added room in the sub-kernel
Figure 9: ALLOCATE sub-kernel. a) The occupancy grid map. b) The current world W with no assigned doors. c) Step 1: each wall is shortened and expanded. The expanded areas are marked by light-gray dashed rectangles. The rectangles filled with gray show the overlap areas. d) Step 2: the two pairs of walls that have caused the overlap areas are localized. Note that the black line indicates a pair of overlapping walls, and the other pair of walls is shown in light-gray. e) Step 3: divide the two pairs of walls into six segments, using the black lines. Weight each segment according to how well they match the map. Verified segments are shown in dashed lines, other segments are drawn in solid lines. f) Step 4: combine those verified segments, between which the distance is under certain threshold. Find the corresponding part of the combined segments and use them as door candidates (light-gray lines). Note, all the segments on one wall are detected as verified segment in step 3, which means this whole wall forms a big combined segment, but the final door candidate must correspond to the door candidate on the other wall. That is the reason why only a small door candidate is detected on this wall pair.
REMOVE. Q(W) of REMOVE can also be calculated as the normalized weight of the room, that is canceled in the transition from W to W
, in the sub-kernel ADD. Analogously, Q(W
) of SHRINK, DILATE, MERGE, SPLIT, ALLOCATE and DELETE can be calculated in a style similar to Q(W
W) in their corresponding reverse sub-kernels. In addition, the SHRINK/DILATE pair just tries to move one wall of the selected room using a relatively small bias, thus the resulting world W
is similar to W. For computational simplicity, we assume that Q(W
W) and Q(W
) are equal in the SHRINK/DILATE pair.
We apply our algorithm to several occupancy grid maps. The selection probabilities of the MC sub-kernels are listed in Table 2. Here, the selection probabilities depend partially on the iteration index . At the beginning (
1000), the world W does not contain much information about the map, so we mainly apply ADD to propose new rooms into the world, using the WBR detector. For
1000, the selection probability of ADD is set to be very low (0.05), because most part of the map is already explained during the initial exploration (
1000). In addition, we activate SPLIT, MERGE, SHRINK and DILATE to change the form of the member rooms of the world. For
4000, we activate ALLOCATE and DELETE, so that the connectivity information is explored and attached to the world. Moreover, the FSR detector is also activated for
4000 to detect left-over free space regions. Table 2 effectively implements a heuristic scheduling policy.
Fig. 10 depicts the process of Markov chain convergence by showing the evolution of the log posterior log(P(W|M)) along the development of the Markov chain on the left side. In an initial burn-in process the chain quickly approaches its target distribution P(W|M). This is indicated by the rapid increase of the posterior in the beginning. We can also see the discontinuities at
Table 2: Transition probabilities ) of MC sub-kernels.
Figure 10: The typical development of the posterior probability P(W|M) (left) for the input map shown in Fig. 11 and the acceptance rate of the proposed state transitions (right) along the Markov chain development in terms of iterations.
the iteration 1000 and 4000 which show the effect of the scheduling policy. The jump at iteration 1000 is a consequence of the activation of new transition kernels that greatly improve the system’s capability to structurally adapt the world state W to the observations in the map. After the initial phase, the chain samples from P(W|M) and produces samples that are slight variations of the world W and do not significantly improve the situation any more.
This convergence process can also be seen by looking at the development of the acceptance rate of the Metropolis-Hastings algorithm (see Fig. 10, right). In the early phases, the acceptance rate is comparatively high, which means that most of the transitions proposed by Q(WW) are accepted, since they correspond to a significantly improved explanation of the map M by the model W
. Towards the end, the acceptance rate stabilizes on a low level.
Fig. 11 shows a typical result of the overall process. Here, part a) shows an original input occupancy map M [3]. Part b) shows the classified map CM(x, y) that is defined in (8), with black, gray and white indicating occupied, unexplained and free cells respectively. Part c) visualizes the world state W representing our structured semantic model. Here black, gray, white and light-gray show the wall, unknown, free and door way cells respectively. In part d), walls (gray) and doors (light-gray) of the world W are directly plotted onto the original input map M, so as to give a more intuitive comparison.
Since we effectively produce samples from P(W|M) representing the distribution of probable worlds W given the observation M, different samples represent different explanations of the data using the modeling structures available for constructing W. In our case, these are rectangular rooms and doorways. For demonstration purposes we purposefully chose a map, that is not fully compliant with these assumptions. Therefore various alternative explanations should produce equally good results.
Fig. 12 and Fig. 13 show two high-likelihood samples drawn from P(W|M), i.e. two alternative explanations of the input map used in Fig. 11. Three such samples are compared in Fig. 14, where the main differences are pointed out by arrows. A human observer, using a more complex understanding of typical architecture and also of furniture that is currently not modeled, would proba-
Figure 11: Analysis of the “ubremen-cartesium” dataset [3]. a) The occupancy grid map M. b) The classified map CM(x, y) with three intensity values (black=wall, grey=unexplained, white=free). c) The analyzed world W (black=wall, gray=unknown, white=free, light-gray=door). d) Walls (gray) and doors (light-gray) of the world W drawn into the original map M.
bly select the model in Fig. 13.
A result for a different input map [3] is illustrated in Fig. 15. Compared with the map used in Fig. 11, this map is relatively simple.
As before Fig. 16 shows the progression of chain convergence in terms of log-posterior and acceptance rates. Again, we can clearly identify the burn-in phase and the effects of our scheduling policy at iterations 1000 and 4000.
The computational cost strongly depends on the size of the occupancy grid map, and the major part of the computation is spent in the evaluation of the generative model. The computation speed of analyzing the map in Fig. 11 (size:1237672) is around 30 iterations per second (ips), and in general it takes about 10000 to 20000 iterations, until the Markov chain reaches a good state, so the computation time on the current PC is around 5 to 10 minutes. By contrast, the map used in Fig. 15 is much smaller (size:556
322), and the same computer reaches around 140 ips, which leads to an overall computation time of 1 to 2 minutes. Currently, we use a single-threaded implementation, where at each iteration only one sub-kernel is tested for the sampling. One of the most important features of the Markov chain is that the current state is only dependent on the previous one, therefore, it is theoretically possible to do multiple tests using different sub-kernels at each iteration, then only the successful test results are saved for the sampling. Using today’s powerful off-the-shelf multi-core CPUs, this idea can be easily realized and should lead to a much less computation time.
This paper proposes a new approach for automatically extracting semantic information from more or less preprocessed sensor data. We propose to do this by means of a probabilistic generative model and MCMCbased reasoning techniques. Our work differs from previous semantic mapping approaches, that mostly use various classification methods in a bottom-up fashion to label either spatial regions or places based on context or that assign semantic labels directly to portions of the observations. Instead we construct an abstracted semantic and top-down representation of the domain under the consideration: a classical indoor environment consisting of several rooms, that are connected by doorways.
We use Bayesian reasoning to build this semantic map, so that it is aligned with the preprocessed sensor observations, that a robot made during an environment exploration and mapping stage. This introduces a bottom-up path into the approach and employs data driven discriminative environment feature detectors to analyze the continuous noisy sensor observations. The semantic environment model that we generate, is structured similarly to a scene graph and is perfectly suited
Figure 12: Alternative explanation 1 of the “ubremen-cartesium” dataset [3].
Figure 13: Alternative explanation 2 of the “ubremen-cartesium” dataset [3].
for any higher level reasoning and communication purposes.
Currently, we assume rooms with a rectangular shape, which however does not imply that our approach is restricted to this room type only. The general idea behind this is to demonstrate the exploitation of abstract (uncertain) rules on how the environment might be structured. Adhering to these rules helps the robot to interpret the noisy sensor data more correctly. In fact, other room types can be introduced to the approach in that we update the prior, add discriminative methods for proposing rooms of other types and implement functionalities for carrying out new geometrical operations (e.g. for SHRINK/DILATE and SPLIT/MERGE). However, the general approach will be the same.
While we currently generate representations that more or less resemble a classical floor plan (including semantics however), the extension of our work to more functionally enhanced representations (e.g. differentiating several room types based on the context, adding other types of concepts like general objects or furniture) will be pursued in the future. It is also straight forward to extend the concept towards 3D environment representations. A second line of research will address the integration of this type of semantic knowledge into the perception procedures at the run time of the robot.
This work is accomplished with the support of the Technische Universit¨at M¨unchen - Institute for Advanced Study, funded by the German Excellence Initiative.
We thank the anonymous reviewers for their valuable comments that were greatly helpful for improving the quality of the paper.
The “ubremen-cartesium” dataset and the “albert-b-laser” dataset used in the experiments were obtained from the Robotics Data Set Repository (Radish) [3]. We thank Cyrill Stachniss for providing these two datasets.
[1] G. Grisetti, C. Stachniss, W. Burgard, Improved techniques for grid mapping with rao-blackwellized particle filters, IEEE Transactions on Robotics 23 (1) (2007) 34–46.
[2] S. Zhu, R. Zhang, Z. Tu, Integrating top-down/bottom-up for object recognition by data driven markov chain monte carlo, in: IEEE Computer Vision and Pattern Recognition (CVPR), 2000.
[3] A. Howard, N. Roy, The robotics data set repository (radish) (2003). URL http://radish.sourceforge.net/
[4] D. Wolf, G. Sukhatme, Semantic mapping using mobile robots, IEEE Transactions on Robotics 24 (2) (2008) 245–258.
Figure 14: Three explanations of the “ubremen-cartesium” dataset [3].
[5] A. Pronobis, P. Jensfelt, K. Sj¨o¨o, H. Zender, G. M. Kruijff, O. M. Mozos, W. Burgard, Semantic modelling of space, in: Cognitive Systems, Vol. 8 of Cognitive Systems Monographs, Springer Berlin Heidelberg, 2010, pp. 165–221. [6] S. Friedman, H. Pasula, D. Fox, Voronoi random fields: Extract- ing the topological structure of indoor environments via place labeling, in: Proc. of the International Joint Conference on Arti-ficial Intelligence (IJCAI), 2007. [7] N. Goerke, S. Braun, Building semantic annotated maps by mo- bile robots, in: Proceedings of the Conference Towards Autonomous Robotic Systems, Londonderry, UK, 2009. [8] O. Mozos, W. Burgard, Supervised learning of topological maps using semantic information extracted from range data, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2006, pp. 2772–2777. [9] O. Mozos, R. Triebel, P. Jensfelt, A. Rottmann, W. Burgard, Supervised semantic labeling of places using information extracted from sensor data, Robotics and Autonomous Systems 55 (5) (2007) 391–402. [10] P. Viswanathan, D. Meger, T. Southey, J. Little, A. Mack- worth, Automated spatial-semantic modeling with applications to place labeling and informed search, in: Canadian Conference on Computer and Robot Vision, 2009, pp. 284 –291. [11] C. Nieto-Granda, J. Rogers, A. Trevor, H. Christensen, Semantic map partitioning in indoor environments using regional analysis, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 1451 –1456. [12] S. Thrun, A. B¨ucken, Integrating grid-based and topological maps for mobile robot navigation, in: Proceedings of the AAAI Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, 1996. [13] A. Kurz, Constructing maps for mobile robot navigation based on ultrasonic range data, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 26 (2) (1996) 233–242. [14] B. Limketkai, L. Liao, D. Fox, Relational object maps for mobile robots, in: International Joint Conference on Artificial Intelligence, Vol. 19, 2005, p. 1471.
[15] B. Douillard, D. Fox, F. Ramos, Laser and vision based outdoor object mapping, in: Proc. of Robotics: Science and Systems (RSS), 2008, pp. 9–16.
[16] A. N¨uchter, J. Hertzberg, Towards semantic maps for mobile robots, Robotics and Autonomous Systems 56 (11) (2008) 915– 926.
[17] J. Tong, D. Chen, Y. Zhuang, W. Wang, Mobile robot indoor se- mantic mapping using 3d laser scanning and monocular vision, in: 8th World Congress on Intelligent Control and Automation (WCICA), 2010, pp. 1212 –1217.
[18] T. Wang, Q. Chen, Object semantic map representation for in- door mobile robots, in: International Conference on System Science and Engineering (ICSSE), 2011, pp. 309 –313.
[19] A. Krishnan, K. Krishna, A visual exploration algorithm us- ing semantic cues that constructs image based hybrid maps, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010, pp. 1316 –1321.
[20] M. Persson, T. Duckett, C. Valgren, A. Lilienthal, Probabilis- tic semantic mapping with a virtual sensor for building/nature detection, in: International Symposium on Computational Intelligence in Robotics and Automation, 2007, pp. 236 –242.
[21] I. Jebari, S. Bazeille, E. Battesti, H. Tekaya, M. Klein, A. Tapus, D. Filliat, C. Meyer, Sio-Hoi, Ieng, R. Benosman, E. Cizeron, J.-C. Mamanna, B. Pothier, Multi-sensor semantic mapping and exploration of indoor environments, in: IEEE Conference on Technologies for Practical Robot Applications, 2011, pp. 151 –156.
[22] C. Case, B. Suresh, A. Coates, A. Y. Ng, Autonomous sign read- ing for semantic mapping, in: IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3297 –3303.
[23] A. Ranganathan, F. Dellaert, Semantic modeling of places using objects, in: Robotics: Science and Systems, 2007.
[24] K. Laviers, G. Peterson, Cognitive robot mapping with poly- lines and an absolute space representation, in: IEEE International Conference on Robotics and Automation, Vol. 4, 2004, pp. 3771 – 3776 Vol.4.
[25] A. Tapus, R. Siegwart, Incremental robot mapping with finger-
Figure 15: Analysis of the “albert-b-laser” dataset [3]. a) The occupancy grid map. b) The classified map (black=wall, grey=unexplained, white=free). c) The analyzed world (black=wall, gray=unknown, white=free, light-gray=door). d) Walls (gray) and doors (light-gray) of the world drawn into the original map.
Figure 16: The typical development of the posterior probability P(W|M) (left) for the input map shown in Fig. 1 and the acceptance rate of the proposed state transitions (right) along the Markov chain development in terms of iterations.
prints of places, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005, pp. 2429 – 2434.
[26] G. H. Lim, I. H. Suh, H. Suh, Ontology-based unified robot knowledge for service robots in indoor environments, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 41 (3) (2011) 492 –509.
[27] C. M. Bishop, Pattern recognition and machine learning, Springer, 2007.
[28] S. Chib, E. Greenberg, Understanding the metropolis-hastings algorithm 49 (4) (1995) 327–335.
[29] P. Buschka, A. Saffiotti, A virtual sensor for room detection, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 1, IEEE, 2002, pp. 637–642.
[30] G. v. Wichert, Can robots learn to see?, Control Engineering Practice 7 (1999) 783–795.
[31] P. Hough, Method and means for recognizing complex patterns, in: U.S. Patent 3069654, 1962.
[32] G. Kitagawa, Monte carlo filter and smoother for non-gaussian nonlinear state space models, Computational and Graphical Statistics 5 (1) (1996) 1–25.
[33] F. Chang, C. jen Chen, C. jen Lu, A linear-time component- labeling algorithm using contour tracing technique, Computer Vision and Image Understanding 93 (2004) 206–220.
Ziyuan Liu received his B.E. degree in Mechatronics from the TongJi University, Shanghai, China, in 2008. He received his M.S. degree in 2010 from the Institute of Automatic Control Engineering at Technische Universit¨at M¨unchen, Munich, Germany. Currently, he is a Ph.D. candidate at the Institute of Automatic Control Engineering at Technische Universit¨at M¨unchen. His research interests are semantic perception and sampling based inference methods.
Georg von Wichert received his Diploma (MSc) in Electrical and Control Engineering from Darmstadt University of Technology in 1992. From 1992 to 1998 he was a research and teaching assistant at the Institute of Control Engineering at Darmstadt University of Technology. In Darmstadt he also received the Ph.D. degree in Electrical Engineering in 1998. Since 1998 his is with Siemens Corporate Research and Technologies, where he currently holds the position of the program manager for Cognitive Autonomous System. At the same time he is a fellow of the Institute for Advanced Study at Technische Universit¨at M¨unchen.