Localization is an important task for autonomous vehicles and robots as well as for hand-held mobile devices with Location-Based Services (LBS). Finding the agent pose using GNSS-based localization methods is not always possible for various reasons: in indoor and closed areas where LBS are of high interest, the GNSS signals cannot be received. Besides, in dense urban scenarios, multipath effect usually deteriorates the accuracy of GNSS receivers, and even when there is reception, the accuracy of consumer GNSS-based devices is not good enough for sensitive and safety-critical tasks such as autonomous driving in which we need submeter accuracy. The more accurate devices like dGPS or RTK-GPS [11], [4] are too expensive to be equipped on every passenger car or mobile device. This motivates the need for landmark-based localization in which we use the coordinates of landmarks plus distances of agent to landmarks to estimate the agent’s pose. To this end, however, we need to have a map of the area including the (accurate) coordinates of the landmarks. In practice, however, the landmarks locations are not known beforehand and needed to be estimated themselves first. This is usually done based on the signals received/observed from them by an agent. Conventional datadriven methods for estimating the coordinates of landmarks
need to collect measurements from landmarks together with ground-truth information (being the coordinates of measurement locations) in order to estimate the landmarks locations. For instance, in case of WiFi localization this can be done by methods such as Weighted Centroid Localization (WCL) [2], [20] or using parametric methods based on propagation model [17], [22]. Collecting ground-truth labeled data is very often time consuming, expensive, and labor-intensive. For instance, to collect ground-truth coordinates of measurement points in the outdoor environments, we need expensive GNSS receivers with high accuracy. In indoor localization, e.g., WiFi-based localization, in order to estimate the location of access points, the data collection is carried out labor intensively by recording the coordinates at which the measurements are collected in addition to the RSS heard by WiFi receiver carried by the agent at each location.
To overcome this problem, in this paper a method is proposed for unsupervised landmark localization up to a rotation, transformation, and scale, or in other words, the topological map of landmarks. We design and train a network which learns the relative positions of all landmarks. The idea is akin to the celebrated Word2Vec idea for word embedding [10], and therefore we call it Landmark2Vec.
Paper organization: The rest of this paper is organized as follows: In Section II the problem and its assumptions are explained and formulated. Section III is the core part of the paper where the Landmark2Vec method is introduced in details. In Section IV, we provide a metric for performance evaluation of the proposed method in terms of map reconstruction and propose a heuristic stopping criterion for training. Some numerical experiments is provided in Section V. Finally the paper is concluded in Section VI.
Notation: For the sake of consistency, we adopt the following notations throughout this paper: denotes definition while = denotes equation. Vectors are denoted by small boldface letters (e.g., v) and matrices are denoted by capital boldface letters (e.g., denotes the second norm. Coordinates are shown by boldface small letter . To distinguish between the coordinates of landmarks and agent, we denote the coordinates of agent at location i by using subscript and the coordinates of landmark l by using superscript .
Consider an area deployed with L landmarks and the ultimate goal is to use the measurements/signals from landmarks for positioning. But, to this end, we first need to know the positions of landmarks. Without loss of generality, we assume that all the coordinates are two-dimensional (2-D). In conventional methods, to find the position of landmarks, we collect a relatively large number of measurements, say , from landmarks in the area. Let denote the set of measurements by:
where denotes the 2-D coordinates of i-th measurement, and is the measurement vector taken at , whose l-th entry is the observation from landmark l. This can be any signal which is (not necessarily linearly) proportional to the distance of the landmark to the pose ; it can be the range from a pole (landmark) along the road measured by a lidar sensor or camera, or the received signal strength (RSS) heard from a WiFi transmitter (landmark) by a mobile device.
Conventional methods use the ground-truth location information, i.e., coordinates , together with the measured signals , to estimate the positions of landmarks. In the following we review two existing approaches for landmarks and agent pose estimation.
A. Weighted Centroid Localization
Then, having all the estimates , during the test (agent positioning) phase, if an agent receives a measurement vector at a pose j, the coordinates of j can be estimated as 1
B. Multilateration
If it is easy to convert the sensor measurements to distances, then multilateration is a promising technique for positioning [8], [5], [3], [18]. This is the case for example when we use a Lidar or a camera for range measurement. Once the distance of a given landmark is known to three or more points with known coordinates (i.e., some ground-truth information) then we can use the multilateration technique to estimate the location of that landmark; see, e.g., [5].
Fig. 1. Landmark2Vec network architecture: The number of neurons in the input and output layers both equal the number of landmarks. The number of neurons in the middle layer is 2 for 2-dimensional localization and 3 for 3-dimensional localization. The activation functions of the middle layer is linear and of output layer is softmax.
We re-emphasize here that in both WCL and Multilateration, as well as other conventional techniques, the ground-truth information is required for performing landmark positioning.
Next, we introduce our proposed unsupervised method for landmark positioning which estimates the topological map of landmarks only from measurements .
In this section, the Landmark2Vec method is presented. The general NN architecture is introduced, the data collection and preparation for training is explained, the training is presented, and then we will describe how the trained model can be used for landmark positioning. The connection of the method to the problems of word embedding and graph embedding will be also described briefly.
A. Architecture
The general architecture for the network is as follows:
1) The neural network architecture used for unsupervised landmark localization is a butterfly shaped fullyconnected network as depicted in Fig 1.
2) The number of input neurons equals the number of landmarks.
3) The number of output neuron also equals the number of landmarks.
4) The number of neurons in the middle layer (bottleneck) equals the number of dimensions (2 if we are doing 2-dimensional localization, 3 if we are doing 3-dimensional localization).
5) The activation layer of the middle layer is linear.
6) The activation functions of the output layer should be softmax so as to generate numbers between 0 and 1 which sum up to 1.
B. Data Collection, Preparation, and Training
To collect data for training the network, we measure the signals/observations from all landmarks at as many locations as possible in the area of interest. Assuming that there are L landmarks, at each location we measure L signals/observations, one from each landmark, which can be put in an L-dimensional vector, where the first element of the vector is the signal received or observed from landmark number 1, the second element of the vector is the signal received or observed from landmark number 2, and so on and so forth. I emphasize again that we do not need to record the coordinates of the measurement points as the proposed method is an unsupervised one. This will hugely reduce the cost and labor time needed for landmark positioning, which is the main advantage of the proposed method.
As mentioned in subsection III-A, the number of neurons in both input layer and output layer equal the number of landmarks, i.e., L. Therefore, to train the network, we must provide L-dimensional vectors to feed both as input to the input layer and as target to the output layer. We build both input and target vectors from the collected data vectors. In other words, from each L-dimensional collected data vector we build one L-dimensional input vector and one L-dimensional target vector. Denoting a measurement by , in order to build an input vector and a target vector from it, we follow the following steps:
1) We choose a number n, where . This isakin to n in n-gram2 concept in Natural Language Processing (NLP), and basically is used to only select n landmarks whose measured values are more similar at each measurement, or in other words are more correlated location-wise. From , we then keep n largest (in magnitude) entries and set the rest equal to zero. Let call this vector .
2) Input vector is an L-dimensional vectors with one element equals 1 and the other elements equal 0. The only 1 element is where the collected data vector has its largest value.
3) Now to build the target vector , we set the largest entry of to zero (i.e., the one corresponding to the only nonzero element of input vector ) and then normalize the other nonzero entries of such that they sum up to 1.
To provide a toy example, assume that there are L = 6 landmarks and at a (unknown) location the 6-dimensional measured vector is . Assuming that we have chosen n = 4, then . The input vector built from will be and the 6-dimensional target vector will be .
During the training, we fed the input layer by the input vectors built as described above. The network then spits out an output vector in its output layer. The loss function then will be the cross entropy between this “output vector” and the corresponding “target vector” built from training data as explained above. The training is done through the back-propagation algorithm.
C. Inference
After the training is finished, the last step is to infer the location of landmarks from the trained model. In order to find the location of a landmark, say landmark l, we feed the trained network by a vector whose l-th element equals 1 and the other elements are 0. What network generates in response to this vector in the output of its middle layer (bottleneck), is the coordinates of landmark l. If we are doing the localization in 2-dimensional space, then there will be 2 neurons in the middle layer. The first one represents x-coordinate of the landmark’s location and the second one represents the y-coordinate. Similarly, if the space is 3-dimensional, there will be three neurons in the middle layer whose output will be (x, y, z) coordinates of the landmarks.
Equivalently, we can say that the trained weight between l-th input neuron and the first middle neuron is the x coordinate of landmark l and the weight between it and the second middle neuron is the y-coordinate of landmark l.
D. Analogies between Word2Vec and Landmark2Vec
To shed some more light on the proposed landmark2vec method, in this section we provide some analogies between Landmark2Vec and the celebrated Word2Vec algorithm for word embedding in NLP [10]. Word2Vec exploits the cooccurrences of words in n-grams to find their relative positions in an embedded space. Similarly in Landmark2Vec, the relative position of landmarks is inferred based on the similarity of their values in recorded measurements.
The ingredients of the two algorithms have analogies as described below 3:
• A “landmark” in Landmark2Vec is analogous to a “word” in Word2Vec. Both are to be embedded to a low-dimensional space.
• The number of distinct words in corpus in Word2Vec is analogous to the number of landmarks L in Land-mark2Vec.
• A measurement is analogous to a sentence.
• The n-gram role is played by the n nonzero elements of vector as defined in subsection III-B. The diffrence is that in Landmark2Vec we only have one n-gram per measurement while in Word2Vec we can have multiple n-grams per sentence.
• The center word in n-gram is like the strongest landmark in terms of received/observed signal in a measurement.
• Context words are analogous to the other nonzero landmarks in .
E. Connection to Graph Neural Networks
Graph Neural Networks (GNN) is the extension of Neural Networks for graph data [13], [9], [21], [7]. Graph embedding is the task in GNNs where the goal is to embed a graph or part of it (a subgraph, subset of nodes, or subset of links) into a low-dimensional space. Landmark2Vec can also be formulated as a graph embedding problem in the following way:
The nodes of the graph are all the measurements plus the landmarks. In other words, the graph has nodes. Each of the L nodes corresponding to landmarks is connected with a link to all measurement nodes, where the weight of a link between a landmark node and a measurement node is the received/observed signal form that landmark in that measurement. There is no links between the measurement nodes, and also there is none between the landmark nodes. The Landmark2Vec can be then thought of as embedding a subset of nodes, namely the nodes corresponding to L landmarks, to a 2-dimensional space.
In this section, a metric for evaluating the proposed algorithm is introduced and based on that discuss the over-fitting problem and suggest a heuristic stopping criterion for avoiding overfitting.
A. Sum of Squared Matching Errors
As mentioned earlier when introducing the proposed method, the output of the proposed algorithm is a topological map of landmarks, i.e., landmarks positions, up to a scale, translation, and rotation. In other words, what the method retrieves from the landmarks measurements is the relative positions of landmarks. To compare the true landmarks locations and the estimated scaled-translated-rotated landmarks locations, we need a metric which is independent of scale, location, and translation. Without loosing the generality, let assume that we are carrying out the localization in 2-D space. Denoting the true 2-D coordinate vector of l-th landmark byand the estimated coordinate vector by, in case of perfect recovery, we expect these two to be related through the following equation:
where accounts for the rotation and scale, and accounts for the translation. An imperfect recovery then can be represented by an error term w which we call the recovery error. Equation (4) then can be rewritten as:
To evaluate the method we then use the following Sum of Squared Matching Errors (SSME) metric:
where
Fig. 2. The validation loss (top) versus the matching error (bottom) calculated as in (6). As it can be seen while the model has not yet overfit in terms of validation loss, it overfits in terms of matching error.
B. Stopping Criterion
To avoid overfitting, the training must stop at some point. Early stopping based on validation data is not suitable here as the loss function (cross entropy between targets and outputs) and SSME formulated in (6) are not directly related. In other words, overfitting in landmark positioning may happen even if the loss on validation dataset is still decreasing. This can be seen in figure 2. As it can be seen, although the loss value on validation dataset is (slowly) decreasing (top), after a point the SSME starts to increase (bottom). In other word, although the model has not yet started to overfit in terms of loss function, it starts to overfit in terms of SSME. The point after which the model start to overfit depends on the size of the training dataset: the bigger is the training dataset size, the sooner overfit happens. Therefore we cannot choose a fixed number of epochs for training to end as it depends on training size.
We observed that in practice, the overfit usually happens after the speed of decrease in validation loss decreases itself. And this is something which is independent of the size of training dataset. Therefore, we use the heuristic criterion which ends the training as soon as the decrease speed of loss becomes lower than a pre-specified threshold, typically a small number close to zero, e.g., 0.1, times the biggest decrease in loss. Mathematically, denoting the validation loss at epoch e with , the training stops when
In this section, we study the performance of the proposed unsupervised landmark positioning method through numerical studies. To this end, we use synthetic data generated via simulation. We consider two different models for observed signals which models two important types of sensors used in real-life for localization.
A. Pathloss Model and Received Signal Strength
The first measurement model studied in the simulation is when the signal received by the agent is Received Signal Strength (RSS) heard from some landmarks. Landmarks in this case are radio transmitters, for example WiFi transmitters (a.k.a Access Points in the context of WiFi localization) or mobile base stations. The receiver (agent) is typically a mobile device which can measure the strength of radio signals transmitted by these WiFi Access Points (landmarks). The goal is to determine the position of landmarks only using the RSSs heard from them. This is an application which illustrates the benefits of the proposed method very well. Since WiFi access points are usually deployed for network coverage (and not positioning), therefore their exact locations is not known when we want to take advantage of them for positioning. Estimating their position using conventional methods like Weighted Centroid Localization (WCL) [2], [20].
To model the wave propagation between a landmark, say landmark l, and a receiver position, say position i, we use the pathloss model [14], [6], [12] as follows:
where is the RSS heard from landmark l at location i, is the transmit power of landmark is the pathloss exponent for landmark is the distance between position i and landmark l, and is the noise.
B. Inverse Linear Model
The second model used for simulation is an inverse linear model. It is inverse linear in that the observation of the landmark is linearly proportional to its inverse distance to the measurement location. The most famous sensor obeying such a model is a camera: the size of the image of an object (landmark) in camera is (approximately) linearly proportional to its inverse distance to the camera [16].
C. Experiments and Results
Experiment 1: The goal of the first experiment is to provide a simple visualization of the ability of the proposed method to reconstruct a map of landmarks. For the sake of visual clarity, we consider the hypothetical situation where the landmarks are equally separated on a circle. The model used for generating synthetic data is pathloss model as in (10). The number of landmarks is L = 30 and the number of measurements is where 80% of measurements have been used for training.
The result is shown in figure 3. The training terminated when (8) is satisfied with . Here the training has
Fig. 3. Reconstruction of landmark positions using Landmark2Vec: top: the true place of landmarks; bottom: their estimated positions. Each landmark has an ID between 0 and 29. As it can be seen the relative positions of landmarks have been preserved by landmark2vec. Only a scale, an orientation, and a translation differ between the two figures.
ended only after 50 epochs with an almost perfect recovery of relative positions of landmarks. The top figure depicts the true positions of landmarks while the bottom figure depicts the estimated positions using landmark2vec. Each landmark has been specified by a label from 0 to 29. As it can be seen the relative positions of landmarks (their order on the circle) is the same in both figures. In other words, landmark2vec has retrieved the map of landmarks up to a scale, a rotation, and a translation.
Experiment 2: In this experiment we study a more objective measure, namely the SSME as defined in Section IV-A. We compare the matching error (6) for the two above mentioned models (pathloss and linear). The number of landmarks is L = 30 and the number of measurements is where 20% of measurements have been used
Fig. 4. Reconstruction error of landmark positions using Landmark2Vec for linear and pathloss models. As it can be observed, both the SSMEs and the stopping times of the two algorithms are similar.
for training. The training has ended when (8) is satisfied with . As it can be seen although pathloss is a more complicated model, both the SSME and the number of epochs for stopping the training are almost the same as the ones of the linear model.
Both experiments 1 and 2 above were implemented in Python [19] version 3.6 and Tensorflow [1] version 1.10.
A neural network based method, called landmark2vec, for unsupervised positioning of landmarks was proposed, where no ground-truth information is required to estimate the position of landmarks up to a scale, rotation, and shift. The NN architecture is a shallow one comprising of just one hidden layer whose size is the same as dimensionality of space (2 for 2D positioning and 3 for 3D positioning). The training was explained and an evaluation metric was provided in order to assess the performance of the proposed method in landmark positioning. The performance was briefly illustrated and studied through numerical examples.
[1] Mart´ın Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16), pages 265–283, 2016.
[2] Jan Blumenthal, Ralf Grossmann, Frank Golatowski, and Dirk Timmermann. Weighted centroid localization in zigbee-based sensor networks. In 2007 IEEE international symposium on intelligent signal processing, pages 1–6. IEEE, 2007.
[3] ID Coope. Reliable computation of the points of intersection of n spheres in ANZIAM Journal, 42:461–477, 2000.
[4] Ahmed El-Rabbany. Introduction to GPS: the global positioning system. Artech house, 2002.
[5] Bertrand T Fang. Trilateration and extension to global positioning system navigation. Journal of Guidance, Control, and Dynamics, 9(6):715–717, 1986.
[6] Julius Goldhirsh and Wolfhard J Vogel. Handbook of propagation effects for vehicular and personal mobile satellite systems. 1998.
[7] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864. ACM, 2016.
[8] Willy Hereman. Determination of a position in three dimensions using trilateration and approximate distances.
[9] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[10] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[11] Pratap Misra and Per Enge. Global positioning system: signals, measurements and performance second edition.
[12] Henri Nurminen, Jukka Talvitie, Simo Ali-L¨oytty, Philipp M¨uller, Elena-Simona Lohan, Robert Pich´e, and Markku Renfors. Statistical path loss parameter estimation and positioning using rss measurements in indoor wireless networks. In 2012 International Conference on Indoor Positioning and Indoor Navigation (IPIN), pages 1–9. IEEE, 2012.
[13] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
[14] Theodore S Rappaport et al. Wireless communications: principles and practice, volume 2. 1996.
[15] Alireza Razavi, Mikko Valkama, and Elena-Simona Lohan. K-means fingerprint clustering for low-complexity floor estimation in indoor mobile localization. In 2015 IEEE Globecom Workshops (GC Wkshps), pages 1–7. IEEE, 2015.
[16] Toni Schenk. Introduction to photogrammetry. 2005.
[17] Jushang Shen, Baoqi Huang, Yu Tian, and Long Zhao. On the reliable localization of wifi access points. IEEE Access, 7:90931–90940, 2019.
[18] Federico Thomas and Llu´ıs Ros. Revisiting trilateration for robot localization. IEEE Transactions on robotics, 21(1):93–101, 2005.
[19] Guido Van Rossum and Fred L Drake Jr. Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands, 1995.
[20] Jun Wang, Paulo Urriza, Yuxing Han, and Danijela Cabric. Weighted centroid localization algorithm: theoretical analysis and distributed implementation. IEEE Transactions on Wireless Communications, 10(10):3403–3413, 2011.
[21] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.
[22] Yuan Zhuang, Bruce Wright, Zainab Syed, Zhi Shen, and Naser El-Sheimy. Fast wifi access point localization and autonomous crowdsourcing. In 2014 Ubiquitous Positioning Indoor Navigation and Location Based Service (UPINLBS), pages 272–280. IEEE, 2014.