Neural Gas (NG) [1] is a robust algorithm for vector quantization with a well-known analytical model that has been first described in [2]1 and then expanded in other subsequent works [3], [4]. According to this model, the Neural Gas algorithm performs a stochastic gradient descent over an energy function and converges to a final configuration whereby the density of NG units relates to the density of the underlying input data distribution via a power law that is usually called magnification (see e.g. [5]). Further studies have been conducted about the topographic mapping properties of the NG [6], namely the capability to represent an implicit neighborhood relation – which can be made explicit via Delaunay triangulation – that describes faithfully the topology of input data distributions. Other studies focused on NG variants that enable controlling the power law explicitly [5], [7]–[9], in particular to obtain an exponent of value one that makes the two densities be the same, except for a scaling factor.
The experimental study presented here is motivated by the objective of using NG for geometric inference and shape detection, as it happens with the NG-related algorithm SOAM [10], that reconstructs a surface from a scattered point cloud of samples with topological and geometrical guarantees of faithfulness of the result obtained. For instance, this algorithm has been used for excluded volume computations in [11]. More precisely, the focus of the study presented here is assessing the effects of shape in input data distributions over the NG behavior. For doing this, we consider input data distributions of the kind U(shape
namely a uniform probability over a given shape convolved with some (isotropic) noise with specific parameters .
Clearly, uniform shape sampling and isotropic noise are indeed limited representations of a real-world scenario for practical applications (e.g. for 2D or 3D image deconvolution). Nevertheless, even within these limits, the experimental evidence collected reveals the substantial relevance of shape in determining the overall NG behavior and shows an interesting potential in this direction; as it will be discussed, while the relevance of power law is substantially confirmed, the effects of shape in data distributions can be even stronger in some circumstances and determine NG configurations that cannot be explained by the power law alone.
Fig. 1. A point sample (a) drawn with (almost) uniform probability from the surface of the Stanford bunny is then convolved (b) with isotropic Gaussian noise. Presented with the latter and with appropriate parameter settings (see text), the Neural Gas (c) attains a final configuration from which the original surface can be reconstructed faithfully (d).
In particular the focus of this work is providing an experimental assessment of NG behaviors like those exemplified in Fig. 1, while giving some preliminary explanations. Although a deeper study is required, i.e. for finding better analytical models describing the NG behavior in these cases, the evidence presented suggests that this study can be feasible and worth performing.
A. Neural Gas
Given an input data distribution described by probability P(v), where , a neural gas (NG) [2], [6] is a set W of k units, each associated to a reference vector in
:
The NG algorithm adapts progressively the reference vectors in W to the input probability P by repeating the following iteration:
1) receive one signal v distributed as P(v); 2) update the reference vectors in W (see below); 3) return to step 1.
In each iteration, the reference vectors in W are updated by
where denotes the cardinality),
is a real parameter,
(throughout this paper, denotes the usual Kronecker delta function).
1) Convergence: It is proven in [2] that the NG algorithm performs a stochastic gradient descent (SGD) over the energy function:
where V is the support of probability P(v) and
In particular, in [2] it is proved that
which makes equation (1) an SGD update law. In keeping with this, an NG can be made to converge to a steady configuration by choosing values of that decrease exponentially with the iterations of the algorithm:
where and
are the initial and final values, respectively, T is the total number of iterations and t is the current iteration.
2) Relation with K-means: When , equation (1) becomes equivalent to
where k(v) is the function that returns the index of the closest neighbor in W to the input signal v. This is the update law of the well-known algorithm named K-means [12], [13], which performs an SGD over the distortion error
3) Magnification: Considering the average position update of a unit with respect to P
under specific assumptions to be discussed below, in [2] it is also proven that:
is the density of units in W at
and
are the gradients with respect to
of, respectively, the density
and the probability P, and d is the dimension of the support V . The solution at equilibrium, i.e. when
, entails a power law:
In many related works (see for instance [5], [7]–[9], [14]) this particular power law is called magnification, while is deemed the magnification factor.
From a technical standpoint, the derivation of (5) relies on three main assumptions:
A.1: is analytic in
A.2: P is analytic in
A.3: and small with respect to the curvature of
and P.
Assumption A.1 means considering the discrete set W in the limit of a continuous density while A.2 is in often true ‘per se’, with typical input probabilities. Altogether, A.1 and A.2 allow expanding and P into Taylor series, whereas assumption A.3 allows considering only the first terms in these. From another point of view, A.3 entails ‘flatness’ of P, at least in the scale of
, hence ruling out higher order contributions from its geometry.
B. Numerical Experiments
1) Estimating Probability and Density: The estimation of and
is a crucial aspect for interpreting the results of numerical experiments and therefore requires an appropriate technique. In other works, like [15], the ambient space is divided into bins of suitable size, while in [9] a Parzen window estimator is used. The estimation technique adopted here is described in [16]. It is based on a weighted k-distance defined in [17] and that, in the discrete case, takes the form
where C is a finite point cloud in is a parameter, and
is the i-th nearest neighbor to x in C. In the case discussed here, C can either represent an empirical sample of points drawn from P or the set W of NG units. In [17] it is shown that (6) is distance-like, i.e. it shares many properties of a distance function, and that its level sets preserve the geometrical and topological properties of the corresponding level sets of the probability P from which the point cloud has been drawn. Based on (6), the estimator proposed in [16] is
where N is the number of points in C and is the volume of the D-dimensional unit ball. Reportedly, the estimator (7) is optimal for
In this work (7) has been used for estimating and
respectively; the latter, in fact, can be normalized into the probability
For simplicity, however, in this work we will continue to refer to the unnormalized density .
2) Entropy: Given a final NG configuration, the probability of a generic unit i of being ‘a winner’ in W, i.e. of being the nearest neighbor to a generic input signal v having probability P(v), can be estimated by
where s is the total number of input signals and is the number of input signals in the Voronoi cell of unit i or, in other words, is the number of input signals having
as their closest neighbor in W.
The Shannon entropy function is defined as
The maximum of H is attained when all k units have equal probability of being the nearest neighbor of a random input signal sampled from P. In that case, the value of Shannon entropy is
3) Implementation and execution: The software for the experiments has been written in Java; all experiments have been performed with a 64-bit Java virtual machine running on an IntelXeon
CPU E-1240 v3 @ 3.40 GHz computer with 8GB of RAM. The Meshlab software tool [18] has been used for 2D and 3D processing of point clouds and for the geometrical and topological validation of results.
In the experiments decribed here, the initial positions of NG units in W have been randomly chosen with probability P. For this reason, the only parameters in (1) that determine the NG behaviour are and k.
In this study, the decaying law (3) has been adopted for parameter , with
and
, to obtain steady NG configurations that make it easier the assessment of relevant properties. The combined effect of the remaining parameters
and k, with different input data distributions, has been studied in more than 3500 experiments, each intended as a complete run of the NG algorithm.
Due to the general field of interest of this work, only 2D and 3D ambient spaces have been considered. All input data distributions for the experiments were generated from probabilities in the general form of the convolution described:
Fundamental shapes like individual points, the unit circle , the unit disk D, the unit sphere
and the unit ball B(0, 1) have been considered. As an example of a more complex shape, the Stanford bunny2, which is a classical benchmark in the field of geometrical shape reconstruction, has been also adopted. For the noise model to be convolved, the class of multivariate normal distributions
with isotropic covariance matrix, i.e.
has been considered as the first option.
Two other noise models have been considered. One is a sinusoidal probability with density
whenever and 0 elsewhere in
. Another is the uniform density over a ball of given radius r:
To ensure the possibility of estimating probability densities at the points attained by NG units in the final configurations, all data distributions were generated in advance, as standalone point clouds, and then presented to the NG algorithm.
For reasons of space and readability, in the rest of this section, the results obtained will not be described systematically but rather by following a logical pathway that illustrates the most relevant findings through the more relevant steps.
A. Decaying lambda?
Assumption A.3 about the parameter (see section II-A3) does not translate immediately into numerical constraints and is therefore difficult to enforce in practical experiments. Partly for this reason, it is suggested in [2] to make
decay exponentially as the algorithm progresses, with the same law adopted for
:
Fig. 2. An experiment with an NG with decaying M) presented with a 2D uniform sample from the unit circle convolved with normal noise (a). After 150K iterations, the NG (b) is very close to the unit circle then, after 500K iterations (c), as
decreases, the NG begins dispersing until it reaches its final configuration (d), after 4M iterations.
Note however that this decay law is not part of the model described by (4). In passing, the same decay law is also suggested in Kohonen’s Self-Organizing Map (SOM) algorithm since ‘it produces an improvement in the global order of units’ [19].
Fig. 2 shows the typical behavior of an NG with a decaying . In this experiment the 2D-embedded input probability is
where is the unit circle and the N is 2D isotropic multivariate normal, with
(see (9)). Initially, when the value of
is large, the NG assumes a configuration that is closer to
; then, as the algorithm progresses and
decreases, the NG attains a more dispersed configuration that resembles closely the input distribution. As it can be seen, during the process, the NG undergoes substantial changes of ‘shape’ which are not necessarily reflected in the final configuration.
Given the objectives of this work, these intermediate NG configurations are indeed relevant and therfore we focused on experiments with constant values.
B. Shape is relevant
The changes in NG configurations and the relevance of geometrical features in the input probability P are shown by the experiments in Fig. 3, which describes four final NG configurations, each corresponding to different values of , for the same 3D-embedded input probability
where U is the uniform distribution and B(0, 1) is the 3D unit ball centered in the origin.
As it can be seen, depending on , the final NG config-urations can be grouped into three main classes, or ‘phases’,
Fig. 3. Four final NG configurations for the same uniform probability over a 3D unit ball (shown by the external ‘trackball’), with k = 512 and different (constant) values. For
(a) the NG distributes uniformly as well. With greater values of
, an outer sphere (in orange) separates from an inner ball (in dark green), as shown for
NG disposes itself on the surface of an empty sphere.
defined in terms of the distinct describing topologies (ball; sphere with an inside ball; empty sphere)3.
In addition, further experiments show that these transitions also depend on k and occur with remarkable regularity; in all experiments performed with the input probability above, with integer values ranging from 0 to 32 and k equal, respectively, to 32, 64, 128, 256, 512, 1024 and 2048, the transition between the topology in Fig. 3(c) and that in Fig. 3(d) always occurs at
.
This 3D phenomenon is much less evident in 2D, as described by Fig. 4. In the experiments performed, in the final NG configurations with a 2D-embedded probability of U(D), where D is the unit disk, the changes of the describing topology are much less dramatic than in the 3D case.
C. Magnification
Further experiments have been performed with the noise model alone, i.e. without a shape, using 2D and 3D multivariate normal probabilities with isotropic covariance matrix. The diagrams in Fig. 5 compare on a log-log plot the densities
attained in final NG configurations and the probabilities
above, both estimated with (7) and the suggested optimal value of
.
Different data series represented in the diagrams correspond to values ranging from 0 to 18 (data series with higher
values are shown in warmer colors). All of these have been
Fig. 4. A 2D uniform probability over a unit disk (shown by the outer circle) does not induce an NG a behavior similar to that in Fig. 3, i.e with a uniformly-sampled 3D ball as input. The final NG configurations represented here are for respectively.
obtained with k = 2048. As shown in the log-log plots, as grows larger the power law becomes evident. More precisely, for
there exists a limit upper density to which the densest 5% NG units tend asymptotically, while the remaining 95% NG units follow the power law. In addition, the estimated values of
is in substantial agreement with the value predicted by (5); e.g. for
the estimated
values for d = 2 and d = 3 are 0.5132 and 0.5995 respectively, versus predicted values of 0.5 and 0.6.
In passing, Fig. 6 shows the log-log plots obtained in the same conditions as for the experiemnts in Fig. 5 but with values that decay following (11) from initial
in the range 1–18 to
in 5M iterations. Although the log-log plots in Fig. 6 are not very different from those in Fig. 5, it can be seen by comparing the two figures that when
is kept constant the region occupied by NG densities tends to shrink as
grows larger.
D. Interplay between shape and magnification
The interplay between the effects produced by shape in data distributions and the power law (5) is effectively exemplified by the behavior shown in Fig. 7. The 3D data distribution used in this case is sampled from the convolution between a uniform probability over the unit sphere and a multivariate normal probability with
(see (9)). Here, the overall NG behavior can be better discussed with the help of the following qualitative diagram that represents an idealization of the log-log density plots shown in Fig. 8:
Fig. 5. Log-log density plots of final NG configurations with, respectively, 2D (a) and 3D (b) multivariate, isotropic normal probabilities. with constant values ranging from 0 to 18 (shown with increasingly warm colors).
When is low, NG units tend to disperse like in Fig. 7(b) and, away from the surface of the sphere, their densities tend to obey the power law whereas, close to the surface, these densities tend to reach a limit value. Overall, on the log-log plot, NG densities occupy a region that goes from ‘magnification’ to ‘hotspot’ in the qualitative diagram, having an extent that also depends on k; this region tends to shrink towards the ‘hotspot’ as
increases, as shown in Fig. 8(a). Interestingly, due to the curvature of the surface, NG densities close to the internal (i.e. ‘concave’) side of the sphere are slightly higher than the densities on the external (i.e. ‘convex’) side; this effect generates the ‘needle’s eye’ that can be observed in Fig. 8(b). When
is sufficiently large, the region occupied by NG densities shrinks to a point-like area (i.e. in the ‘hostspot’), as shown in Fig. 8(c), and all NG units dispose in close proximity to the surface of the sphere, as in Fig. 7(d). Further increases in the value of
cause the NG to assume the configuration of a sphere with shrinking radius – while maintaining constant density, as shown in Fig. 8(d).
Fig. 9(a) shows that, with k = 1024, the Shannon entropy
Fig. 6. Log-log density plots for the same experimental settings as in Fig. 5 but with decaying values (see text).
becomes maximal for the same value for which the NG disposes on the sphere, and remains maximal for larger
values. In this case, in fact, the combined effect of spherical shape and constant NG densities makes the Voronoi cells corresponding to NG units to have (almost) the same volume and contain the same probability mass. From another standpoint, Fig. 9(b) describes the NG behavior in terms of geometrical proximity to the sphere, as measured by
As it can be seen, the NG configuration is proximal to the sphere for and then gradually distances itself as
increases, due to the shrinking effect. The NG behavior in point is also remarkably regular: with k values of 128, 256, 512 and 1024 the minimum of the proximity function occurs when
is approximately equal to
.
E. More complex shapes
Fig. 1 shows the final NG configuration for an input data distribution obtained by convolving the Stanford bunny point cloud with isotropic gaussian noise with , where l is the side length of the smallest cube containing all the points in the cloud. In the experiment shown, k = 4096 and
. The log-log density plots in Fig. 11(a) show that, as
increases, the NG densities tend to concentrate in a more restricted area. On the other hand, the log-log plots do not show a clear behavior as in the case of the sphere, possibly due to the much richer geometrical features of the bunny, both in terms of curvature
Fig. 7. A uniform sample (a) from a 3D unit sphere (i.e. shown by the ‘trackball’) convolved with Gaussian noise. Presented with this input, an NG with k = 1024 remains somewhat dispersed for while, for
(d), the NG disposes itself on the surface of a sphere.
and distance between different parts of the surface4. Due to the same reason, the Shannon entropy diagram in Fig. 12(a) shows a few heights in the range 3–7 and then decreases slowly but steadily; this suggests that with more complex shapes, with different local geometries, there will be no unique and optimal choice for the values of and k.
Nevertheless, the triangulated shape in Fig. 1(d) shows the existence, in this case, of a suitable range of and k values that makes it possible a faithful reconstruction of the original surface from the final NG configuration5. A critical parameter for this is the Hausdorff distance, defined as
where is the surface of the bunny and h is the proximity function in (13). A sufficiently low
, in fact, guarantees the possibility of reconstructing from W a triangulation that is both homeomorphic and geometrically close to the original 2D surface. The diagram in Fig. 11(b) shows that, for the case in point, the minimal
value is attained for
, making this NG configuration the most likely candidate.
In general, the suitable range of both and k values for surface reconstruction will be limited. With the Stanford bunny, for instance, it is not possible to reconstruct the surface for
, as the density of NG units becomes too low in critical areas, like e.g. on the ears of the bunny. Likewise, as
Fig. 8. Piecewise justification of the qualitative diagram (see text) describing the NG behavior with the ‘noisy sphere’ in Fig. 7. With k = 4096, the log-log density plot (a) for values in the range 0–18 shows that the power law is dominant for all units sufficiently away from the sphere while the effects of shape are more pronounced in its proximity; in particular, the detail for
(b) shows the ‘eye of the needle’. With
the NG disposes on a sphere, with uniform density (i.e. the red ‘dot’). With
(d) the NG disposes itself with uniform density on a sphere (blue ‘spot’ on the left) that shrinks progressively, as
increases (in warmer colors).
Fig. 9. Variations in Shannon entropy (a) and proximity function (b) w.r.t. for final NG configurations in Fig. 7. For
the value of entropy attains the theoretical maximum H = 6.9314 (dashed line in red) and remains at the same level even with larger
values. The proximity function (b)
reaches a minimum for
in the range 10–13 and then gradually increases as the NG configurations begin to shrink.
shown in Fig. 10, for the final NG configuration is too dispersed and for larger
values the shrinking tendency of the NG entails the loss of important topological details.
On the other hand, this particular NG behavior is quite robust with respect to different (isotropic) noise models. In the experiments performed, the Stanford bunny has been convolved with the sinsuoidal probability in (10) with , where
is the same value as above. With k = 4096 and
the reconstruction is again successful. Further experiments have also been performed by convolving the same point cloud with the uniform probability U(B(0, r)) with
; for k = 4096 and
the reconstruction is successful as well.
The experimental evidence presented confirms the validity of the magnification model (5) for the Neural Gas, even with relatively large values of , whenever the shape of the input probability P is not particularly relevant. Vice versa,
Fig. 10. Final NG configurations for the noisy sample of the Stanford bunny (see Fig. 1(b)); with (a) the configuration is too disperse for the purposes of surface reconstruction while with the optimal value
is possible (see Fig. 1(d)). With
(c), however, the NG configuration alters significant shape features: as it can be seen in the detail (d), the ears are no longer hollow and the NG disposes on a surface.
Fig. 11. With the noisy Stanford bunny and k = 4096, the log-log density plots (a) for final NG configurations with different values of in the range 0–18 reflect multiple effects at once: it can be seen however that NG densities tend to be confined in a smaller area as
increases; in particular, the detail (b) shows the densities for the optimal value
when the geometrical and topological features of P are more pronounced, the NG behaves in ways that are not explained by the magnification law alone. As already stated, this further evidence is not in contrast with the original NG model in [2] as its analytical derivation relies in fact on neutralizing the effects of shape.
The experiments performed show that the effects of shape in P, when present, become dominant in determining the final NG configurations. Although at present these phenomena can be assessed only empirically, the NG response to geometric
Fig. 12. Variations in Shannon entropy (a) and Hausdorff distance (b) w.r.t. for the NG experiments with the noisy sample of the Stanford bunny in Fig. 1(a). The entropy reaches its top value, close to the theoretical maximum (dashed line in red), for
and then decreases with larger
The Hausdorff distance (b) reaches a minimum for the same value and then increases gradually.
features seems to be describable in precise terms, at least with specific and controlled benchmark shapes.
For practical purposes, the experiments also show that the NG behavior described has the interesting potential for performing a specific sort of online blind deconvolution [22]; online as the NG adapts its configuration via SGD in response to a stream of incoming pointwise signals, and blind since the noise model needs not be known beforehand.
Further work is clearly necessary to assess the usefulness of the NG behavior with more realistic and complex examples, for instance with the non-uniform sampling of shapes and/or with anisotropic noise models. In addition, another direction for further studies is about the impact on the NG response to shape with the various techniques for magnification control described in the literature [5], [7]–[9], [15], i.e. having the objective of controlling the factor in the power law (5).
This work is funded by the national governments and the European Union through the ENIAC JU project DeNeCoR under grant agreement number 32425. The second author gratefully acknowledges support by the Italian FIRB “Futuro in Ricerca” grant RBFR10DGUA.
[1] T. Martinetz, K. Schulten et al., A” neural-gas” network learns topologies. University of Illinois at Urbana-Champaign, 1991.
[2] T. Martinetz, S. Berkovich, and K. Schulten, “Neural-gas network for vector quantization and its application to time-series prediction,” IEEE Trans. Neural Networks, vol. 4, no. 4, pp. 558–569, Jul 1993.
[3] T. Villmann, B. Hammer, and M. Biehl, “Some theoretical aspects of the neural gas vector quantizer,” in Similarity-Based Clustering. Springer, 2009, pp. 23–34.
[4] A. Witoelar, M. Biehl, A. Ghosh, and B. Hammer, “Learning dynamics and robustness of vector quantization and neural gas,” Neurocomputing, vol. 71, no. 7, pp. 1210–1219, 2008.
[5] T. Villmann and J. C. Claussen, “Magnification control in self-organizing maps and neural gas,” Neural Computation, vol. 18, no. 2, pp. 446–469, 2006.
[6] T. Martinetz and K. Schulten, “Topology representing networks,” Neural Networks, vol. 7, no. 3, pp. 507–522, 1994.
[7] A. Jain and E. Mer´enyi, “Forbidden magnification? i.” in ESANN. Citeseer, 2004, pp. 51–56.
[8] J. C. Claussen and T. Villmann, “Magnification control in winner relaxing neural gas,” Neurocomputing, vol. 63, pp. 125–137, 2005.
[9] B. Hammer, A. Hasenfuss, and T. Villmann, “Magnification control for batch neural gas,” Neurocomputing, vol. 70, no. 7, pp. 1225–1234, 2007.
[10] M. Piastra, “Self-organizing adaptive map: Autonomous learning of curves and surfaces from point samples,” Neural Networks, vol. 41, pp. 96–112, 2013.
[11] M. Piastra and E. G. Virga, “Octupolar approximation for the excluded volume of axially symmetric convex bodies,” Physical Review E, vol. 88, no. 3, p. 032507, 2013.
[12] S. Lloyd, “Least squares quantization in pcm,” Information Theory, IEEE Transactions on, vol. 28, no. 2, pp. 129–137, 1982.
[13] J. MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1, no. 14. California, USA, 1967, pp. 281–297.
[14] C. M. Bishop, M. Svens’ en, and C. K. Williams, “Magnification factors for the som and gtm algorithms,” in Proceedings 1997 Workshop on Self-Organizing Maps, 1997.
[15] E. Mer´enyi, A. Jain, and T. Villmann, “Explicit magnification control of self-organizing maps for forbidden data,” Neural Networks, IEEE Transactions on, vol. 18, no. 3, pp. 786–797, 2007.
[16] G. Biau, F. Chazal, D. Cohen-Steiner, L. Devroye, C. Rodriguez et al., “A weighted k-nearest neighbor density estimate for geometric inference,” Electronic Journal of Statistics, vol. 5, pp. 204–237, 2011.
[17] F. Chazal, D. Cohen-Steiner, and Q. M´erigot, “Geometric inference for probability measures,” Foundations of Computational Mathematics, vol. 11, no. 6, pp. 733–751, 2011.
[18] P. Cignoni, M. Corsini, and G. Ranzuglia, “Meshlab: an open-source 3d mesh processing system,” Ercim news, vol. 73, pp. 45–46, 2008.
[19] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
[20] F. Chazal, D. Cohen-Steiner, and A. Lieutier, “A sampling theory for compact sets in euclidean space,” Discrete & Computational Geometry, vol. 41, no. 3, pp. 461–479, 2009.
[21] N. Amenta and M. Bern, “Surface reconstruction by voronoi filtering,” Discrete & Computational Geometry, vol. 22, no. 4, pp. 481–504, 1999.
[22] M. Hirsch, S. Harmeling, S. Sra, and B. Sch¨olkopf, “Online multi-frame blind deconvolution with super-resolution and saturation correction,” Astronomy & Astrophysics, vol. 531, p. A9, 2011.