Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension

2020·Arxiv

Abstract

Abstract

The k nearest neighbour learning rule (under the uniform distance tie breaking) is universally consistent in every metric space X that is sigma-finite dimensional in the sense of Nagata. This was pointed out by C´erou and Guyader (2006) as a consequence of the main result by those authors, combined with a theorem in real analysis sketched by D. Preiss (1971) (and elaborated in detail by Assouad and Quentin de Gromard (2006)). We show that it is possible to give a direct proof along the same lines as the original theorem of Charles J. Stone (1977) about the universal consistency of the k-NN classifier in the finite dimensional Euclidean space. The generalization is non-trivial because of the distance ties being more prevalent in the non-Euclidean setting, and on the way we investigate the relevant geometric properties of the metrics and the limitations of the Stone argument, by constructing various examples.

. La r`egle d’apprentissage des k plus proches voisins (sous le bris uniforme d’´egalit´e des distances) est universellment consistente dans chaque espace m´etrique s´eparable de dimension sigma-finie au sens de Nagata. Comme indiqu´e par C´erou et Guyader (2006), le r´esultat fait suite `a une combinaison du th´eor`eme principal de ces auteurs avec un th´eor`eme d’analyse r´eelle esquiss´e par D. Preiss (1971) (et ´elabor´e en d´etail par Assouad et Quentin de Gromard (2006)). Nous montrons qu’il est possible de donner une preuve directe dans le mˆeme esprit que le th´eor`eme original de Charles J. Stone (1977) sur la consistence universelle du classificateur k-NN dans l’espace euclidien de dimension finie. La g´en´eralisation est non-triviale, car l’´egalit´e des distances est plus commune dans le cas non-euclidien, et pendant l’´elaboration de notre preuve, nous ´etudions des propri´et´es g´eom´etriques pertinentes des m´etriques et testons des limites de l’argument de Stone, en construisant quelques exemples.

The k-nearest neighbour classifier, in spite of being arguably the oldest supervised learning algorithm in existence, still retains his importance, both practical and theoretical. In particular, it was the first classification learning rule whose (weak) universal consistency (in the finite-dimensional Euclidean space )) was established, by Charles J. Stone in [20].

Stone’s result is easily extended to all finite-dimensional normed spaces, see, e.g., [6]. However, the k-NN classifier is no longer universally consistent already in the infinite-dimensional Hilbert space . A series of examples of this kind, obtained in the setting of real analysis, belongs to Preiss, and the first of them [17] is so simple that it can be described in a few lines. We will reproduce it in the article, since the example remains virtually unknown in the statistical machine learning community.

There is sufficient empirical evidence to support the view that the performance of the k-NN classifier greatly depends on the chosen metric on the domain (see e.g., [9]). There is a supervised learning algorithm, Large Margin Nearest Neighbour Classifier (LMNN), based on the idea of optimizing the k-NN performance over all Euclidean metrics on a finite dimensional vector space [21]. At the same time, it appears that a theoretical foundation for such an optimization over a set of distances is still lacking. The first question to address in this connection, is of course to characterize those metrics (generating the original Borel structure of the domain) for which the k-NN classifier is (weakly) universally consistent.

While the problem in this generality remains still open, a great advance in this direction was made by C´erou and Guyader in [2]. They have shown that the k-NN classifier is consistent under the assumption that the regression function ) satisfies the weak Lebesgue–Besicovitch differentiation property:

where the convergence is in measure, that is, for each 0,

The probability measure above is the sample distribution law. The proof extended the ideas of the paper [4], in which it was previously observed that Stone’s universal consistency can be deduced from the classical Lebesgue– Besicovitch differentiation theorem: every )-function f on satisfies Eq. (1), even in the strong sense (convergence almost everywhere). See also [8].

Those separable metric spaces in which the weak Lebesgue–Besicovitch differentiation property holds for every Borel probability measure (equivalently, for every sigma-finite locally finite Borel measure) have not yet been characterized. But the complete separable metric spaces in which the strong Lebesgue–Besicovitch differentiation property holds for every such measure as above have been described by Preiss [18]: they are exactly those spaces that are sigma-finite dimensional in the sense of Nagata [13, 16]. (For finite dimensional spaces in the sense of Nagata, the sketch of a proof by Preiss, in the sufficiency direction, was elaborated by Assouad and Quentin de Gromard in [1]. The completeness assumption on the metric space is only essential for the necessity part of the result.) In particular, it follows that every sigma-finite dimensional separable metric space satisfies the weak Lebesgue–Besicovitch differentiation property for every probability measure.

Combining the result of Preiss with that of C´erou–Guyader, one concludes that the k-NN classifier is universally consistent in every sigma-finite dimensional separable metric space, as was noted in [2].

The authors of [2] mention in their paper that The aim of this article is to show that at least Stone’s original proof, including Stone’s geometric lemma as its main tool, can be extended from the Euclidean case to the sigma-finite dimensional metric spaces. In fact, as we will show, the geometry behind Stone’s lemma, even if it appears to be essentially based on the Euclidean structure of the space, is captured by the notion of Nagata dimension, which is a purely metric concept. In this way, Stone’s geometric lemma and indeed the original Stone’s proof of the universal consistency of the k-NN classifier, become applicable to a wide range of metric spaces.

In the absence of distance ties (that is, in case where every sphere is a -negligible set with regard to the underlying measure ), the extension is quite straightforward, indeed almost literal. However, this is not so in the presence of distance ties: an example shows that the conclusion of Stone’s geometric lemma may not hold. Another example shows that even in the compact metric spaces of Nagata dimension zero, the distance ties may be everpresent. We also show that an attempt to reduce the case to the situation without distance ties by learning in the product of Ω with the unit interval (an additional random variable used for tie-breaking) cannot work, because already the product of a zero-dimensional space in the sense of Nagata with the interval (which has dimension one) can have an infinite Nagata dimension. Stone’s geometric lemma has to be modified, to parallel the Hardy–Littlewood inequality in the geometric measure theory.

We do not touch upon the subject of strong universal consistency in general metric spaces. The main open question left is whether every metric space in which the k-NN classifier is universally consistent is necessarily sigma-finite dimensional. A positive answer, modulo the work of [2] and [18], would also answer in the affirmative an open question in real analysis going back to Preiss: suppose a metric space X satisfies the weak Lebesgue– Besicovitch differentiation property for every sigma-finite locally finite Borel measure, will it satisfy the strong Lebesgue–Besicovitch differentiation property for every such measure?

1. Setting for statistical learning

Here we will recall the standard probabilistic model for statistical learning theory. The domain, Ω, means a standard Borel space, that is, a set equipped with a sigma-algebra which coincides with the sigma-algebra of Borel sets generated by a suitable separable complete metric. (Recall that the Borel structure generated by a metric on a set Ω is the smallest sigma-algebra containing all open subsets of the metric space (Ω).) The distribution laws for datapoints, both unlabelled and labelled, are Borel probability measures defined on the corresponding Borel sigma-algebra.

Since we will be dealing with the k-NN classifier, the domain, Ω, will actually be a metric space, which we also assume to be separable.

In statistical terminology, is the regression function. Together with the Borel probability measure on Ω, the regression function allows for an alternative, and often more convenient, description of the joint law ˜. Namely, given Ω,

and

which allows to reconstruct the measure ˜on Ω .

Figure 1. Labeled domain and the projection : Ω Ω.

Let ) denote the collection of all Borel measurable binary functions on the domain, that is, essentially, the family of all Borel subsets of Ω. Given such a function f : Ω (a classifier), the misclassification

error is defined by

It is a simple exercise to verify that the Bayes error is achieved on some classifier (and thus is the minimum), which is called a Bayes classifier. For instance, every classifier satisfying

is a Bayes classifier. The Bayes error is zero if and only if the learning problem is deterministic, that is, the regression function is equal almost everywhere to the indicator function, , of a Ω, a Borel subset of the domain. A learning rule is a family of mappings , where

and the functions satisfy the following measurability assumption: the associated maps

are Borel (or just universally measurable). Here ) is a labelled learning sample.

The data is modelled by a sequence of independent identically distributed random elements () of Ω , following the law ˜. Denote an infinite sample path. In this context, only gets to see the first n labelled coordinates of . A learning rule L is weakly consistent, or simply consistent, if errin probability as . If the convergence occurs almost surely (that is, along almost all sample paths ), then L is said to be strongly consistent. Finally, L is universally (weakly / strongly) consistent if it is weakly / strongly consistent under every Borel probability measure ˜on the standard Borel space Ω .

The learning rule we study is the k-NN classifier, defined by selecting the label by the majority vote among the values of y corresponding to the nearest neighbours of x in the learning sample .

If k is even, then a voting tie may occur. This is of lesser importance, and can be broken in any way. For instance, by always assigning the value 1 in case of a voting tie, or by choosing the value randomly. The consistency results usually do not depend on it. Intuitively, if voting ties keep occurring asymptotically at a point x along a sample path, it means that 2 and so any value of the classifier assigned to x would do.

It may also happen that the smallest closed ball containing k nearest neighbours of a point x contains more than k elements of a sample (distance ties). This situation is more difficult to manage and requires a consistent tie-breaking strategy, whose choice may affect the consistency results.

Given k and , we define ) as the smallest radius of a closed ball around x containing at least k nearest neighbours of x in the sample :

means that there is an injection f : [] such that

A k nearest neighbour map is a function

with the properties

The mapping k-NNcan be deterministic or stochastic, in which case it will depend on an additional random variable, independent of the sample.

An example of the former kind is based on the natural order on the sample, . In this case, from among the points belonging to the sphere of radius ) around x we choose points with the smaller index: k-NN) contains all the points of in the open ball, ), plus a necessary number (at least one) of points of ) having smallest indices.

An example of the second kind is to use a similar procedure, after applying a random permutation of the indices first. A random learning input will consist of a pair (), where is a random n-sample and is a random element of the group of permutations of rank n. An equivalent (and more common) way would be to use a sequence of i.i.d. random elements of the unit interval or the real line, distributed according to the uniform (resp. gaussian) law, and in case of a tie, give a preference to a realization over provided the value is smaller than .

Now, a formal definition of the k-NN learning rule can be given as follows:

Here, is the Heaviside function, the sign of the argument:

The empirical measure is a uniform measure supported on the set of k nearest neighbours of x within the sample , and the label is seen as a function . The expression appearing under the argument,

is the empirical regression function. In the presence of a law of labelled points, it is a random variable, and so we have the following immediate, yet important, observation.

Proposition 1.1. Let () be a learning problem in a separable metric space (Ω, d). If the values of the empirical regression function, , converge to in probability (resp. almost surely) in the region

then the k-NN classifier is consistent (resp. strongly consistent) under ().

We conclude this section by recalling an important technical tool.

Theorem 1.2 (Cover-Hart lemma [3]). Let Ω be a separable metric space, and let be a Borel probability measure on Ω. Almost surely, the function (Eq. (2)) converges to zero uniformly over any precompact subset supp .

Proof. Let A be a countable dense subset of supp . A standard argument shows that, almost surely, for all and each rational 0, the open ball ) contains an infinite number of elements of a sample path. Consequently, the functions : Ω almost surely converge to zero pointwise on A as . Since these functions are easily seen to be 1-Lipschitz and in particular form a uniformly equicontinuous family, we conclude.

2. Example of Preiss

Here we will discuss a 1979 example of Preiss [17]. Preiss’s aim was to prove that the Lebesgue–Besicovitch differentiation property (Eq. (1)) fails in the infinite-dimensional Hilbert space . However, as already suggested in [2], his example can be easily adapted to prove that the k-NN learning rule is not universally consistent in the infinite-dimensional separable Hilbert space either.

Recall the notation [n] = {1, 2, . . . n}. Let () be a sequence of positive natural numbers 2, to be selected later. Denote by

the Cartesian product of finite discrete spaces equipped with the product topology. It is a Cantor space (the unique, up to a homeomorphism, totally disconnected compact metrizable space without isolated points).

Let denote the canonical cordinate projections of Q on the k-dimensional cubes ]. Denotea disjoint union of the cubes , and let ) be a Hilbert space spanned by an orthonormal basis () indexed by elements ¯n of this union.

For every ¯define

The map f is continuous and injective, thus a homeomorphism onto its image. Denote the Haar measure on Q (the product of the uniform measures on all []). Let ) be the direct image of , a compactlysupported Borel probability measure on H. If r > 0 satisfies 2, then for each ¯Q,

Now, for every k and each ¯define in a similar way

Note that the closure of ) contains f(Q) (as a proper subset). Now define a purely atomic measure supported on the image of under f, having the following special form:

The weights 0 are chosen so that the measure is finite:

Assuming in addition that

Clearly, the conditions (4) and (5) can be simultaneously satisfied by a recursive choice of () and ().

Now renormalize the measures and so that is a probability measure, and interpret as the distribution of points labelled i = 0, 1. Thus, the regression function is deterministic, , where we are learning the concept 0.

For a random element , the distance ) to the k-th nearest neighbour within an i.i.d. n-sample goes to zero almost surely when 0, according to a lemma of Cover and Hart, and the convergence is uniform on the precompact support of . It follows that the probability of one of the k nearest neighbours to a random point to be labelled one, conditionally on , converges to zero, uniformly in r. The k-NN learning rule will almost surely predict a sequence of classifiers converging to the identically zero classifier, and so is not consistent.

3. Classical theorem of Charles J. Stone

3.1. The case of continuous regression function

Proposition 1.1 and the Cover–Hart lemma 1.2 together imply that the k-NN classifier is universally consistent in a separable metric space whenever the regression function is continuous. In view of Proposition 1.1, it is enough to make the following observation.

Lemma 3.1. Let (Ω) be a separable metric space equipped with a Borel probability measure, and let be a continuous regression function. Then

in probability, when 0.

Proof. It follows from the Cover–Hart lemma that the set k-NN) of k nearest neighbours of x almost surely converges to x, for almost all supp , and since is continuous, the set of values -NN(x)) almost surely coverges to ) in an obvious sense: for every 0, there exists N such that

where k depends on n. Let 0 and N be fixed, and denote the set of pairs () consisting of a sample path and a point Ω satisfying Eq. (6). Select N with the property . Let denote the sequence of labels for , which is a random variable with the joint law . By the above, whenever (and , if is one of the k nearest neighbours of x in , we have ). According to a version of the Law of Large Numbers with Chernoff’s bounds, the probability of the event

is exponentially small, bounded above by 2 exp(). Thus, when + 2 exp(), and we conclude.

Remark 3.2. In the most general case (with the uniform tie-breaking) we can only infer the almost sure convergence if grows fast enough as a function in n, for otherwise the series 2 exp() may be divergent.

3.2. Stone’s geometric lemma for Rd

In the case of a general Borel regression function , which can be discontinuous -almost everywhere, where , as before, is the sample distribution on a separable metric space, a version of the classical Luzin theorem of real analysis says that for any 0 there is a closed precompact set K of measure upon which is continuous. (See Appendix.) Now we have control over the behaviour of those k-nearest neighbours of a point x that belong to K: the mean value of the regression function taken at those k-nearest neighbours will converge to ). However, we have no control over the behaviour of the values of at the k-nearest neighbours of x that belong to the open set . The problem is therefore to limit the influence of the remaining sample points belonging to U. Intuitively, as the example of Preiss shows, in infinite dimensions the influence of the few points outside of K can become overwhelming, no matter how close the measure of K is to one.

In the Euclidean case, this goal is achieved with the help of Stone’s geometric lemma, which uses the finite-dimensional Euclidean structure of the space in a beautiful way.

Lemma 3.3 (Stone’s geometric lemma for For every natural d, there is an absolute constant C = C(d) with the following property. Let

Figure 2. To the proof of Stone’s geometric lemma (case k = 2).

be a finite sample in ) (possibly with repetitions), and let ) be any. Given , the number of i such that and x is among the k nearest neighbours of inside the sample

is limited by Ck.

Proof. Cover with C = C(d) cones of central angle 3 with vertices at x. Inside each cone mark the maximal possible number of the nearest neighbours of x, that are set-theoretically different from x. (The strategy for possible distance tie-breaking is unimpotant.) In this way, up to Ck points are marked. Let now i be any, such that as a point. If has not been marked, this means the cone containing has k points, different from x, that have been marked. Consider any of the marked points inside the same cone, say y. A simple argument of planimetry, inside an affine plane passing through , and y, shows that

and so the k nearest neighbours of inside the sample in Eq. (8) will all be among the marked points, excluding

Remark 3.4. Note that in the statement of Stone’s geometric lemma neither the order of the sample , nor the tie-breaking strategy are of any importance.

Remark 3.5. If the cones have central angle , then the displayed inequality in the proof of the lemma (Eq. (9)) is no longer strict. This is less convenient in case of distance ties.

3.3. Proof of Stone’s theorem

Theorem 3.6 (Charles J. Stone, 1977). Let 0. Then the k-NN classification rule in the finite-dimensional Euclidean space is universally consistent.

Let us begin with the case where the domain Ω is an arbitrary separable metric space, is a probability measure on Ω, and is a regression function. Let 0 be any. Using a variation of Luzin’s theorem (Theorem A.6 in Appendix), choose a closed precompact set Ω with and continuous. Denote . By the Tietze–Urysohn extension theorem (see e.g., [7], 2.1.8), the function extends to a continuous [0, 1]-valued function on all of Ω.

Denote the empirical regression function (Eq. (3)) corresponding to viewed as a regression function on its own right. We have:

where (and (0 in probability by virtue of Lemma 3.1. It only remains to estimate the term (III).

), let K be a compact subset of , and let . Let be i.i.d. random elements of following the law . We will estimate the expected number of the random elements of an n-sample that (1) belong to U, and (2) are among the k nearest neighbours of a random element X belonging to K. Applying the symmetrization with a transposition of the coordinates , as well as Stone’s geometric lemma 3.3, we obtain:

Getting back to the term (we obtain:

Since 0 is as small as desired, we conclude that ) in probability, and so the k-NN classifier in ) is universally (weakly) consistent in ).

4. Nagata dimension of a metric space

Recall that a family of subsets of a set Ω has multiplicity if the intersection of more than different elements of is always empty. In other words,

Definition 4.1. Let (0]. We say that a metric space (Ω, d) has scale s > 0, if every finite family of closed balls of radii < s admits a subfamily of multiplicity + 1 which covers the centres of all the balls in . The smallest with this property, if it exists, and +otherwise, is called the Nagata dimension of Ω on the scale s > 0. A space Ω has Nagata dimension if it has Nagata dimension on a suitably small scale (0]. Notation: dim, or simply dim.

Sometimes the following reformulation is more convenient.

Proposition 4.2. A metric space (Ω, d) has Nagata dimension on the scale (0] if and only if it satisfies the following property. For every Ω, r < s, and a sequence ¯), there are i, j, , such that max.

): from the family of closed balls +2, all having radii < s, extract a family of +1 balls covering the centres. One of those balls, say with centre at , must contain some with , which means max.

): let be a finite family of closed balls of radii < s. Suppose it has multiplicity + 1. Then there exist a point Ω and + 2 balls in with centres that we denote , all containing x. Denote . Then r < s, and by the hypothesis, there are , with max. Without loss in generality, assume ), that is, belongs to the ball with centre in . Now the ball centred at can be removed from the family , with the remaining family still covering all the centres and having the cardinality 1. After finitely many steps, we arrive at a subfamily of multiplicity + 1 covering all the centres.

Example 4.3. The property of a metric space Ω having Nagata dimension zero on the scale +is equivalent to Ω being a non-archimedian metric space, that is, a metric space satisfying the strong triangle inequality, max{d(x, y), d(y, z)}.

Indeed, dim, contained in a closed ball ), we have max.

Example 4.4. It follows from Proposition 4.2 that dimbe three points contained in a closed ball, that is, an interval []. Without loss in generality, assume . If , then , and if , then .

The following example suggests that the Nagata dimension is relevant for the study of the k-NN classifier, as it captures in an abstract context the geometry behind Stone’s lemma.

Example 4.5. The Nagata dimension of the Euclidean space ) is finite, and it is bounded by 1, where C(d) is the value of the constant in Stone’s geometric lemma (Lemma 3.3).

Indeed, let be points belonging to a ball with centre x. Using the argument in the proof of Stone’s geometric lemma with ) points belonging to the ) cones with apex at x. At least one point, say , has not been marked; it belongs to some cone, which therefore already contains a marked point, say , different from , and .

Example 4.6. A similar argument shows that every finite-dimensional normed space has finite Nagata dimension.

Remark 4.7. In the family of closed balls of radius one centred at the vectors exp(25), k = 1, 2, . . . , 5, has multiplicity 5 and admits no proper subfamily containing all the centres. Therefore, the Nagata dimension of (2) is at least 5. Since the plane can be covered with 6 cones having the central angle 3, Example 4.5 implies that dim

Remark 4.8. The problem of calculating the Nagata dimension of the Euclidean space ) is mentioned as “possibly open” by Nagata [15], p. 9 (where the value dim+2 is called the “crowding number”). Nagata also remarks that dim

Remark 4.9. Notice that the property of the Euclidean space established in the proof of Stone’s geometric lemma is strictly stronger than the finiteness of the Nagata dimension. There exists a finite (in general, higher than the Nagata dimension) such that, given a sequence ¯, there are , such that max. The inequality here is strict, cf. Remark 3.5. This is exactly the property that removes the problem of distance ties in the Euclidean space. However, adopting this as a definition in the general case would be too restrictive, removing from consideration a large class of metric spaces in which the k-NN classifier is still universally consistent, such as all non-archimedean metric spaces.

Example 4.10. Let denote the n-th standard basic vector in the separable Hilbert space , that is, a sequence whose n-th coordinate is 1 and the rest are zeros. The convergent sequence (10, together with the limit 0, viewed as a metric subspace of , has infinite Nagata dimension on every scale s > 0. This is witnessed by the family of closed balls ), having zero as the common point, and having the property that every centre belongs to exactly one ball of the family. Realizing R as a continuous curve in without self-intersections passing through all elements of the sequence as well as the limit leads to an equivalent metric on R having infinite Nagata dimension on each scale.

Remark 4.11. The Nagata–Ostrand theorem [13,16] states that the Lebesgue covering dimension of a metrizable topological space is the smallest Nagata dimension of a compatible metric on the space (and in fact this is true on every scale s > 0, [12]). This is the historical origin of the concept of the metric dimension.

There appears to be no single comprehensive reference to the concept of Nagata dimension. Various results are scattered in the journal papers [1,12,13,15,16,18], see also the book [14], pages 151–154.

Metric spaces of finite Nagata dimension admit an almost literal version of Stone’s geometric lemma in case where the sample has no distance ties, that is, the values of the distances , are all pairwise distinct.

Lemma 4.12 (Stone’s geometric lemma, finite Nagata dimension, no ties)

be a finite sample in Ω, and let Ω be any. Suppose there are no distance ties inside the sample

and k is such that, inside the above sample, for all i. The number of i having the property that and x is among the k nearest neighbours of inside the sample above is limited by (k + 1)(+ 1).

Proof. Suppose that i = 1, 2, . . . , m is such that and has x among the k nearest neighbours inside the sample as in Eq. (10). The family of closed balls , admits a subfamily of multiplicity + 1 covering all the points . Since there are no distance ties, every ball belonging to contains + 1 points. It follows that + 1). All the balls in contain x, and we conclude: + 1. The result follows.

Now the same argument as in the original proof of Stone (Subs. 3.3) shows that the k-NN classifier is consistent under each distritution on Ω with the property that the distance ties occur with zero probability. Since we are going to give a proof of a more general result, we will not repeat the argument here, only mention that due to the Cover–Hart lemma, if n is sufficiently large, then with arbitrarily high probability, the k nearest neighbours of a random point inside a random sample will all lie at a distance < s.

5. Distance ties

In this section we will construct a series of examples to illustrate the difficulties arising in the presence of distance ties in general metric spaces that are absent in the Euclidean case. The fundamental difference between the two situations is the inequality in the equivalent definition of the Nagata dimension (proposition 4.2) that is, unlike in the Euclidean space, no longer strict.

As we have already noted (remark 3.4), the conclusion of Stone’s geometric lemma 3.3 remains valid even if we allow the adversary to break the distance ties and pick the k nearest neighbours. Our first example shows that it is no longer the case in a metric space of finite Nagata dimension.

Example 5.1. Consider a finite set with points, and assume that in the metric space all n + 1 points are pairwise at a distance one from each other. The Nagata dimension of the metric space is equal to of closed balls contains any ball of radius 1, it already covers on its own. Otherwise, we choose one ball of radius < 1 (that is, a singleton) for each centre. The multiplicity of the selected subfamily is 1 in each case.

Now let us discuss the distance ties. For any element of , the remaining n points of are tied between themselves as the possible k nearest neighbours. The adversary may decide to always select x among them, thus invalidating the conclusion of Stone’s geometric lemma.

However, the problem is easily resolved if we break distance ties using a uniform distribution on the nearest neighbour candidates. In this case, the expected number of indices i such that x is chosen as one of the k nearest neighbours of within the sample is obviously k.

the size of a sample inheriting a 0-1 distance will be limited from above by the dimension, d.

The next example shows that Stone’s geometric lemma in finite dimensional metric spaces cannot be saved even with the uniform tie-breaking.

Example 5.3. There exists a countable metric space of Nagata dimension 0, having the following property. Given , for a sufficiently large n the expected number of points within the sample having as the nearest neighbour under the uniform tie-breaking is .

We will construct recurrently. Let . Add at a distance 1 from , and set . If has been already defined, add at a distance 2from all the existing points , and set . It is clear that the distance so defined is a metric.

We will verify by induction in n that dim. For n = 1 this is trivially true. Assume the statement holds for , and let be a family of closed balls in . If one of those balls contains all the points, there is nothing to prove. Assume not, that is, all the balls elements of have radii smaller than 2. Choose a subfamily of multiplicity 1 consisting of balls centred in elements of and covering them all, and add one ball centred in (which is a singleton). Now it follows that dim

Finally, let us show that if n is sufficiently large, then the expected number of indices i such that is the nearest neighbour of under a uniform tie-breaking is as large as desired. With this purpose, for each 2 we will calculate the expectation of the event ), where ) denotes the set of nearest neighbours of in the rest of the finite sample .

For , the unique nearest neighbour within is , therefore , there are two points in at a distance 2 from , which can be chosen each with probability 1/2, namely and , therefore 2. For arbitrary i, in a similar way, . We conclude:

and the sum of the harmonic series converges to +as .

Can it be that the distance ties are in some sense extremely rare? Even this expectation is unfounded.

Example 5.4. Given a value 0 (risk) and a sequence , there exist a compact metric space of Nagata dimension zero (a Cantor space with a suitable compatible metric) equipped with a non-atomic probability measure, and a sequence 0, with the following property. With confidence , for every k, a random element X has distance ties among its k nearest neighbours within a random -sample .

The space Ω, just like in the Preiss example (Sect. 2), is the direct product ] of finite discrete spaces, whose cardinalities 2 will be chosen recursively, and [. The metric is given by the

rule

This metric induces the product topology and is non-archimedian, so the Nagata dimension of Ω is zero (example 4.3). The measure is the product of uniform measures on the spaces []. This measure is non-atomic, and in particular, -almost all distance ties occur at a strictly positive distance from a random element X.

Choose a sequence () with 0 and 2 . Choose so large that, with probability , independent random elements following a uniform distribution on the space [] are pairwise distinct. Now let