35:[["$","audio",null,{"id":"tts"}],["$","$L3a",null,{"paperID":"2002.11599","publisher":"arxiv","paperJSON":{"title":"Minimax Optimal Estimation of KL Divergence for Continuous Distributions","paperID":"2002.11599","avgLineHeight":13.87,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains. One simple and effective estimator is based on the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"nearest neighbor distances between these samples. In this paper, we analyze the convergence rates of the bias and variance of this estimator. Furthermore, we derive a lower bound on the minimax mean square error and show that the kNN method is asymptotically rate optimal.","element":"span"}],[{"style":{"width":"67%"},"width":1348,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/0-0.png","element":"img"}]]},{"heading":"I. INTRODUCTION","paragraphs":[[{"text":"Kullback-Leibler (KL) divergence has a broad range of applications in information theory, statistics and machine learning. For example, KL divergence can be used in hypothesis testing ","element":"span"},{"href":"#id-0","referenceIndex":1,"text":"[1]","element":"a"},{"text":", text classification ","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"[2]","element":"a"},{"text":", outlying sequence detection ","element":"span"},{"href":"#id-2","referenceIndex":3,"text":"[3]","element":"a"},{"text":", multimedia classification ","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"[4]","element":"a"},{"text":", speech recognition ","element":"span"},{"href":"#id-4","referenceIndex":5,"text":"[5]","element":"a"},{"text":", etc. In many applications, we need the value of the KL divergence, but the underlying distributions are unknown. 
Therefore, it is important to estimate KL divergence based only on independent and identically distributed (i.i.d.) samples. This problem has been widely studied ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"[6–","element":"a"},{"href":"#id-6","referenceIndex":13,"text":"13]","element":"a"},{"text":". The estimation method differs depending on whether the underlying distribution is discrete or continuous. For discrete distributions, an intuitive method is the plug-in estimator, which first estimates the probability mass function (PMF) by counting the number of occurrences of each possible value and then calculates the KL divergence from the estimated PMFs. However, since the number of occurrences at some values is zero with positive probability, this method has infinite bias and variance even for arbitrarily large sample sizes. As a result, it is necessary to design new estimators whose bias and variance both converge to zero. Several methods have been proposed in ","element":"span"},{"href":"#id-7","referenceIndex":11,"text":"[11–","element":"a"},{"href":"#id-6","referenceIndex":13,"text":"13]","element":"a"},{"text":". These methods perform well for distributions with a fixed alphabet size. Recently, there has been growing interest in designing estimators that are suitable for distributions with a growing alphabet size. ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"[6] ","element":"a"},{"text":"provided an ‘augmented plug-in estimator’, which is a modification of the simple plug-in method. The basic idea of this method is to add a term to both the numerator and the denominator when calculating the ratio of the probability masses. Although this modification introduces some additional bias, the overall bias is reduced. 
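The plug-in estimator for discrete distributions described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited works, and the function name plugin_kl is hypothetical. It shows why the bias and variance are unbounded: whenever a value observed in the samples from the first distribution never occurs in the samples from the second, the estimated ratio has a zero denominator and the estimate is infinite.

```python
import math
from collections import Counter


def plugin_kl(xs, ys):
    # Hypothetical helper: plug-in estimate of D(p||q) from i.i.d. discrete samples.
    # xs are samples from p, ys are samples from q.
    p_counts, q_counts = Counter(xs), Counter(ys)
    n, m = len(xs), len(ys)
    total = 0.0
    for value, count in p_counts.items():
        p_hat = count / n                   # empirical PMF of p
        q_hat = q_counts.get(value, 0) / m  # empirical PMF of q
        if q_hat == 0.0:
            # value seen in xs but never in ys: the estimated ratio is unbounded
            return math.inf
        total += p_hat * math.log(p_hat / q_hat)
    return total
```

For example, plugin_kl([0, 1, 1, 2], [0, 1, 2, 2]) equals 0.25 ln 2, whereas plugin_kl([0, 1, 3], [0, 1]) is infinite because the value 3 never appears in the second sample, no matter how likely it is under q.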
Moreover, a minimax lower bound has also been derived in ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"[6]","element":"a"},{"text":", which shows that the augmented plug-in estimator proposed in ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"[6] ","element":"a"},{"text":"is rate optimal. For continuous distributions, there are also many interesting methods. A simple one is to divide the support into many bins, so that continuous values are quantized and each distribution is converted to a discrete one.","element":"span"}],[{"text":"Puning Zhao and Lifeng Lai are with the Department of Electrical and Computer Engineering, University of California, Davis, CA, 95616. Email: ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"pnzhao,lflai","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":"@ucdavis.edu. This work was supported by the National Science Foundation under grants CCF-17-17943, ECCS-17-11468, CNS-18-24553 and CCF-19-08258.","element":"span"}],[{"text":"The KL divergence can then be estimated from the two resulting discrete distributions. However, compared with other methods, this approach is usually inefficient, especially when the distributions have heavy tails, as the probability mass of a bin in the tails is hard to estimate. An improvement was proposed in ","element":"span"},{"href":"#id-8","referenceIndex":7,"text":"[7]","element":"a"},{"text":", which is based on data-dependent partitions of the densities with an appropriate bias correction technique. Compared with the direct partition method mentioned above, this adaptive method constructs more bins in regions with higher density, and vice versa, so that the probability mass in each bin is approximately equal. It is shown in ","element":"span"},{"href":"#id-8","referenceIndex":7,"text":"[7] ","element":"a"},{"text":"that this method is strongly consistent. 
Another estimator was designed in ","element":"span"},{"href":"#id-9","referenceIndex":14,"text":"[14]","element":"a"},{"text":", which uses a kernel-based approach to estimate the density ratio. Some previous works also focus on the more general problem of estimating ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"-divergence, with KL divergence being a special case. For example, ","element":"span"},{"href":"#id-10","referenceIndex":15,"text":"[15] ","element":"a"},{"text":"constructed an estimator based on a weighted ensemble of plug-in estimators, whose parameters need to be tuned properly to achieve a good bias-variance tradeoff. Another method for estimating general ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"-divergences was proposed in ","element":"span"},{"href":"#id-11","referenceIndex":10,"text":"[10]","element":"a"},{"text":", under certain structural assumptions.","element":"span"}],[{"text":"Among all the methods for estimating the KL divergence between two continuous distributions, a simple and effective one is the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"nearest neighbor (kNN) based estimator. The kNN method, which was first proposed in ","element":"span"},{"href":"#id-12","referenceIndex":16,"text":"[16]","element":"a"},{"text":", is a powerful tool for nonparametric statistics. Kozachenko and Leonenko ","element":"span"},{"href":"#id-13","referenceIndex":17,"text":"[17] ","element":"a"},{"text":"designed a kNN based method for the estimation of differential entropy, which is convenient to use and requires little parameter tuning. Both theoretical analysis and numerical experiments show that this method has desirable accuracy ","element":"span"},{"href":"#id-14","referenceIndex":18,"text":"[18–","element":"a"},{"href":"#id-15","referenceIndex":24,"text":"24]","element":"a"},{"text":". 
In particular, ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23] ","element":"a"},{"text":"shows that this estimator is nearly minimax rate optimal under some assumptions. The estimation of KL divergence is closely related to entropy estimation, since the KL divergence between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", which denote the probability density functions (pdfs) of two distributions, equals the cross entropy between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"minus the entropy of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". As a result, the idea of the Kozachenko-Leonenko entropy estimator can be used to construct a kNN based estimator for KL divergence, which was first proposed in ","element":"span"},{"href":"#id-17","referenceIndex":8,"text":"[8]","element":"a"},{"text":". The basic idea of this estimator ","element":"span"},{"href":"#id-17","referenceIndex":8,"text":"[8] ","element":"a"},{"text":"is to approximate the ratio between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"using the ratio of kNN distances. It has been discussed in ","element":"span"},{"href":"#id-17","referenceIndex":8,"text":"[8] ","element":"a"},{"text":"that, compared with other KL divergence estimators, the kNN based estimator has much lower sample complexity and is easier to generalize and implement for high-dimensional data. 
Moreover, it was proved in ","element":"span"},{"href":"#id-17","referenceIndex":8,"text":"[8] ","element":"a"},{"text":"that the kNN based estimator is consistent, which means that both the bias and the variance converge to zero as the sample sizes increase. However, the convergence rate remains unknown.","element":"span"}],[{"text":"In this paper, we make the following two contributions. Our first main contribution is the analysis of the convergence rates of the bias and variance of the kNN based KL divergence estimator proposed in ","element":"span"},{"href":"#id-17","referenceIndex":8,"text":"[8]","element":"a"},{"text":". For the bias, we discuss two significantly different types of distributions separately. In the first type of distributions analyzed, both ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"have bounded supports and are bounded away from zero on their supports. One such example is a pair of uniform distributions. Distributions of this type have boundaries, where the pdf changes abruptly. In this case, the kNN method has two main sources of bias. The first source is the boundary effect, as the kNN method tends to underestimate the pdf values in the region near the boundary. The second source is the local non-uniformity of the pdf. It can be shown that the bias caused by the second source decays fast enough to be negligible. As a result, the boundary effect is the dominant source of bias of the kNN based KL divergence estimator for the first type of distributions. In the second type of distributions analyzed, we assume that both ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"are continuous everywhere. 
For example, a pair of Gaussian distributions with different means or variances belongs to this case. For this type of distributions, the boundary effect does not exist. However, as the density values can be arbitrarily close to zero, we need to consider the bias caused by the tail region, in which ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is so low that the kNN distances are too large to yield an accurate estimate of the density ratio ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f/g","element":"span"},{"text":". For the variance of this estimator, we bound the convergence rate under a unified assumption, which holds in both cases discussed above. The convergence rate of the mean square error then follows from those of the bias and variance. In this paper, we assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"is fixed. We will show that with fixed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":", the convergence rate of the mean square error with respect to the sample sizes is already minimax optimal.","element":"span"}],[{"text":"Our second main contribution is to derive a minimax lower bound on the mean square error of KL divergence estimation, which characterizes the best convergence rate that any method can achieve. For discrete distributions, the minimax lower bound has already been derived in ","element":"span"},{"href":"#id-18","referenceIndex":25,"text":"[25] ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-5","referenceIndex":6,"text":"[6]","element":"a"},{"text":". However, for continuous distributions, the minimax lower bound has not been established. In fact, there exist no estimators that are uniformly consistent over all continuous distributions. 
For example, let ","element":"span"},{"style":{"height":20.48},"width":1012.18,"height":51.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-0.png","element":"img","alt":" f = Σ_{i=1}^m p_i 1((i − 1)/m < x ≤ i/m), in which 1","inline":true,"padRight":true},{"text":"is the indicator function, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is uniform on ","element":"span"},{"text":"[0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1]","element":"span"},{"text":". Then the estimation error of the KL divergence between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"equals the estimation error of the entropy of ","element":"span"},{"style":{"height":19.6},"width":520.36,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-1.png","element":"img","alt":" p = (p1, . . . , pm). Since m","inline":true,"padRight":true},{"text":"can be arbitrarily large, the lower bound derived in ","element":"span"},{"href":"#id-19","referenceIndex":26,"text":"[26]","element":"a"},{"text":" implies that no uniformly consistent estimator exists. As a result, to obtain a minimax lower bound, it is necessary to impose some restrictions on the distributions. In this paper, we analyze the minimax lower bound for two cases that match our assumptions for deriving the upper bound, i.e. distributions with bounded support and densities bounded away from zero, and distributions that are smooth everywhere with densities that can be arbitrarily close to zero. For each case, we show that the minimax lower bound nearly matches the upper bound achieved by the kNN method. This result indicates that the kNN based KL divergence estimator is nearly minimax optimal. 
To the best of our knowledge, our work is the first to analyze the convergence rate of the kNN based KL divergence estimator and to prove its minimax optimality.","element":"span"}],[{"text":"The remainder of this paper is organized as follows. In Section ","element":"span"},{"href":"#id-20","text":"II, ","element":"a"},{"text":"we state the problem. In Sections ","element":"span"},{"href":"#id-21","text":"III ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-22","text":"IV, ","element":"a"},{"text":"we characterize the convergence rates of the bias and variance of the kNN based KL divergence estimator, respectively. In Section ","element":"span"},{"href":"#id-23","text":"V, ","element":"a"},{"text":"we derive the minimax lower bound. We then provide numerical examples in Section ","element":"span"},{"href":"#id-24","text":"VI, ","element":"a"},{"text":"and concluding remarks in Section ","element":"span"},{"href":"#id-25","text":"VII.","element":"a"}]]},{"heading":"II. PROBLEM STATEMENT","paragraphs":[[{"id":"id-20","text":"Consider two pdfs ","element":"span"},{"style":{"height":20.94},"width":941.93,"height":52.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-2.png","element":"img","alt":" f, g : R^d → R where f(x) > 0 only if g(x) > 0","inline":true},{"text":". The KL divergence between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is defined as","element":"span"}],[{"id":"id-29","style":{"width":"66%"},"width":1329,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"are unknown. 
However, we are given a set of samples ","element":"span"},{"style":{"height":19.2},"width":285.25,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-4.png","element":"img","alt":" {X1, . . . , XN}","inline":true,"padRight":true},{"text":"drawn i.i.d from pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":", and another set of samples ","element":"span"},{"style":{"height":19.2},"width":290.9,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-5.png","element":"img","alt":" {Y1, . . . , YM}","inline":true,"padRight":true},{"text":"drawn i.i.d from pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". The goal is to estimate ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":"||","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":") ","element":"span"},{"text":"based on these samples.","element":"span"}],[{"href":"#id-17","referenceIndex":8,"text":"[8] ","element":"a"},{"text":"proposed a kNN based estimator:","element":"span"}],[{"id":"id-28","style":{"width":"69%"},"width":1384,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-6.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":11.27},"width":30.91,"height":28.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-7.png","element":"img","alt":" ϵi","inline":true,"padRight":true},{"text":"is the distance between ","element":"span"},{"style":{"height":16.47},"width":234.89,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-8.png","element":"img","alt":" Xi and its k","inline":true},{"text":"-th nearest neighbor in 
","element":"span"},{"style":{"height":19.2},"width":617.88,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-9.png","element":"img","alt":" {X1, . . . , Xi−1, Xi+1, . . . , XN},","inline":true,"padRight":true},{"text":"while ","element":"span"},{"style":{"height":11.67},"width":35.13,"height":29.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-10.png","element":"img","alt":" νi","inline":true,"padRight":true},{"text":"is the distance between ","element":"span"},{"style":{"height":16.47},"width":233.35,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-11.png","element":"img","alt":" Xi and its k","inline":true},{"text":"-th nearest neighbor in ","element":"span"},{"style":{"height":19.2},"width":341.08,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-12.png","element":"img","alt":" {Y1, . . . , YM}, d","inline":true,"padRight":true},{"text":"is the dimension. The distance between any two points ","element":"span"},{"style":{"fontWeight":"bold"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontWeight":"bold"},"text":"v ","element":"span"},{"text":"is defined as ","element":"span"},{"style":{"height":19.2},"width":452.76,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/2-13.png","element":"img","alt":" ∥u − v∥, in which ∥·∥","inline":true,"padRight":true},{"text":"can be any norm. The basic idea of this estimator is to use the kNN method to estimate the density ratio. 
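A minimal NumPy sketch of estimator (2) follows. It assumes the Euclidean norm, brute-force distance computation, and that (4) takes the form k/(M V(B(Xi, νi))) analogous to (3); the function name knn_kl_divergence is hypothetical. Under these assumptions, ln(f̂(Xi)/ĝ(Xi)) = ln(M/(N − 1)) + d ln(νi/ϵi), because k and the unit-ball volume cancel. The estimate therefore reduces to d times the average of ln(νi/ϵi) plus ln(M/(N − 1)).

```python
import numpy as np


def knn_kl_divergence(x, y, k=1):
    # Hypothetical helper: kNN estimate of D(f||g) as in (2), Euclidean norm.
    # x: array of shape (N, d) drawn from f; y: array of shape (M, d) drawn from g.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = x.shape
    m = y.shape[0]
    # Pairwise distances from every X_i to the other X_j and to every Y_j.
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)  # exclude X_i from its own neighbor list
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # eps_i: k-th NN distance within the X sample; nu_i: k-th NN distance in Y.
    eps = np.sort(dxx, axis=1)[:, k - 1]
    nu = np.sort(dxy, axis=1)[:, k - 1]
    return d * np.mean(np.log(nu / eps)) + np.log(m / (n - 1))
```

As a usage example, with rng = np.random.default_rng(0), calling knn_kl_divergence(rng.normal(0, 1, (2000, 1)), rng.normal(1, 1, (2000, 1)), k=5) should return a value near 0.5, the exact KL divergence between N(0, 1) and N(1, 1). The brute-force distance matrices need O(N(N + M)) memory, so a k-d tree query would be preferable for large samples.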
An estimate of ","element":"span"},{"style":{"height":17.6},"width":199.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-0.png","element":"img","alt":" f at Xi is","inline":true}],[{"id":"id-26","style":{"width":"66%"},"width":1331,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-1.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":") ","element":"span"},{"text":"is the volume of a set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":", and B(x, r) denotes the ball of radius r centered at x. ","element":"span"},{"href":"#id-26","text":"(3) ","element":"a"},{"text":"can be understood as follows. Apart from ","element":"span"},{"style":{"height":16.07},"width":52.62,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-2.png","element":"img","alt":" Xi","inline":true},{"text":", there are ","element":"span"},{"style":{"height":13.2},"width":131.7,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-3.png","element":"img","alt":" N − 1","inline":true,"padRight":true},{"text":"other samples in ","element":"span"},{"style":{"height":16.8},"width":235.06,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-4.png","element":"img","alt":" X1, . . . , XN","inline":true},{"text":", among which ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"points fall in ","element":"span"},{"style":{"height":19.6},"width":255.88,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-5.png","element":"img","alt":" V (B(Xi, ϵi))","inline":true},{"text":". 
Therefore, ","element":"span"},{"style":{"height":19.6},"width":213.34,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-6.png","element":"img","alt":"k/(N − 1)","inline":true,"padRight":true},{"text":"is an estimate of ","element":"span"},{"style":{"height":19.67},"width":542.63,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-7.png","element":"img","alt":" Pf(B(Xi, ϵi)), in which Pf","inline":true,"padRight":true},{"text":"denotes the probability measure of the distribution with pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". As the distribution is continuous and the ball is small, we have ","element":"span"},{"style":{"height":19.67},"width":720.94,"height":49.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-8.png","element":"img","alt":" Pf(B(Xi, ϵi)) ≈ f(Xi)V (B(Xi, ϵi)).","inline":true,"padRight":true},{"text":"Equating these two quantities yields ","element":"span"},{"href":"#id-26","text":"(3) ","element":"a"},{"text":"as the estimate ","element":"span"},{"style":{"height":23.85},"width":118.55,"height":59.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-9.png","element":"img","alt":"ˆf(Xi)","inline":true},{"text":". Similarly, as there are ","element":"span"},{"style":{"height":17.6},"width":473.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-10.png","element":"img","alt":" M samples Y1, . . . 
, YM","inline":true,"padRight":true},{"text":"generated from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", we can obtain an estimate ","element":"span"},{"style":{"height":18},"width":88.78,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-11.png","element":"img","alt":" ˆg by","inline":true}],[{"id":"id-27","style":{"width":"64%"},"width":1296,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-12.png","element":"img"}],[{"text":"As","element":"span"}],[{"style":{"width":"75%"},"width":1503,"height":139,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-13.png","element":"img"}],[{"text":"by replacing ","element":"span"},{"style":{"height":19.6},"width":257.43,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-14.png","element":"img","alt":" f(Xi), g(Xi)","inline":true,"padRight":true},{"text":"with ","element":"span"},{"href":"#id-26","text":"(3) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-27","text":"(4) ","element":"a"},{"text":"respectively, we obtain the KL divergence estimator in ","element":"span"},{"href":"#id-28","text":"(2)","element":"a"},{"text":".","element":"span"}],[{"href":"#id-17","referenceIndex":8,"text":"[8] ","element":"a"},{"text":"proved that this estimator is consistent, but its convergence rate remains unknown. In this paper, we analyze the convergence rates of the bias and variance of this estimator and derive the minimax lower bound.","element":"span"}]]},{"heading":"III. BIAS ANALYSIS","paragraphs":[[{"id":"id-21","text":"In this section, we derive the convergence rate of the bias of the estimator ","element":"span"},{"href":"#id-28","text":"(2)","element":"a"},{"text":". 
We consider two cases, depending on whether the support is bounded, as they have different sources of bias.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"A. The Case with Bounded Support","element":"span"}],[{"text":"We first discuss the case in which the distributions have bounded supports and the densities are bounded away from zero. In this case, the main source of bias is the boundary effect. The analysis is based on the following assumptions:","element":"span"}],[{"id":"id-31","style":{"fontWeight":"bold"},"text":"Assumption 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Assume the following conditions:","element":"span"}],[{"style":{"width":"97%"},"width":1942,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-15.png","element":"img"}],[{"style":{"height":18.87},"width":358.46,"height":47.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-16.png","element":"img","alt":"Ug for all x ∈ Sg;","inline":true}],[{"style":{"width":"97%"},"width":1942,"height":176,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-17.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"aV ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":", r","element":"span"},{"text":"))","element":"span"},{"style":{"fontStyle":"italic"},"text":", and for all ","element":"span"},{"style":{"height":19.67},"width":1044.56,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-18.png","element":"img","alt":" x ∈ Sg, V (B(x, r) ∩ Sg) ≥ aV (B(x, r)), in which V","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denotes the volume of a 
set;","element":"span"}],[{"style":{"width":"51%"},"width":1031,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/3-19.png","element":"img"}],[{"text":"Assumption (a) is necessary to ensure that the definition of KL divergence in ","element":"span"},{"href":"#id-29","text":"(1) ","element":"a"},{"text":"is valid. (b) bounds the pdf values from below and above. (c) restricts the surface area of the supports of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". Since the kNN divergence estimator tends to incur significant bias in the region near the boundary, the estimation bias is usually large for distributions whose supports are irregular and have a large surface area. (d) requires the support to be bounded; the case with unbounded support will be considered in Section ","element":"span"},{"href":"#id-30","text":"III-B. ","element":"a"},{"text":"(e) ensures that the angles at the corners of the support sets are bounded from below, so that there is no significant bias in the corner regions. (f) ensures the smoothness of the distributions within their supports. 
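The boundary effect that motivates condition (c) can be seen in a small experiment. The following hypothetical sketch is not from the paper. It applies the one-dimensional kNN density estimate k/(N · 2r) to samples from the uniform distribution on [0, 1], where r is the distance from a query point to its k-th nearest sample and 2r is the length of the ball B(x, r). At the boundary point x = 0, only half of the ball lies in the support, so the estimate is roughly halved. At the interior point x = 0.5, the estimate is close to the true density 1. Because the query points are not themselves samples, all N samples are used rather than N − 1 as in (3).

```python
import numpy as np


def knn_density_1d(samples, queries, k):
    # Hypothetical helper: 1-D kNN density estimate k / (N * V(B(x, r))),
    # where V(B(x, r)) = 2r and r is the distance to the k-th nearest sample.
    samples = np.asarray(samples, dtype=float)
    queries = np.asarray(queries, dtype=float)
    dist = np.abs(queries[:, None] - samples[None, :])
    r = np.sort(dist, axis=1)[:, k - 1]
    return k / (len(samples) * 2.0 * r)


rng = np.random.default_rng(1)
samples = rng.uniform(0.0, 1.0, 100_000)
# The true density is 1 everywhere on [0, 1].
boundary_est, interior_est = knn_density_1d(samples, [0.0, 0.5], k=400)
```

Here boundary_est should come out near 0.5 and interior_est near 1. This matches the tendency of the kNN method to underestimate the density near the boundary, since the full ball volume is used even though part of the ball lies outside the support.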
Note that ","element":"span"},{"href":"#id-26","text":"(3) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-27","text":"(4) ","element":"a"},{"text":"actually estimate the average densities of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"over the balls ","element":"span"},{"style":{"height":19.6},"width":181.76,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-0.png","element":"img","alt":" B(Xi, ϵi)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":19.6},"width":488.79,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-1.png","element":"img","alt":" B(Xi, νi). If f and g","inline":true,"padRight":true},{"text":"are smooth, then these average values do not deviate much from the pdf values at the centers of the balls, i.e. ","element":"span"},{"style":{"height":19.6},"width":347.99,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-2.png","element":"img","alt":" f(Xi) and g(Xi).","inline":true}],[{"text":"Based on the above assumptions, we have the following theorem regarding the bias of the estimator ","element":"span"},{"href":"#id-28","text":"(2)","element":"a"},{"text":".","element":"span"}],[{"id":"id-38","style":{"fontWeight":"bold"},"text":"Theorem 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumption ","element":"span"},{"href":"#id-31","style":{"fontStyle":"italic"},"text":"1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"the bias of the kNN based KL divergence estimator is bounded by:","element":"span"}],[{"id":"id-35","style":{"width":"78%"},"width":1557,"height":146,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"(Outline) Considering that","element":"span"}],[{"style":{"width":"70%"},"width":1416,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-4.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"denotes the differential entropy, we decompose the KL divergence estimator into an estimator of the differential entropy of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":" and an estimator of the cross entropy between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". We then bound the biases of these two estimators. In particular, we can write","element":"span"}],[{"id":"id-39","style":{"width":"71%"},"width":1423,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-5.png","element":"img"}],[{"text":"with","element":"span"}],[{"style":{"width":"79%"},"width":1588,"height":188,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-6.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":17.6},"width":31,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-7.png","element":"img","alt":" ψ","inline":true,"padRight":true},{"text":"is the digamma function, ","element":"span"},{"style":{"height":19.6},"width":586.06,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-8.png","element":"img","alt":" ψ(u) = d(ln Γ(u))/du, with Γ","inline":true,"padRight":true},{"text":"being the Gamma function. 
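The two digamma bounds used below, |ln M − ψ(M+1)| ≤ 1/M and |ln(N − 1) − ψ(N)| ≤ 1/N, can be checked numerically. The following sketch is an illustration, not part of the proof, and the function name check_digamma_bounds is hypothetical. It relies on the exact identity ψ(n) = −γ + Σ_{j=1}^{n−1} 1/j for positive integers n, where γ is the Euler-Mascheroni constant, so only the Python standard library is needed. With M = N − 1, both bounds concern the same quantity, so checking the tighter bound 1/N also covers the bound 1/M.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def check_digamma_bounds(max_n=10_000):
    # Hypothetical helper: verify |ln(N-1) - psi(N)| <= 1/N for 2 <= N <= max_n.
    # With M = N - 1 this quantity equals |ln M - psi(M+1)|, so the bound 1/M
    # (which is looser than 1/N) is verified at the same time.
    harmonic = 0.0  # running harmonic number H_{n-1}
    for n in range(2, max_n + 1):
        harmonic += 1.0 / (n - 1)
        psi_n = -EULER_GAMMA + harmonic  # psi(n) = -gamma + H_{n-1}
        gap = abs(math.log(n - 1) - psi_n)
        if gap > 1.0 / n:
            return False
    return True
```

check_digamma_bounds() should return True. In fact, the gap behaves like 1/(2N) for large N, which is why the corresponding term decays quickly.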
By standard bounds on the digamma function, we know that ","element":"span"},{"style":{"height":19.6},"width":1027.33,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-9.png","element":"img","alt":" | ln M−ψ(M+1)| ≤ 1/M, and | ln(N−1)−ψ(N)| ≤","inline":true},{"style":{"height":19.2},"width":291.93,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-10.png","element":"img","alt":"1/N. Hence I3","inline":true,"padRight":true},{"text":"decays sufficiently fast and is negligible for large sample sizes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":".","element":"span"}],[{"style":{"height":16.07},"width":37.65,"height":40.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-11.png","element":"img","alt":"I1","inline":true,"padRight":true},{"text":"has the same form as the bias of the Kozachenko-Leonenko entropy estimator ","element":"span"},{"href":"#id-13","referenceIndex":17,"text":"[17]","element":"a"},{"text":", which has been analyzed in many previous works ","element":"span"},{"href":"#id-32","referenceIndex":19,"text":"[19, ","element":"a"},{"href":"#id-33","referenceIndex":21,"text":"21–","element":"a"},{"href":"#id-16","referenceIndex":23,"text":"23, ","element":"a"},{"href":"#id-34","referenceIndex":27,"text":"27]","element":"a"},{"text":". With some modifications, the proofs for the entropy estimator can also be used to bound ","element":"span"},{"style":{"height":16.07},"width":37.65,"height":40.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-12.png","element":"img","alt":" I2","inline":true},{"text":", which is in fact the bias of a cross entropy estimator. 
However, since our assumptions differ from those made in previous works, we need to derive ","element":"span"},{"href":"#id-35","text":"(6) ","element":"a"},{"text":"in a different way.","element":"span"}],[{"text":"In our proof, for both the entropy estimator and the cross entropy estimator, we divide the support into two parts: the central region and the boundary region. In the central region, ","element":"span"},{"style":{"height":19.6},"width":142.47,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-13.png","element":"img","alt":" B(x, ϵ)","inline":true,"padRight":true},{"text":"lies within ","element":"span"},{"style":{"height":19.67},"width":293.84,"height":49.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-14.png","element":"img","alt":"Sf and B(x, ν)","inline":true,"padRight":true},{"text":"lies within ","element":"span"},{"style":{"height":18.87},"width":44.8,"height":47.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-15.png","element":"img","alt":" Sg","inline":true,"padRight":true},{"text":"with high probability. Since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"are smooth, the expected estimates ","element":"span"},{"style":{"height":22.65},"width":162.56,"height":56.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-16.png","element":"img","alt":"ˆf and ˆg","inline":true,"padRight":true},{"text":"are very close to the true densities, and thus do not cause significant bias. 
The main bias comes from the boundary region, in which the density estimates ","element":"span"},{"style":{"height":22.65},"width":150.46,"height":56.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-17.png","element":"img","alt":"ˆf and ˆg","inline":true,"padRight":true},{"text":"are no longer accurate, as ","element":"span"},{"style":{"height":19.6},"width":197.87,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/4-18.png","element":"img","alt":" B(x, ϵ) or","inline":true},{"style":{"height":19.6},"width":149.72,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-0.png","element":"img","alt":"B(x, ν)","inline":true,"padRight":true},{"text":"extends beyond the supports ","element":"span"},{"style":{"height":18.87},"width":197.55,"height":47.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-1.png","element":"img","alt":" Sf and Sg","inline":true},{"text":". We bound the boundary bias by letting the boundary region shrink at a proper rate. The detailed proof is given in Appendix ","element":"span"},{"href":"#id-36","text":"A.","element":"a"}],[{"id":"id-30","style":{"fontStyle":"italic"},"text":"B. The Case with Smooth Distributions","element":"span"}],[{"text":"We now consider the second case, where the densities are smooth everywhere and can be arbitrarily close to zero. In this case, the main source of bias is the tail effect. We make the following assumptions:","element":"span"}],[{"id":"id-37","style":{"width":"100%"},"width":2000,"height":323,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-2.png","element":"img"}],[{"text":"Assumption (a) ensures that the KL divergence in ","element":"span"},{"href":"#id-29","text":"(1) ","element":"a"},{"text":"is well defined. (b) is the tail assumption. 
A lower ","element":"span"},{"style":{"height":13.2},"width":26,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-3.png","element":"img","alt":" γ","inline":true,"padRight":true},{"text":"indicates a heavier tail, and thus the bias of the KL divergence estimator converges more slowly. For example, ","element":"span"},{"style":{"height":17.2},"width":123.15,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-4.png","element":"img","alt":" γ = 1","inline":true,"padRight":true},{"text":"for the Gaussian distribution and ","element":"span"},{"style":{"height":19.2},"width":169.95,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-5.png","element":"img","alt":" γ = 1/2","inline":true,"padRight":true},{"text":"for the Cauchy distribution. (c) is the smoothness assumption. (d) is an additional tail assumption, which is very weak and holds for almost all common distributions, since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"can be arbitrarily small. However, this assumption is important since it rules out very large ","element":"span"},{"style":{"height":14},"width":141.2,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-6.png","element":"img","alt":" ϵ and ν","inline":true},{"text":". Based on the above assumptions, we have the following theorem regarding the bias of estimator ","element":"span"},{"href":"#id-28","text":"(2)","element":"a"},{"text":".","element":"span"}],[{"id":"id-61","style":{"fontWeight":"bold"},"text":"Theorem 2. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumption ","element":"span"},{"href":"#id-37","style":{"fontStyle":"italic"},"text":"2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"the convergence rate of the bias of the kNN-based KL divergence estimator is bounded by:","element":"span"}],[{"style":{"width":"84%"},"width":1681,"height":88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-7.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"(Outline) As in the proof of Theorem ","element":"span"},{"href":"#id-38","text":"1, ","element":"a"},{"text":"we decompose the KL divergence estimator into two estimators, which estimate the entropy of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and the cross entropy between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", respectively. In particular, the bias can still be decomposed using ","element":"span"},{"href":"#id-39","text":"(8)","element":"a"},{"text":". For simplicity, we only present the convergence bound of ","element":"span"},{"style":{"height":16.07},"width":37.65,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-8.png","element":"img","alt":" I2","inline":true},{"text":", which is the bias of the cross entropy estimator. 
The bound for the entropy estimator follows similarly.","element":"span"}],[{"text":"For the cross entropy estimator, we divide the support into two parts: a central region ","element":"span"},{"style":{"height":16.47},"width":59.72,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-9.png","element":"img","alt":" S1,","inline":true,"padRight":true},{"text":"in which ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is relatively high, and a tail region ","element":"span"},{"style":{"height":17.6},"width":392.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-10.png","element":"img","alt":" S2, in which f or g","inline":true,"padRight":true},{"text":"is relatively low. According to results from order statistics ","element":"span"},{"href":"#id-34","referenceIndex":27,"text":"[27, ","element":"a"},{"href":"#id-40","referenceIndex":28,"text":"28]","element":"a"},{"text":", ","element":"span"},{"style":{"height":19.67},"width":1195.04,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-11.png","element":"img","alt":" E[ln Pg(B(x, ν))] = ψ(k) − ψ(M + 1), in which Pg(S) is the","inline":true,"padRight":true},{"text":"probability mass of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"with respect to the distribution with pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". 
Therefore, ","element":"span"},{"style":{"height":16.07},"width":37.65,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-12.png","element":"img","alt":" I2","inline":true,"padRight":true},{"text":"can be bounded by","element":"span"}],[{"id":"id-41","style":{"width":"84%"},"width":1692,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-13.png","element":"img"}],[{"text":"We bound the two terms in ","element":"span"},{"href":"#id-41","text":"(11) ","element":"a"},{"text":"separately. To bound the bias in ","element":"span"},{"style":{"height":16.47},"width":45.8,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-14.png","element":"img","alt":" S1","inline":true},{"text":", we find a high-probability upper bound on ","element":"span"},{"style":{"height":11.67},"width":35.13,"height":29.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-15.png","element":"img","alt":" νi","inline":true},{"text":", denoted by ","element":"span"},{"style":{"height":13.2},"width":24,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-16.png","element":"img","alt":" ρ","inline":true},{"text":". The bias can then be bounded via the local non-uniformity of ","element":"span"},{"style":{"height":19.6},"width":435.59,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-17.png","element":"img","alt":" g in B(νi, ρ) if νi ≤ ρ","inline":true},{"text":". 
Otherwise, if ","element":"span"},{"style":{"height":14.4},"width":123.68,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-18.png","element":"img","alt":" νi > ρ","inline":true},{"text":", we use Assumption 2 (d) to ensure that ","element":"span"},{"style":{"height":11.67},"width":35.14,"height":29.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-19.png","element":"img","alt":"νi","inline":true,"padRight":true},{"text":"is not too large and thus does not cause significant estimation error. We let ","element":"span"},{"style":{"height":13.2},"width":24,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/5-20.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"decay with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"at a proper rate to maximize the overall convergence rate of the bias.","element":"span"}],[{"text":"To bound the bias in ","element":"span"},{"style":{"height":16.47},"width":45.8,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-0.png","element":"img","alt":" S2","inline":true},{"text":", we let the threshold between ","element":"span"},{"style":{"height":16.47},"width":195.9,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-1.png","element":"img","alt":" S1 and S2","inline":true,"padRight":true},{"text":"decay with the sample size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":", so that the probability mass of ","element":"span"},{"style":{"height":16.47},"width":45.8,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-2.png","element":"img","alt":" S2","inline":true,"padRight":true},{"text":"also decreases with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":". 
We then combine the bounds on ","element":"span"},{"style":{"height":16.47},"width":291.51,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-3.png","element":"img","alt":" S1 and S2, and","inline":true,"padRight":true},{"text":"tune the decay rate of the threshold between ","element":"span"},{"style":{"height":18},"width":385.63,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-4.png","element":"img","alt":" S1 and S2 properly.","inline":true}],[{"text":"The detailed proof can be found in Appendix ","element":"span"},{"href":"#id-42","text":"B.","element":"a"}]]},{"heading":"IV. VARIANCE ANALYSIS","paragraphs":[[{"id":"id-22","text":"We now discuss the variance of this divergence estimator under the following unifying assumptions.","element":"span"}],[{"id":"id-43","style":{"fontWeight":"bold"},"text":"Assumption 3. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Assume that the following conditions hold:","element":"span"}],[{"id":"id-44","style":{"width":"97%"},"width":1944,"height":666,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"in which","element":"span"}],[{"id":"id-90","style":{"width":"99%"},"width":1991,"height":257,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-6.png","element":"img"}],[{"text":"Assumption ","element":"span"},{"href":"#id-43","text":"3 ","element":"a"},{"text":"(a)-(c) are satisfied whenever either Assumption ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"or Assumption ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"is satisfied. (a) only requires that the pdfs be continuous almost everywhere, and thus holds not only for distributions that are smooth everywhere but also for distributions with boundaries. 
(b) is obviously satisfied under Assumption ","element":"span"},{"href":"#id-31","text":"1, ","element":"a"},{"text":"since that assumption requires the densities to be bounded both from above and from below. From Assumption ","element":"span"},{"href":"#id-37","text":"2, ","element":"a"},{"text":"it is also straightforward to show that ","element":"span"},{"style":{"height":22.83},"width":1143.04,"height":57.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-7.png","element":"img","alt":"∫f(x) ln2 f(x)dx < ∞ and ∫f(x) ln2 g(x) < ∞. This","inline":true,"padRight":true},{"text":"property, combined with the smoothness condition (Assumption ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"(c)), implies that ","element":"span"},{"href":"#id-44","text":"(15) ","element":"a"},{"text":"holds for sufficiently small ","element":"span"},{"style":{"height":11.67},"width":38.1,"height":29.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-8.png","element":"img","alt":" r0","inline":true},{"text":". (c) is the same as Assumption ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"(d) and is weaker than Assumption ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"(d). Therefore, (a)-(c) are weaker than both sets of assumptions used in the bias analysis. (d) is a new assumption that restricts the density ratio. This is important because if the density ratio can be too large, meaning that there exists a region containing many samples from ","element":"span"},{"style":{"height":19.2},"width":495.9,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-9.png","element":"img","alt":" {X1, . . . , XN}, but much","inline":true,"padRight":true},{"text":"fewer samples from ","element":"span"},{"style":{"height":19.2},"width":455.25,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-10.png","element":"img","alt":" {Y1, . . . 
, YM}, then νi","inline":true,"padRight":true},{"text":"will be large and unstable for many ","element":"span"},{"style":{"height":19.2},"width":307.86,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/6-11.png","element":"img","alt":" i ∈ {1, . . . , N}.","inline":true,"padRight":true},{"text":"We therefore use Assumption 3 (d) to bound the density ratio.","element":"span"}],[{"text":"Under these assumptions, the variance of the divergence estimator can be bounded as in the following theorem.","element":"span"}],[{"id":"id-62","style":{"fontWeight":"bold"},"text":"Theorem 3. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumption ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"3, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"if ","element":"span"},{"style":{"height":19.2},"width":343.26,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-0.png","element":"img","alt":" N ln M/M → ∞","inline":true},{"style":{"fontStyle":"italic"},"text":", then the convergence rate of the variance of estimator ","element":"span"},{"href":"#id-28","text":"(2) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"can be bounded by:","element":"span"}],[{"style":{"width":"75%"},"width":1508,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"(Outline) From ","element":"span"},{"href":"#id-28","text":"(2)","element":"a"},{"text":", we have","element":"span"}],[{"id":"id-45","style":{"width":"80%"},"width":1607,"height":305,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-2.png","element":"img"}],[{"text":"Our proof uses some techniques from ","element":"span"},{"href":"#id-34","referenceIndex":27,"text":"[27]","element":"a"},{"text":", which proved the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/N","element":"span"},{"text":") ","element":"span"},{"text":"convergence of the variance of the Kozachenko-Leonenko entropy estimator with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"= 1 ","element":"span"},{"text":"for one-dimensional distributions, and ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":", which generalized the result to arbitrary fixed dimension and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":" without requiring the support to be bounded. The basic idea is that if one sample is replaced by another i.i.d. sample, the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"-NN distances change only for a tiny fraction of the samples.","element":"span"}],[{"text":"The first term in ","element":"span"},{"href":"#id-45","text":"(18) ","element":"a"},{"text":"is simply the variance of the Kozachenko-Leonenko entropy estimator. Therefore, we can follow a proof procedure similar to that of Theorem 2 in ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":". 
","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23] ","element":"a"},{"text":"analyzed a truncated Kozachenko-Leonenko entropy estimator, in which ","element":"span"},{"style":{"height":11.27},"width":30.91,"height":28.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-3.png","element":"img","alt":" ϵi","inline":true,"padRight":true},{"text":"is truncated at an upper bound ","element":"span"},{"style":{"height":11.67},"width":54.58,"height":29.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-4.png","element":"img","alt":" aN","inline":true},{"text":". We prove the same convergence bound for the untruncated estimator.","element":"span"}],[{"text":"For the second term in ","element":"span"},{"href":"#id-45","text":"(18)","element":"a"},{"text":", the analysis is much harder, since the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"-NN distance may change for many more samples from ","element":"span"},{"style":{"height":19.2},"width":285.25,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-5.png","element":"img","alt":" {X1, . . . , XN}","inline":true},{"text":", rather than only a tiny fraction of them. For this term, we design a new method to obtain a high-probability bound on the deviation of ","element":"span"},{"style":{"height":23.68},"width":329.4,"height":59.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-6.png","element":"img","alt":" (d/N) ∑Ni=1 ln νi","inline":true,"padRight":true},{"text":"from its mean. 
The basic idea of our new method can be stated briefly as follows. Define two sets ","element":"span"},{"style":{"height":18.33},"width":467.48,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-7.png","element":"img","alt":" S1 and S′1, in which S1 ","inline":true,"padRight":true},{"text":"is a subset of ","element":"span"},{"style":{"height":15.74},"width":52.54,"height":39.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-8.png","element":"img","alt":" Rd","inline":true,"padRight":true},{"text":"such that for any ","element":"span"},{"style":{"height":16.47},"width":227.9,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-9.png","element":"img","alt":" x ∈ S1, Y1 ","inline":true,"padRight":true},{"text":"is among the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"nearest ","element":"span"},{"text":"neighbors of ","element":"span"},{"style":{"height":19.2},"width":398,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-10.png","element":"img","alt":" x in {Y1, . . . , YM}","inline":true},{"text":". Similarly, define ","element":"span"},{"style":{"height":18.33},"width":47.58,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-11.png","element":"img","alt":" S′1 ","inline":true,"padRight":true},{"text":"to be the set such that for all ","element":"span"},{"style":{"height":18.33},"width":294.78,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-12.png","element":"img","alt":" x ∈ S′1, Y′1 is","inline":true,"padRight":true},{"text":"among the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"nearest neighbors of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":". 
If we replace ","element":"span"},{"style":{"height":18.33},"width":233.27,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-13.png","element":"img","alt":" Y1 with Y′1","inline":true},{"text":", the kNN distance of ","element":"span"},{"style":{"height":16.8},"width":324.37,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-14.png","element":"img","alt":" Xi, i = 1, . . . , N","inline":true,"padRight":true},{"text":"changes only if ","element":"span"},{"style":{"height":18.33},"width":402.54,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-15.png","element":"img","alt":" Xi ∈ S1 or Xi ∈ S′1","inline":true},{"text":". Based on this observation, we derive a high-probability bound on ","element":"span"},{"text":"the number of samples from ","element":"span"},{"style":{"height":19.2},"width":285.25,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-16.png","element":"img","alt":" {X1, . . . , XN}","inline":true,"padRight":true},{"text":"that lie in ","element":"span"},{"style":{"height":18.33},"width":204.2,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-17.png","element":"img","alt":" S1 and S′1 ","inline":true,"padRight":true},{"text":"respectively, and then bound the ","element":"span"},{"text":"maximum change in the estimate caused by replacing ","element":"span"},{"style":{"height":18.33},"width":240.26,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-18.png","element":"img","alt":" Y1 with Y′1","inline":true},{"text":". 
Based on this bound, ","element":"span"},{"text":"we can then bound the second term in ","element":"span"},{"href":"#id-45","text":"(18) ","element":"a"},{"text":"using the Efron-Stein inequality.","element":"span"}],[{"text":"The detailed proof can be found in Appendix ","element":"span"},{"href":"#id-46","text":"C.","element":"a"}],[{"text":"In the analysis above, we have derived the convergence rates of the bias and variance. With these results, we can bound the mean square error of the kNN-based KL divergence estimator. For distributions that satisfy Assumptions ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-43","text":"3, ","element":"a"},{"text":"the mean square error can be bounded by","element":"span"}],[{"id":"id-55","style":{"width":"98%"},"width":1966,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-19.png","element":"img"}],[{"text":"For distributions that satisfy Assumptions ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-43","text":"3, ","element":"a"},{"text":"the corresponding bound is","element":"span"}],[{"id":"id-57","style":{"width":"100%"},"width":1996,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/7-20.png","element":"img"}]]},{"heading":"V. MINIMAX ANALYSIS","paragraphs":[[{"id":"id-23","text":"In this section, we derive the minimax lower bound of the mean square error of KL divergence ","element":"span"},{"text":"estimation, which holds for all methods (not necessarily kNN-based) that have no knowledge of the distributions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". The minimax analysis also considers two cases, i.e.
the distributions whose densities are bounded away from zero, and those whose densities can approach zero. For the first case, the following theorem holds.","element":"span"}],[{"id":"id-48","style":{"fontWeight":"bold"},"text":"Theorem 4. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Define ","element":"span"},{"style":{"height":16.47},"width":46.96,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-0.png","element":"img","alt":" Sa","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"as the set of pairs ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"f, g","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"that satisfy Assumptions ","element":"span"},{"href":"#id-31","style":{"fontStyle":"italic"},"text":"1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"3, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and","element":"span"}],[{"style":{"width":"76%"},"width":1532,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"in which ","element":"span"},{"style":{"height":23.64},"width":187.55,"height":59.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-2.png","element":"img","alt":"ˆD(N, M)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an estimate of the KL divergence using ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"style":{"fontStyle":"italic"},"text":"samples drawn from the distribution with pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"samples from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"style":{"fontStyle":"italic"},"text":". Then for sufficiently large ","element":"span"},{"style":{"height":18.47},"width":300.42,"height":46.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-3.png","element":"img","alt":" Uf, Ug, Hf, Hg","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and sufficiently small ","element":"span"},{"style":{"height":18.87},"width":142.3,"height":47.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-4.png","element":"img","alt":" Lf and","inline":true}],[{"id":"id-51","style":{"width":"99%"},"width":1991,"height":320,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"(Outline) Minimax lower bounds for functional estimation can be derived using Le Cam’s method ","element":"span"},{"href":"#id-47","referenceIndex":29,"text":"[29]","element":"a"},{"text":". For the proof of Theorem ","element":"span"},{"href":"#id-48","text":"4, ","element":"a"},{"text":"we use some techniques from ","element":"span"},{"href":"#id-19","referenceIndex":26,"text":"[26]","element":"a"},{"text":", which derived the minimax bound for entropy estimation of discrete distributions. The main idea is to construct a subset of distributions that satisfy Assumptions ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-43","text":"3, ","element":"a"},{"text":"and then apply Poisson sampling. These operations make it more convenient to compute the distance between two distributions, which is important for applying Le Cam’s method. 
Details of the proof can be found in Appendix ","element":"span"},{"href":"#id-49","text":"D.","element":"a"}],[{"id":"id-54","style":{"width":"97%"},"width":1944,"height":191,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-6.png","element":"img"}],[{"text":"for arbitrarily small ","element":"span"},{"style":{"height":14.4},"width":120.93,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-7.png","element":"img","alt":" δ > 0.","inline":true}],[{"text":"We remark that in Theorem ","element":"span"},{"href":"#id-48","text":"4, ","element":"a"},{"text":"the support sets ","element":"span"},{"style":{"height":18.87},"width":506.58,"height":47.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-8.png","element":"img","alt":" Sf and Sg of pdfs f and g","inline":true,"padRight":true},{"text":"are unknown. If we assume that ","element":"span"},{"style":{"height":18.87},"width":195.7,"height":47.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-9.png","element":"img","alt":" Sf and Sg","inline":true,"padRight":true},{"text":"are known, then with boundary correction methods such as the mirror reflection method proposed in ","element":"span"},{"href":"#id-50","referenceIndex":30,"text":"[30]","element":"a"},{"text":", the convergence rate can be faster than that in ","element":"span"},{"href":"#id-51","text":"(22)","element":"a"},{"text":". However, in Theorem ","element":"span"},{"href":"#id-48","text":"4, ","element":"a"},{"text":"instead of using fixed support sets, ","element":"span"},{"style":{"height":16.47},"width":46.96,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-10.png","element":"img","alt":" Sa","inline":true,"padRight":true},{"text":"contains distributions with a broad range of support sets. 
These support sets are restricted only by Assumption ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"(c) and (d), which require that, for every pair in ","element":"span"},{"style":{"height":16.47},"width":46.96,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-11.png","element":"img","alt":" Sa","inline":true,"padRight":true},{"text":"the surface areas of the supports are bounded by ","element":"span"},{"style":{"height":18.87},"width":220.08,"height":47.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-12.png","element":"img","alt":" Hf and Hg","inline":true},{"text":", and their diameters are bounded by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"text":". As a result, the minimax convergence rate is slower. This reflects the inherent difficulty caused by the boundary effect for distributions whose densities are bounded away from zero.","element":"span"}],[{"text":"For the second case, the corresponding result is shown in Theorem ","element":"span"},{"href":"#id-52","text":"5.","element":"a"}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"Theorem 5. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Define ","element":"span"},{"style":{"height":16.47},"width":43.96,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-13.png","element":"img","alt":" Sb","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"as the set of pairs ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"f, g","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"that satisfy Assumptions ","element":"span"},{"href":"#id-37","style":{"fontStyle":"italic"},"text":"2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"3, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and","element":"span"}],[{"style":{"width":"76%"},"width":1529,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/8-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"then for sufficiently large ","element":"span"},{"style":{"height":18},"width":178.83,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/9-0.png","element":"img","alt":" µ, C0, K,","inline":true}],[{"id":"id-56","style":{"width":"90%"},"width":1801,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/9-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"(Outline) The minimax convergence rate of differential entropy estimation under similar assumptions was derived in ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":". We extend this analysis to obtain the minimax convergence rate of cross entropy estimation between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". 
Combining the bounds for entropy and cross entropy, we then obtain the minimax lower bound on the mean square error of KL divergence estimation. The detailed proof is given in Appendix ","element":"span"},{"href":"#id-53","text":"E.","element":"a"}],[{"text":"Comparing ","element":"span"},{"href":"#id-54","text":"(23) ","element":"a"},{"text":"with ","element":"span"},{"href":"#id-55","text":"(19)","element":"a"},{"text":", as well as ","element":"span"},{"href":"#id-56","text":"(25) ","element":"a"},{"text":"with ","element":"span"},{"href":"#id-57","text":"(20)","element":"a"},{"text":", we observe that the convergence rate of the upper bound on the mean square error of the kNN based KL divergence estimator nearly matches the minimax lower bound in both cases. These results indicate that the kNN method with fixed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"is nearly minimax rate optimal.","element":"span"}]]},{"heading":"VI. NUMERICAL EXAMPLES","paragraphs":[[{"id":"id-24","text":"In this section, we provide numerical experiments to illustrate the theoretical results of this paper. In ","element":"span"},{"text":"the simulation, we plot the estimated bias and variance against the sample size. For simplicity, we assume that the two sample sizes are equal, i.e. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":". For each sample size, the bias and variance are estimated by repeating the simulation ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"times and then computing the sample mean and the sample variance over all trials. For low dimensional distributions, the bias is relatively small, so more trials are needed than for high dimensional distributions. 
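","element":"span"}],[{"text":"The following is a minimal sketch of this Monte Carlo procedure in Python with NumPy. It assumes the standard form of the kNN estimator of [8] with the same k for both samples, D̂ = (d/N) Σ_i ln(ν_k(i)/ρ_k(i)) + ln(M/(N − 1)), where ρ_k(i) is the distance from X_i to its k-th nearest neighbor among the other samples from f, and ν_k(i) is the distance to its k-th nearest neighbor among the samples from g. The function names knn_kl_divergence and mc_bias_variance are illustrative, and brute-force distances are used only for clarity.

```python
import numpy as np


def knn_kl_divergence(x, y, k=3):
    '''kNN estimate of D(f||g) from x ~ f (shape N x d) and y ~ g (shape M x d).

    Assumed form: (d/N) * sum_i log(nu_k(i) / rho_k(i)) + log(M / (N - 1)).
    '''
    n, d = x.shape
    m = y.shape[0]
    # Brute-force pairwise Euclidean distances; adequate for moderate N and M.
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)  # a sample is not its own neighbor
    rho = np.partition(dxx, k - 1, axis=1)[:, k - 1]  # k-th NN among the x samples
    nu = np.partition(dxy, k - 1, axis=1)[:, k - 1]   # k-th NN among the y samples
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))


def mc_bias_variance(sample_f, sample_g, true_kl, n, trials, k=3, seed=0):
    '''Repeat the estimate T = trials times with M = N = n samples and
    return (sample mean minus true value, sample variance).'''
    rng = np.random.default_rng(seed)
    est = np.array([knn_kl_divergence(sample_f(rng, n), sample_g(rng, n), k)
                    for _ in range(trials)])
    return est.mean() - true_kl, est.var(ddof=1)


if __name__ == '__main__':
    d = 2

    # Uniform pair of Fig. 1: f = U[0.5, 1.5]^d, g = U[0, 2]^d, D(f||g) = d ln 2.
    def sample_f(rng, n):
        return rng.uniform(0.5, 1.5, size=(n, d))

    def sample_g(rng, n):
        return rng.uniform(0.0, 2.0, size=(n, d))

    for n in (100, 200, 400):
        bias, var = mc_bias_variance(sample_f, sample_g, d * np.log(2.0), n, trials=200)
        print(n, bias, var)
```

Here 200 trials is much smaller than the numbers of trials used in the experiments; it only keeps the demonstration fast, so the printed bias estimates are noisy.","element":"span"}],[{"text":"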
In the following experiments, we repeat ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= 100,000 ","element":"span"},{"text":"times if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"= 1","element":"span"},{"text":", and ","element":"span"},{"text":"10,000 ","element":"span"},{"text":"times if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d > ","element":"span"},{"text":"1","element":"span"},{"text":". In all figures, we use log-log plots with base ","element":"span"},{"text":"10","element":"span"},{"text":". In all trials, we fix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"= 3","element":"span"},{"text":". Figure ","element":"span"},{"href":"#id-58","text":"1 ","element":"a"},{"text":"shows the convergence of the kNN based KL divergence estimator for two uniform distributions with different supports, a case that satisfies Assumption ","element":"span"},{"href":"#id-31","text":"1. ","element":"a"},{"text":"In Figure ","element":"span"},{"href":"#id-59","text":"2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"are two Gaussian distributions with different means but equal variances. In Figure ","element":"span"},{"href":"#id-60","text":"3, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"are two Gaussian distributions with the same mean but different variances. 
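","element":"span"}],[{"text":"Measuring the bias requires the true divergence for each test pair. A minimal sketch, using the parameters given in the captions of Figs. 1 to 3 (f = U[0.5, 1.5]^d and g = U[0, 2]^d; f = N(0, I_d) and g = N(1, I_d); f = N(0, I_d) and g = N(0, 2I_d)), computes the closed forms d ln 2, d/2 and (d/2)(ln 2 − 1/2). It then cross-checks the two Gaussian cases by averaging ln f(X) − ln g(X) over samples X ~ f. The helper names true_kl, log_gauss and mc_kl are hypothetical.

```python
import numpy as np


def true_kl(case, d):
    '''Closed-form D(f||g) for the three test pairs of Figs. 1 to 3.'''
    if case == 'uniform':       # f = U[0.5, 1.5]^d, g = U[0, 2]^d
        return d * np.log(2.0)  # f/g = 2^d on the support of f
    if case == 'mean_shift':    # f = N(0, I_d), g = N(1, I_d)
        return d / 2.0          # ||1||^2 / 2
    if case == 'variance':      # f = N(0, I_d), g = N(0, 2 I_d)
        return d / 2.0 * (np.log(2.0) - 0.5)
    raise ValueError(f'unknown case: {case}')


def log_gauss(x, mean, var):
    '''Row-wise log pdf of N(mean * 1, var * I_d).'''
    d = x.shape[1]
    sq = np.sum((x - mean) ** 2, axis=1)
    return -0.5 * (sq / var + d * np.log(2.0 * np.pi * var))


def mc_kl(case, d, n=200_000, seed=0):
    '''Monte Carlo check of the Gaussian cases: mean of log f(X) - log g(X), X ~ f.'''
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))  # samples from f = N(0, I_d)
    if case == 'mean_shift':
        return np.mean(log_gauss(x, 0.0, 1.0) - log_gauss(x, 1.0, 1.0))
    if case == 'variance':
        return np.mean(log_gauss(x, 0.0, 1.0) - log_gauss(x, 0.0, 2.0))
    raise ValueError(f'no Monte Carlo check for case: {case}')


if __name__ == '__main__':
    for d in (1, 2, 3):
        for case in ('mean_shift', 'variance'):
            print(d, case, true_kl(case, d), mc_kl(case, d))
```

For the mean-shift pair the log ratio is −1ᵀx + d/2, which is unbounded below and above; this is the unbounded density ratio that rules out Assumption 3 (d) for that case.","element":"span"}],[{"text":"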
The two Gaussian cases satisfy Assumption ","element":"span"},{"href":"#id-37","text":"2. ","element":"a"},{"text":"For all of the distributions above, we compare the empirical convergence rates of the bias and variance with the theoretical predictions. The empirical rates are the negative slopes of the curves in these figures, obtained by linear regression, while the theoretical ones come from Theorems ","element":"span"},{"href":"#id-38","text":"1, ","element":"a"},{"href":"#id-61","text":"2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-62","text":"3","element":"a"},{"text":". The results are shown in Table ","element":"span"},{"href":"#id-63","text":"I. ","element":"a"},{"text":"For convenience, we say that the theoretical convergence rate of the bias or variance is ","element":"span"},{"style":{"height":17.2},"width":28,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/9-2.png","element":"img","alt":" β","inline":true},{"text":" if it decays as either ","element":"span"},{"style":{"height":21.34},"width":164.77,"height":53.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/9-3.png","element":"img","alt":" O(N^(−β))","inline":true,"padRight":true},{"text":"or ","element":"span"},{"style":{"height":21.74},"width":207.3,"height":54.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/9-4.png","element":"img","alt":" O(N^(−β+δ))","inline":true,"padRight":true},{"text":"for arbitrarily small ","element":"span"},{"style":{"height":14.4},"width":108.52,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/9-5.png","element":"img","alt":" δ > 0","inline":true},{"text":", under the condition ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"= 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":".","element":"span"}],[{"id":"id-63","text":"TABLE I: Theoretical and empirical convergence rate comparison","element":"figcaption","subtype":"caption"}],[{"style":{"width":"66%"},"width":1324,"height":246,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/9-6.png","element":"img"}],[{"text":"In Table ","element":"span"},{"href":"#id-63","text":"I, ","element":"a"},{"text":"we observe that for the distribution used in Figure ","element":"span"},{"href":"#id-58","text":"1, ","element":"a"},{"text":"the empirical convergence rates of both the bias and the variance agree well with the theoretical predictions; the theoretical bias rate comes from Theorem ","element":"span"},{"href":"#id-38","text":"1, ","element":"a"},{"text":"and the theoretical variance rate from Theorem ","element":"span"},{"href":"#id-62","text":"3.","element":"a"}],[{"id":"id-58","style":{"width":"93%"},"width":1869,"height":637,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-0.png","element":"img"}],[{"text":"Fig. 1: Convergence of bias and variance of kNN based KL divergence estimator for two uniform distributions with different support sets. ","element":"figcaption","subtype":"caption"},{"style":{"height":20.54},"width":838.04,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-1.png","element":"img","alt":" f = 1 in [0.5, 1.5]^d, and g = 2^(−d) in [0, 2]^d.","inline":true}],[{"id":"id-59","style":{"width":"93%"},"width":1869,"height":637,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-2.png","element":"img"}],[{"text":"Fig. 2: Convergence of bias and variance of kNN based KL divergence estimator for two Gaussian distributions with different means. 
","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f ","element":"figcaption","subtype":"caption"},{"text":"is the pdf of ","element":"figcaption","subtype":"caption"},{"style":{"height":20},"width":306.36,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-3.png","element":"img","alt":" N(0, Id), and g","inline":true,"padRight":true},{"text":"is the pdf of ","element":"figcaption","subtype":"caption"},{"style":{"height":20},"width":422.57,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-4.png","element":"img","alt":" N(1, Id), in which Id","inline":true,"padRight":true},{"text":"denotes the ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"d","element":"figcaption","subtype":"caption"},{"text":"-dimensional identity matrix, and ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"1 ","element":"figcaption","subtype":"caption"},{"text":"= (1, . . . , 1).","element":"figcaption","subtype":"caption"}],[{"text":"For the distribution in Figure ","element":"span"},{"href":"#id-59","text":"2, ","element":"a"},{"text":"the empirical convergence rate of the bias matches the theoretical prediction from Theorem ","element":"span"},{"href":"#id-61","text":"2. 
","element":"a"},{"text":"For Gaussian distributions with different means, it can be shown that for any ","element":"span"},{"style":{"height":17.2},"width":125.28,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-5.png","element":"img","alt":" γ < 1,","inline":true,"padRight":true},{"text":"there exists a constant ","element":"span"},{"style":{"height":13.2},"width":28,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-6.png","element":"img","alt":" µ","inline":true,"padRight":true},{"text":"such that Assumption ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"(b) holds. Therefore, according to Theorem ","element":"span"},{"href":"#id-61","text":"2, ","element":"a"},{"text":"the convergence rate of the bias is ","element":"span"},{"style":{"height":25.2},"width":250.86,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-7.png","element":"img","alt":" O(N^(−2/(d+2)+δ))","inline":true,"padRight":true},{"text":"for arbitrarily small ","element":"span"},{"style":{"height":14.4},"width":112.83,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/10-8.png","element":"img","alt":" δ > 0","inline":true},{"text":". Hence, in the second line of Table ","element":"span"},{"href":"#id-63","text":"I, ","element":"a"},{"text":"the theoretical rate of the bias is 2/(d + 2), i.e. 0.67, 0.50 and 0.40 for d = 1, 2 and 3, respectively. 
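","element":"span"}],[{"text":"The arithmetic behind these theoretical rates, and the regression used for the empirical rates, can be sketched as follows in Python with NumPy; the function names are illustrative. Following the description of Table I, the empirical rate is taken to be the negative slope of a least-squares line fitted to log10 of the estimated bias (or variance) against log10 N. The example uses synthetic values that decay exactly as N^(−1/2), not data from Table I.

```python
import numpy as np


def theoretical_bias_rate(d):
    '''beta = 2/(d + 2): exponent of the bias bound O(N^(-2/(d+2)+delta)) quoted above.'''
    return 2.0 / (d + 2)


def empirical_rate(sample_sizes, values):
    '''Negative slope of log10|values| against log10(N), fitted by least squares.

    The absolute value is taken because an estimated bias may be negative.
    '''
    log_n = np.log10(np.asarray(sample_sizes, dtype=float))
    log_v = np.log10(np.abs(np.asarray(values, dtype=float)))
    slope, _intercept = np.polyfit(log_n, log_v, 1)
    return -slope


if __name__ == '__main__':
    print([round(theoretical_bias_rate(d), 2) for d in (1, 2, 3)])  # [0.67, 0.5, 0.4]
    ns = np.array([100, 200, 400, 800, 1600])
    synthetic = 3.0 * ns ** -0.5  # synthetic curve with known rate 1/2
    print(round(empirical_rate(ns, synthetic), 3))  # 0.5
```

The base of the logarithm does not affect the fitted slope; base 10 simply matches the log-log plots.","element":"span"}],[{"text":"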
No theoretical result on the variance is available in this case: since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f/g ","element":"span"},{"text":"is unbounded, Assumption ","element":"span"},{"href":"#id-43","text":"3 ","element":"a"},{"text":"(d) is not satisfied, so Theorem ","element":"span"},{"href":"#id-62","text":"3 ","element":"a"},{"text":"does not apply. We observe that the empirical convergence rate of the variance is slower than in the other cases, which may indicate that the KL divergence is harder to estimate when the density ratio is unbounded.","element":"span"}],[{"id":"id-60","style":{"width":"93%"},"width":1869,"height":637,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/11-0.png","element":"img"}],[{"text":"Fig. 3: Convergence of bias and variance of kNN based KL divergence estimator for two Gaussian distributions with different variances. ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f ","element":"figcaption","subtype":"caption"},{"text":"is the pdf of ","element":"figcaption","subtype":"caption"},{"style":{"height":20},"width":307.52,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/11-1.png","element":"img","alt":" N(0, Id), and g","inline":true,"padRight":true},{"text":"is the pdf of ","element":"figcaption","subtype":"caption"},{"style":{"height":20},"width":205.44,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/11-2.png","element":"img","alt":" N(0, 2Id).","inline":true}],[{"text":"For the distribution in Figure ","element":"span"},{"href":"#id-60","text":"3, ","element":"a"},{"text":"the empirical and theoretical convergence rates of the variance match well, while the empirical rate of the bias is faster than the theoretical prediction. Note that the bound we have derived holds uniformly for all distributions that satisfy the assumptions. 
For a specific distribution, the actual convergence rate can be faster. In particular, there is a uniform bound on the Hessian of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"in Assumption ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"(c). For Gaussian distributions, however, the Hessian is small where the pdf is small, so the local non-uniformity is milder than in the worst case allowed by the assumptions.","element":"span"}]]},{"heading":"VII. CONCLUSION","paragraphs":[[{"id":"id-25","text":"In this paper, we have analyzed the convergence rates of the bias and variance of the kNN based ","element":"span"},{"text":"KL divergence estimator proposed in ","element":"span"},{"href":"#id-17","referenceIndex":8,"text":"[8]","element":"a"},{"text":". For the bias, we have discussed two types of distributions, depending on the main source of the bias. In the first case, the distribution has bounded support and the pdf is bounded away from zero. In the second case, the distribution is smooth everywhere and the pdf can be arbitrarily close to zero. For the variance, we have derived the convergence rate under a more general assumption. Furthermore, we have derived a minimax lower bound for KL divergence estimation that holds for all possible estimators. We have shown that for both types of distributions, the kNN based KL divergence estimator is nearly minimax rate optimal. 
We have also used numerical experiments to illustrate that the practical performance of the kNN based KL divergence estimator is consistent with our theoretical analysis.","element":"span"}],[{"id":"id-36","style":{"width":"97%"},"width":1945,"height":571,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-0.png","element":"img"}],[{"text":"in which","element":"span"}],[{"style":{"width":"79%"},"width":1588,"height":188,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-1.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":11.67},"width":38.15,"height":29.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-2.png","element":"img","alt":" cd","inline":true,"padRight":true},{"text":"is the volume of the unit ball. Here, we omit the index ","element":"span"},{"style":{"height":19.6},"width":621.51,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-3.png","element":"img","alt":" i, since E[ln ϵ(i)] and E[ln ν(i)]","inline":true,"padRight":true},{"text":"are the same for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":".","element":"span"}],[{"text":"In the following, we show in detail how to bound ","element":"span"},{"style":{"height":16.07},"width":110,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-4.png","element":"img","alt":" I2. 
I1","inline":true,"padRight":true},{"text":"can then be bounded in the same way.","element":"span"}],[{"text":"To begin with, we denote ","element":"span"},{"style":{"height":19.67},"width":117.22,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-5.png","element":"img","alt":" Pg(S)","inline":true,"padRight":true},{"text":"as the probability mass of ","element":"span"},{"style":{"height":22.05},"width":769.88,"height":55.13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-6.png","element":"img","alt":" S under pdf g, i.e. Pg(S) = ∫S g(x)dx.","inline":true,"padRight":true},{"text":"We have the following lemma.","element":"span"}],[{"id":"id-65","style":{"fontWeight":"bold"},"text":"Lemma 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists a constant ","element":"span"},{"style":{"height":16.47},"width":50.5,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-7.png","element":"img","alt":" C1","inline":true},{"style":{"fontStyle":"italic"},"text":", such that, if ","element":"span"},{"style":{"height":19.67},"width":446.6,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-8.png","element":"img","alt":" B(x, r) ⊂ Sg, we have","inline":true}],[{"style":{"width":"34%"},"width":693,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof.","element":"span"}],[{"style":{"width":"93%"},"width":1866,"height":389,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-10.png","element":"img"}],[{"text":"in which the first inequality uses Assumption ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"(f).","element":"span"}],[{"text":"From order statistics ","element":"span"},{"href":"#id-40","referenceIndex":28,"text":"[28]","element":"a"},{"text":", 
","element":"span"},{"style":{"height":19.67},"width":752.65,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-11.png","element":"img","alt":" E[ln Pg(B(x, r))] = ψ(k) − ψ(M + 1)","inline":true},{"text":", therefore","element":"span"}],[{"id":"id-64","style":{"width":"65%"},"width":1312,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-12.png","element":"img"}],[{"text":"Define","element":"span"}],[{"style":{"width":"64%"},"width":1280,"height":119,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/12-13.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":21.81},"width":954.2,"height":54.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-0.png","element":"img","alt":" aM = A(ln M/M)^(1/d), and A = (2/(Lg cd))^(1/d)","inline":true},{"text":". From ","element":"span"},{"href":"#id-64","text":"(31)","element":"a"},{"text":", we observe that the bias is determined by the difference between the average pdf in ","element":"span"},{"style":{"height":19.6},"width":149.72,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-1.png","element":"img","alt":" B(x, ν)","inline":true,"padRight":true},{"text":"and the pdf at its center ","element":"span"},{"style":{"height":19.6},"width":213.87,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-2.png","element":"img","alt":" g(x). S1 is","inline":true,"padRight":true},{"text":"the region that is relatively far from the boundary. 
For all ","element":"span"},{"style":{"height":16.47},"width":132.64,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-3.png","element":"img","alt":" x ∈ S1","inline":true},{"text":", with high probability, ","element":"span"},{"style":{"height":19.67},"width":273.74,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-4.png","element":"img","alt":" B(x, ν) ⊂ Sg.","inline":true,"padRight":true},{"text":"In this case, the bias is caused by the non-uniformity of the density, and its effect vanishes as the sample size increases. ","element":"span"},{"style":{"height":16.47},"width":45.8,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-5.png","element":"img","alt":" S2","inline":true,"padRight":true},{"text":"is the region near the boundary, in which the probability that ","element":"span"},{"style":{"height":19.6},"width":246.98,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-6.png","element":"img","alt":" B(x, ν) ⊄ Sg","inline":true,"padRight":true},{"text":"is not negligible; hence ","element":"span"},{"style":{"height":19.6},"width":222.83,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-7.png","element":"img","alt":" P(B(x, ν))","inline":true,"padRight":true},{"text":"can deviate significantly from ","element":"span"},{"style":{"height":20.94},"width":173.89,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-8.png","element":"img","alt":" cd ν^d g(x)","inline":true},{"text":". Therefore, the bias in this region does not converge to zero. 
However, we let the volume of ","element":"span"},{"style":{"height":16.47},"width":45.8,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-9.png","element":"img","alt":" S2","inline":true,"padRight":true},{"text":"shrink to zero as the sample size grows, so that the overall bound on the bias still converges to zero.","element":"span"}],[{"id":"id-66","style":{"width":"97%"},"width":1945,"height":577,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-10.png","element":"img"}],[{"text":"In step (a), we use Lemma ","element":"span"},{"href":"#id-65","text":"1, ","element":"a"},{"text":"Assumption ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"(b) and Assumption ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"(e). In step (b), the first term uses the fact that for sufficiently large ","element":"span"},{"style":{"height":16.07},"width":144.26,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-11.png","element":"img","alt":" M, aM","inline":true,"padRight":true},{"text":"is sufficiently small, hence ","element":"span"},{"style":{"height":20.54},"width":346.69,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-12.png","element":"img","alt":" C1ν^2/(cd g(x)) ≤","inline":true},{"style":{"height":20.54},"width":436.09,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-13.png","element":"img","alt":"C1aM^2/(cd g(x)) < 1/2","inline":true},{"text":". 
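","element":"span"}],[{"text":"The quantities in step (b) can be made concrete with a short numerical sketch in Python. It evaluates c_d = π^(d/2)/Γ(d/2 + 1), the radius a_M = A(ln M/M)^(1/d) with A = (2/(L_g c_d))^(1/d), and the first M on a grid at which C1 a_M^2/(c_d g(x)) < 1/2. For illustration we assume that L_g is a lower bound on g over its support, so g(x) is replaced by L_g; the values of C1 and L_g below are hypothetical, since Lemma 1 does not give C1 numerically, and the function names are illustrative.

```python
import math


def unit_ball_volume(d):
    '''c_d = pi^(d/2) / Gamma(d/2 + 1), the volume of the unit ball in R^d.'''
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def a_m(m, d, lower_g):
    '''a_M = A (ln M / M)^(1/d) with A = (2 / (L_g c_d))^(1/d).'''
    big_a = (2.0 / (lower_g * unit_ball_volume(d))) ** (1.0 / d)
    return big_a * (math.log(m) / m) ** (1.0 / d)


def smallest_m(d, lower_g, c1, grid):
    '''First M in grid with C1 a_M^2 / (c_d L_g) < 1/2, or None.

    Assumes g(x) >= L_g on the support; c1 and lower_g are hypothetical values.
    '''
    cd = unit_ball_volume(d)
    for m in grid:
        if c1 * a_m(m, d, lower_g) ** 2 / (cd * lower_g) < 0.5:
            return m
    return None


if __name__ == '__main__':
    for m in (10**3, 10**4, 10**5, 10**6):
        print(m, a_m(m, d=2, lower_g=0.25))  # a_M shrinks like (ln M / M)^(1/2)
    print(smallest_m(2, 0.25, 1.0, [10**j for j in range(1, 9)]))  # 100
```

Because a_M tends to zero, the condition in step (b) eventually holds for any fixed C1 and L_g; the grid search only shows where that happens for one hypothetical choice.","element":"span"}],[{"text":"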
The second term of step (b) comes from the Chernoff bound, which implies ","element":"span"},{"text":"that for all ","element":"span"},{"style":{"height":16.47},"width":132.64,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-14.png","element":"img","alt":" x ∈ S1","inline":true,"padRight":true},{"text":"and sufficiently large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":",","element":"span"}],[{"id":"id-67","style":{"width":"97%"},"width":1945,"height":961,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-15.png","element":"img"}],[{"text":"In this equation, ","element":"span"},{"style":{"height":19.6},"width":121.64,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-16.png","element":"img","alt":" V(S2)","inline":true,"padRight":true},{"text":"is the volume of ","element":"span"},{"style":{"height":16.47},"width":45.8,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-17.png","element":"img","alt":" S2","inline":true},{"text":", and we use the fact that ","element":"span"},{"style":{"height":19.67},"width":303.24,"height":49.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-18.png","element":"img","alt":" V(S2) ≤ Hg aM","inline":true,"padRight":true},{"text":"according to the definition of ","element":"span"},{"style":{"height":16.47},"width":45.8,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-19.png","element":"img","alt":" S2","inline":true,"padRight":true},{"text":"and Assumption ","element":"span"},{"href":"#id-31","text":"1 ","element":"a"},{"text":"(c). 
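","element":"span"}],[{"text":"A concrete instance of the bound V(S2) ≤ Hg aM can be checked numerically. Take the support Sg = [0, 1]^d, whose surface area is 2d, and assume, for this illustration only, that S2 consists of the points of Sg within distance a of the boundary. Its exact volume is 1 − (1 − 2a)^d for a ≤ 1/2, which Bernoulli’s inequality bounds by 2da. The Python sketch below compares this closed form with a Monte Carlo estimate and with the bound; the function names are illustrative.

```python
import numpy as np


def layer_volume_exact(a, d):
    '''Volume of {x in [0, 1]^d : distance to the boundary < a}, for 0 <= a <= 1/2.'''
    return 1.0 - (1.0 - 2.0 * a) ** d


def layer_volume_mc(a, d, n=200_000, seed=0):
    '''Monte Carlo estimate of the same volume from uniform points in the cube.'''
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n, d))
    dist = np.minimum(x, 1.0 - x).min(axis=1)  # distance to the nearest face
    return np.mean(dist < a)


if __name__ == '__main__':
    for d in (1, 2, 3):
        surface = 2 * d  # surface area of [0, 1]^d, playing the role of Hg
        for a in (0.1, 0.01, 0.001):
            print(d, a, layer_volume_exact(a, d), layer_volume_mc(a, d), surface * a)
```

The bound is tight as a tends to zero, which is why the boundary layer contributes a term of order aM to the bias.","element":"span"}],[{"text":"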
Based on ","element":"span"},{"href":"#id-66","text":"(34) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-67","text":"(36)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"60%"},"width":1208,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/13-20.png","element":"img"}],[{"text":"Similarly, we have ","element":"span"},{"style":{"height":21.74},"width":415.77,"height":54.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-0.png","element":"img","alt":" |I1| ≲ (ln N/N)^(1/d)","inline":true},{"text":", and, from the definition of the digamma function ","element":"span"},{"style":{"height":17.6},"width":44.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-1.png","element":"img","alt":" ψ,","inline":true},{"style":{"height":19.2},"width":368.36,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-2.png","element":"img","alt":"|I3| ≲ 1/M + 1/N","inline":true},{"text":". Therefore","element":"span"}],[{"id":"id-42","style":{"width":"74%"},"width":1495,"height":328,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-3.png","element":"img"}],[{"text":"In this section, we derive a bound on the bias for distributions that satisfy Assumption ","element":"span"},{"href":"#id-37","text":"2. ","element":"a"},{"text":"These ","element":"span"},{"text":"distributions are smooth everywhere, and their densities can approach zero. We begin with the following lemmas, whose proofs are given in Appendix ","element":"span"},{"href":"#id-68","text":"B-A, ","element":"a"},{"href":"#id-69","text":"B-B, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-70","text":"B-C, ","element":"a"},{"text":"respectively.","element":"span"}],[{"id":"id-79","style":{"fontWeight":"bold"},"text":"Lemma 2. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exist constants ","element":"span"},{"style":{"height":19.67},"width":1137.62,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-4.png","element":"img","alt":" Uf and Ug such that f(x) ≤ Uf and g(x) ≤ Ug for all x.","inline":true}],[{"id":"id-81","style":{"fontWeight":"bold"},"text":"Lemma 3. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists a constant ","element":"span"},{"style":{"height":16.47},"width":50.5,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-5.png","element":"img","alt":" C2","inline":true},{"style":{"fontStyle":"italic"},"text":", such that","element":"span"}],[{"style":{"width":"39%"},"width":778,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for sufficiently small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"style":{"fontStyle":"italic"},"text":", in which ","element":"span"},{"style":{"fontWeight":"bold"},"text":"X ","element":"span"},{"style":{"fontStyle":"italic"},"text":"follows a distribution with pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"id":"id-72","style":{"width":"99%"},"width":1993,"height":217,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-7.png","element":"img"}],[{"text":"As in the proof of Theorem ","element":"span"},{"href":"#id-38","text":"1, ","element":"a"},{"text":"we decompose the bias as ","element":"span"},{"style":{"height":23.64},"width":743.61,"height":59.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-8.png","element":"img","alt":" E[D̂(f||g)] − D(f||g) = −I1 + I2 + I3.","inline":true,"padRight":true},{"text":"Then","element":"span"}],[{"style":{"width":"66%"},"width":1319,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-9.png","element":"img"}],[{"text":"Divide ","element":"span"},{"style":{"height":18.87},"width":44.8,"height":47.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-10.png","element":"img","alt":" Sg","inline":true,"padRight":true},{"text":"into two parts:","element":"span"}],[{"id":"id-71","style":{"width":"65%"},"width":1297,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-11.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":21.74},"width":729.84,"height":54.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-12.png","element":"img","alt":" aM = A M^(−β), A = (k/C1)^(1/(d+2)). β","inline":true,"padRight":true},{"text":"will be determined later. ","element":"span"},{"style":{"height":16.47},"width":50.5,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-13.png","element":"img","alt":" C1","inline":true,"padRight":true},{"text":"is the constant in Lemma ","element":"span"},{"href":"#id-65","text":"1.","element":"a"}],[{"id":"id-77","style":{"width":"97%"},"width":1946,"height":602,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/14-14.png","element":"img"}],[{"text":"in which (a) comes from Lemma ","element":"span"},{"href":"#id-65","text":"1. 
","element":"a"},{"text":"For (b), note that according to ","element":"span"},{"href":"#id-71","text":"(41)","element":"a"},{"text":", ","element":"span"},{"style":{"height":20.54},"width":520.64,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-0.png","element":"img","alt":" C1a2M/(cdg(x)) < 1/2 for","inline":true},{"style":{"height":19.6},"width":1002.3,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-1.png","element":"img","alt":"x ∈ S1, and | ln(1 − u)| ≤ 2u for any 0 < u ≤ 1/2","inline":true},{"text":". (c) uses Lemma ","element":"span"},{"href":"#id-72","text":"4. ","element":"a"},{"text":"For ","element":"span"},{"style":{"height":12.87},"width":148.72,"height":32.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-2.png","element":"img","alt":" ν > aM","inline":true},{"text":", note that according to Lemma ","element":"span"},{"href":"#id-65","text":"1,","element":"a"}],[{"id":"id-83","style":{"width":"77%"},"width":1547,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-3.png","element":"img"}],[{"text":"Based on this fact, if ","element":"span"},{"style":{"height":19.6},"width":281.32,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-4.png","element":"img","alt":" β ≤ 1/(d + 2)","inline":true},{"text":", we show the following two lemmas:","element":"span"}],[{"id":"id-86","style":{"fontWeight":"bold"},"text":"Lemma 5. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists a constant ","element":"span"},{"style":{"height":16.47},"width":50.5,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-5.png","element":"img","alt":" C3","inline":true},{"style":{"fontStyle":"italic"},"text":", such that","element":"span"}],[{"style":{"width":"69%"},"width":1396,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Please see Appendix ","element":"span"},{"href":"#id-73","text":"B-D ","element":"a"},{"text":"for detailed proof.","element":"span"}],[{"id":"id-84","style":{"fontWeight":"bold"},"text":"Lemma 6. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists a constant ","element":"span"},{"style":{"height":16.47},"width":50.5,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-7.png","element":"img","alt":" C4","inline":true},{"style":{"fontStyle":"italic"},"text":", such that","element":"span"}],[{"style":{"width":"77%"},"width":1550,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Please see Appendix ","element":"span"},{"href":"#id-74","text":"B-E ","element":"a"},{"text":"for detailed proof.","element":"span"}],[{"style":{"width":"97%"},"width":1945,"height":381,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-9.png","element":"img"}],[{"text":"Note that","element":"span"}],[{"style":{"width":"87%"},"width":1746,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-10.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"id":"id-75","style":{"width":"99%"},"width":1993,"height":616,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-11.png","element":"img"}],[{"text":"From order statistics ","element":"span"},{"href":"#id-40","referenceIndex":28,"text":"[28]","element":"a"},{"text":", ","element":"span"},{"style":{"height":19.67},"width":918.11,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-12.png","element":"img","alt":" |E[ln Pg(B(x, ν))|x]| = |ψ(k)−ψ(M)| ≤ ln M","inline":true},{"text":". 
According to Assumption ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"(b), the first three terms in ","element":"span"},{"href":"#id-75","text":"(51) ","element":"a"},{"text":"can be bounded by:","element":"span"}],[{"style":{"width":"89%"},"width":1778,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/15-13.png","element":"img"}],[{"style":{"width":"95%"},"width":1895,"height":766,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-0.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"65%"},"width":1309,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-1.png","element":"img"}],[{"text":"The last term in ","element":"span"},{"href":"#id-75","text":"(51) ","element":"a"},{"text":"can be bounded using the following lemma, whose proof can be found in Appendix ","element":"span"},{"href":"#id-76","text":"B-F.","element":"a"}],[{"id":"id-87","style":{"fontWeight":"bold"},"text":"Lemma 7. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exist two constants ","element":"span"},{"style":{"height":16.47},"width":208.14,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-2.png","element":"img","alt":" C5 and C6","inline":true},{"style":{"fontStyle":"italic"},"text":", such that for sufficiently large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"}],[{"style":{"width":"69%"},"width":1385,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-3.png","element":"img"}],[{"text":"Using this lemma, we have","element":"span"}],[{"id":"id-78","style":{"width":"97%"},"width":1945,"height":318,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-4.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-77","text":"(43)","element":"a"},{"text":", ","element":"span"},{"href":"#id-75","text":"(50) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-78","text":"(57)","element":"a"},{"text":", we get","element":"span"}],[{"style":{"width":"70%"},"width":1405,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-5.png","element":"img"}],[{"text":"Since the above bound holds for arbitrary ","element":"span"},{"style":{"height":19.6},"width":281.32,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-6.png","element":"img","alt":" β ≤ 1/(d + 2)","inline":true},{"text":", we can let ","element":"span"},{"style":{"height":19.6},"width":391.86,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-7.png","element":"img","alt":" β = 1/(d + 2), then","inline":true}],[{"style":{"width":"61%"},"width":1224,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-8.png","element":"img"}],[{"text":"Similarly, we have 
","element":"span"},{"style":{"height":25.5},"width":369.76,"height":63.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-9.png","element":"img","alt":" |I1| ≲ N^(−2γ/(d+2)) ln N","inline":true},{"text":", and according to the definition of the digamma function, ","element":"span"},{"style":{"height":19.2},"width":121.52,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-10.png","element":"img","alt":" |I3| ≲","inline":true}],[{"style":{"width":"99%"},"width":1989,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/16-11.png","element":"img"}],[{"id":"id-68","style":{"fontStyle":"italic"},"text":"A. Proof of Lemma ","element":"span"},{"href":"#id-79","style":{"fontStyle":"italic"},"text":"2 ","element":"a"},{"text":"We only show that there exists a constant ","element":"span"},{"style":{"height":19.67},"width":459.14,"height":49.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-0.png","element":"img","alt":" Ug such that g(x) ≤ Ug","inline":true,"padRight":true},{"text":"holds for all ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":". The proof of the upper bound ","element":"span"},{"style":{"height":18.47},"width":50.8,"height":46.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-1.png","element":"img","alt":" Uf","inline":true,"padRight":true},{"text":"of the density ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is identical. 
From Lemma ","element":"span"},{"href":"#id-65","text":"1,","element":"a"}],[{"style":{"width":"68%"},"width":1372,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-2.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":19.67},"width":508.77,"height":49.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-3.png","element":"img","alt":" Pg(B(x, r)) ≤ 1, we have","inline":true}],[{"id":"id-80","style":{"width":"61%"},"width":1226,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-4.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":18.47},"width":329.52,"height":46.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-5.png","element":"img","alt":" r > 0. Define Ug","inline":true,"padRight":true},{"text":"as the right-hand side of ","element":"span"},{"href":"#id-80","text":"(62) ","element":"a"},{"text":"evaluated at ","element":"span"},{"style":{"height":21.74},"width":491.91,"height":54.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-6.png","element":"img","alt":" r = (d/(2C1))^(1/(d+2)), i.e.","inline":true}],[{"style":{"width":"61%"},"width":1219,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-7.png","element":"img"}],[{"text":"then ","element":"span"},{"style":{"height":19.67},"width":397.95,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-8.png","element":"img","alt":" g(x) ≤ Ug for all x.","inline":true}],[{"id":"id-69","style":{"fontStyle":"italic"},"text":"B. 
Proof of Lemma ","element":"span"},{"href":"#id-81","style":{"fontStyle":"italic"},"text":"3 ","element":"a"},{"text":"From H¨older inequality, For any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q ","element":"span"},{"text":"such that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p > ","element":"span"},{"text":"1","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q > ","element":"span"},{"text":"1","element":"span"},{"text":", and ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/p ","element":"span"},{"text":"+ 1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/q ","element":"span"},{"text":"= 1","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"83%"},"width":1663,"height":65,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-9.png","element":"img"}],[{"text":"From Assumption ","element":"span"},{"href":"#id-37","text":"2 ","element":"a"},{"text":"(b),","element":"span"}],[{"id":"id-108","style":{"width":"99%"},"width":1993,"height":783,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-10.png","element":"img"}],[{"text":"Using Stirling’s formula ","element":"span"},{"style":{"height":20.14},"width":498.33,"height":50.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-11.png","element":"img","alt":" p! ≤ epp+1/2e−p, we have","inline":true}],[{"id":"id-82","style":{"width":"82%"},"width":1638,"height":211,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-12.png","element":"img"}],[{"text":"which holds for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p > ","element":"span"},{"text":"1","element":"span"},{"text":". 
For sufficiently small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= ln(1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/t","element":"span"},{"text":")","element":"span"},{"text":"; then the right-hand side of ","element":"span"},{"href":"#id-82","text":"(67) ","element":"a"},{"text":"becomes ","element":"span"},{"style":{"height":19.6},"width":219.07,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/17-13.png","element":"img","alt":" etγ ln(1/t).","inline":true}],[{"id":"id-70","style":{"fontStyle":"italic"},"text":"C. Proof of Lemma ","element":"span"},{"href":"#id-72","style":{"fontStyle":"italic"},"text":"4","element":"a"}],[{"id":"id-73","style":{"width":"99%"},"width":1994,"height":844,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/18-0.png","element":"img"}],[{"text":"in which we used ","element":"span"},{"href":"#id-71","text":"(41) ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-65","text":"1. ","element":"a"},{"text":"Hence, according to ","element":"span"},{"href":"#id-83","text":"(44) ","element":"a"},{"text":"and the Chernoff inequality,","element":"span"}],[{"id":"id-85","style":{"width":"78%"},"width":1561,"height":328,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/18-1.png","element":"img"}],[{"text":"Moreover, define ","element":"span"},{"style":{"height":20.57},"width":397.88,"height":51.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/18-2.png","element":"img","alt":" a = M cd aM^d/2, then","inline":true}],[{"style":{"width":"80%"},"width":1612,"height":841,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/18-3.png","element":"img"}],[{"text":"The proof is complete.","element":"span"}],[{"id":"id-74","style":{"fontStyle":"italic"},"text":"E. 
Proof of Lemma ","element":"span"},{"href":"#id-84","style":{"fontStyle":"italic"},"text":"6","element":"a"}],[{"style":{"width":"97%"},"width":1944,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-0.png","element":"img"}],[{"href":"#id-37","style":{"height":20.94},"width":459.03,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-1.png","element":"img","alt":"Rd \\ B(0, r). Denote ν0","inline":true,"padRight":true},{"text":"as the kNN distance of ","element":"span"},{"style":{"height":17.6},"width":515.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-2.png","element":"img","alt":" x = 0 among Y1, . . . , YM","inline":true},{"text":". Then for sufficiently large","element":"span"}],[{"style":{"width":"99%"},"width":1991,"height":568,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-3.png","element":"img"}],[{"text":"Denote ","element":"span"},{"style":{"height":19.6},"width":124.9,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-4.png","element":"img","alt":" nY (S)","inline":true,"padRight":true},{"text":"as the number of samples from ","element":"span"},{"style":{"height":19.2},"width":290.89,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-5.png","element":"img","alt":" {Y1, . . . , YM}","inline":true,"padRight":true},{"text":"that are in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". 
Then for any given ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":", and","element":"span"}],[{"style":{"width":"99%"},"width":1992,"height":197,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-6.png","element":"img"}],[{"text":"Let","element":"span"}],[{"style":{"width":"69%"},"width":1394,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-7.png","element":"img"}],[{"text":"It can be checked that ","element":"span"},{"style":{"height":21.74},"width":467.4,"height":54.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-8.png","element":"img","alt":" aM e^(t0) ≥ (2K)^(1/s) + ∥x∥","inline":true},{"text":"; therefore","element":"span"}],[{"style":{"width":"90%"},"width":1797,"height":808,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-9.png","element":"img"}],[{"text":"In (a), we use ","element":"span"},{"href":"#id-85","text":"(70) ","element":"a"},{"text":"and the definition of ","element":"span"},{"style":{"height":14.87},"width":33.91,"height":37.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-10.png","element":"img","alt":" t0","inline":true},{"text":", which implies that ","element":"span"},{"style":{"height":19.74},"width":289.01,"height":49.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-11.png","element":"img","alt":" ∥x∥ ≤ aM e^t/2","inline":true},{"text":". (b) uses the fact that","element":"span"}],[{"style":{"width":"99%"},"width":1991,"height":181,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/19-12.png","element":"img"}],[{"text":"It remains to bound ","element":"span"},{"style":{"height":19.6},"width":527.04,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/20-0.png","element":"img","alt":" E[φ(X)t0]. 
For any T > 0,","inline":true}],[{"style":{"width":"79%"},"width":1577,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/20-1.png","element":"img"}],[{"text":"In Lemma ","element":"span"},{"href":"#id-86","text":"5, ","element":"a"},{"text":"we have shown that ","element":"span"},{"style":{"height":21.74},"width":482.34,"height":54.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/20-2.png","element":"img","alt":" E[φ(X)] ≤ C3M −γ(1−βd)","inline":true},{"text":". For the second term,","element":"span"}],[{"style":{"width":"99%"},"width":1992,"height":511,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/20-3.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= (1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/s","element":"span"},{"text":") ln ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":", then","element":"span"}],[{"style":{"width":"66%"},"width":1327,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/20-4.png","element":"img"}],[{"text":"Hence","element":"span"}],[{"style":{"width":"76%"},"width":1524,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/20-5.png","element":"img"}],[{"id":"id-76","style":{"fontStyle":"italic"},"text":"F. Proof of Lemma ","element":"span"},{"href":"#id-87","style":{"fontStyle":"italic"},"text":"7","element":"a"}],[{"id":"id-88","style":{"width":"82%"},"width":1637,"height":511,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/20-6.png","element":"img"}],[{"text":"In (a), we use Lemma ","element":"span"},{"href":"#id-79","text":"2. ","element":"a"},{"text":"(b) uses Chernoff bound. 
Moreover, let ","element":"span"},{"style":{"height":20.54},"width":832.22,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-0.png","element":"img","alt":" t0 = max{ln(2 ∥x∥), (1/s) ln(21+seK), 0},","inline":true,"padRight":true},{"text":"then","element":"span"}],[{"id":"id-89","style":{"width":"82%"},"width":1639,"height":753,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-1.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-88","text":"(81) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-89","text":"(82)","element":"a"},{"text":", the proof is complete.","element":"span"}],[{"id":"id-46","style":{"width":"97%"},"width":1945,"height":561,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-2.png","element":"img"}],[{"text":"We bound ","element":"span"},{"style":{"height":16.47},"width":179.76,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-3.png","element":"img","alt":" I1 and I2","inline":true,"padRight":true},{"text":"separately.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Bound of ","element":"span"},{"style":{"height":16.07},"width":107.09,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-4.png","element":"img","alt":" I1. I1","inline":true,"padRight":true},{"text":"is the variance of the Kozachenko-Leonenko entropy estimator ","element":"span"},{"href":"#id-13","referenceIndex":17,"text":"[17]","element":"a"},{"text":", which estimates ","element":"span"},{"style":{"height":21.6},"width":543.41,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-5.png","element":"img","alt":"h(f) = −∫ f(x) ln f(x) dx","inline":true},{"text":". Here we follow a proof procedure similar to that of Theorem 2 in our recent work ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":". 
","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23] ","element":"a"},{"text":"analyzed a truncated Kozachenko-Leonenko (KL) entropy estimator, in which ","element":"span"},{"style":{"height":11.27},"width":30.9,"height":28.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-6.png","element":"img","alt":" ϵi","inline":true,"padRight":true},{"text":"is truncated at an upper bound ","element":"span"},{"style":{"height":11.67},"width":54.58,"height":29.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-7.png","element":"img","alt":" aN","inline":true},{"text":". The variance of this estimator equals ","element":"span"},{"style":{"height":23.68},"width":992.03,"height":59.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-8.png","element":"img","alt":" Var[(d/N) ∑_{i=1}^N ln ρi], in which ρi = min{ϵi, aN}","inline":true},{"text":". It was shown in ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23] ","element":"a"},{"text":"that if ","element":"span"},{"style":{"height":19.01},"width":329.14,"height":47.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-9.png","element":"img","alt":" aN ∼ N^(−β) with","inline":true},{"style":{"height":23.68},"width":1034.42,"height":59.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-10.png","element":"img","alt":"0 < β < 1/d, then Var[(d/N) ∑_{i=1}^N ln ρi] = O(N^(−1))","inline":true},{"text":". In this section, we prove the same convergence ","element":"span"},{"text":"bound for the estimator without truncation, i.e. 
","element":"span"},{"style":{"height":45.02},"width":1329.05,"height":112.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-11.png","element":"img","alt":" Var[(d/N) ∑_{i=1}^N ln ϵi]. Let X′1 ","inline":true,"padRight":true},{"text":"be a sample that is i.i.d. with ","element":"span"},{"style":{"height":16.8},"width":315.59,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-12.png","element":"img","alt":" X1, X2, . . . , XN","inline":true},{"text":". Recall that ","element":"span"},{"style":{"height":16.47},"width":205.54,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-13.png","element":"img","alt":" ϵi is the k","inline":true},{"text":"-th nearest neighbor ","element":"span"},{"text":"distance of ","element":"span"},{"style":{"height":17.6},"width":547.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-14.png","element":"img","alt":" Xi among X1, X2, . . . , XN","inline":true},{"text":". If we replace ","element":"span"},{"style":{"height":18.33},"width":250.01,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-15.png","element":"img","alt":" X1 with X′1","inline":true},{"text":", then the kNN distances may ","element":"span"},{"text":"change. Denote ","element":"span"},{"style":{"height":18.33},"width":212.21,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-16.png","element":"img","alt":" ϵ′i as the k","inline":true},{"text":"-th nearest neighbor distance based on ","element":"span"},{"style":{"height":17.93},"width":315.6,"height":44.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-17.png","element":"img","alt":" X′1, X2, . . . , XN","inline":true},{"text":". 
Then, by the Efron-","element":"span"},{"text":"Stein inequality ","element":"span"},{"href":"#id-47","referenceIndex":29,"text":"[29]","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"80%"},"width":1609,"height":173,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/21-18.png","element":"img"}],[{"text":"Define ","element":"span"},{"style":{"height":20.94},"width":1105.65,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-0.png","element":"img","alt":" Ui = ln(N cd ϵi^d) and U′i = ln(N cd (ϵ′i)^d) for i = 1, . . . , N","inline":true},{"text":". Moreover, define ","element":"span"},{"style":{"height":18.33},"width":367.59,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-1.png","element":"img","alt":" ϵ′′i as the k-th nearest","inline":true,"padRight":true},{"text":"neighbor distance of Xi based on ","element":"span"},{"style":{"height":20.94},"width":1013.08,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-2.png","element":"img","alt":" X2, . . . , XN, and U′′i = ln(N cd (ϵ′′i)^d), i = 2, . . . , N","inline":true},{"text":". Following the steps in ","element":"span"},{"text":"Appendix C of ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":", we have","element":"span"}],[{"id":"id-97","style":{"width":"83%"},"width":1665,"height":144,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-3.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":13.2},"width":42.29,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-4.png","element":"img","alt":" γd","inline":true,"padRight":true},{"text":"is a constant that depends on the dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"and on the norm used. 
For example, if we use the ","element":"span"},{"style":{"height":16.47},"width":36.65,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-5.png","element":"img","alt":"ℓ2","inline":true,"padRight":true},{"text":"norm, then ","element":"span"},{"style":{"height":13.2},"width":42.29,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-6.png","element":"img","alt":" γd","inline":true,"padRight":true},{"text":"is the minimum number of cones with angle ","element":"span"},{"style":{"height":19.2},"width":74.69,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-7.png","element":"img","alt":" π/6","inline":true,"padRight":true},{"text":"that cover ","element":"span"},{"style":{"height":16.14},"width":65.96,"height":40.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-8.png","element":"img","alt":" Rd.","inline":true}],[{"text":"Now we bound ","element":"span"},{"style":{"height":21.34},"width":1624.56,"height":53.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-9.png","element":"img","alt":" E[U1^2] and E[(U′′1)^2]. Define ρ = min{ϵ, aN}, in which aN ∼ N^(−β), 0 < β < 1/d.","inline":true,"padRight":true},{"text":"Note that this truncation is introduced only for convenience of analysis; the estimator under study is not truncated. The deviation caused by the truncation is bounded later. In the following proof, we omit the index for convenience. 
","element":"span"},{"style":{"height":20.14},"width":113.62,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-10.png","element":"img","alt":" E[U^2]","inline":true,"padRight":true},{"text":"can be bounded by","element":"span"}],[{"id":"id-96","style":{"width":"102%"},"width":2050,"height":424,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-11.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":19.67},"width":119.75,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-12.png","element":"img","alt":" Pf(S)","inline":true,"padRight":true},{"text":"is the probability mass of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"under a distribution with pdf ","element":"span"},{"style":{"height":22.05},"width":526.74,"height":55.13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-13.png","element":"img","alt":" f, i.e. Pf(S) = ∫_S f(x)dx.","inline":true,"padRight":true},{"text":"According to Assumption ","element":"span"},{"href":"#id-43","text":"3 ","element":"a"},{"text":"(b), ","element":"span"},{"style":{"height":22.83},"width":791.89,"height":57.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-14.png","element":"img","alt":" E[(ln f(X))^2] = ∫ f(x) ln^2 f(x)dx < ∞","inline":true},{"text":". 
Moreover, Lemmas 6 and 7 in ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23] ","element":"a"},{"text":"show that","element":"span"}],[{"style":{"width":"75%"},"width":1497,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-15.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"68%"},"width":1376,"height":144,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-16.png","element":"img"}],[{"text":"It remains to show that ","element":"span"},{"style":{"height":21.83},"width":329.53,"height":54.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-17.png","element":"img","alt":" E[ln^2(ϵ/ρ)] → 0:","inline":true}],[{"id":"id-94","style":{"width":"91%"},"width":1818,"height":350,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-18.png","element":"img"}],[{"text":"For sufficiently large ","element":"span"},{"style":{"height":16.07},"width":229.12,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-19.png","element":"img","alt":" N, aN < r0","inline":true},{"text":". 
From Assumption ","element":"span"},{"href":"#id-43","text":"3 ","element":"a"},{"text":"(b), for sufficiently small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":",","element":"span"}],[{"id":"id-92","style":{"width":"83%"},"width":1665,"height":144,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/22-20.png","element":"img"}],[{"text":"in which we use little-","element":"span"},{"style":{"fontStyle":"italic"},"text":"o ","element":"span"},{"text":"notation, since for any random variable ","element":"span"},{"style":{"height":19.6},"width":872.73,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-0.png","element":"img","alt":" U such that U ≥ 0 and E[U] < ∞, uP(U >","inline":true}],[{"id":"id-91","style":{"width":"99%"},"width":1992,"height":452,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-1.png","element":"img"}],[{"text":"In (a), we use the definition of ","element":"span"},{"style":{"height":21.85},"width":33.19,"height":54.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-2.png","element":"img","alt":"f̃","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-90","text":"(16) ","element":"a"},{"text":"for the first term, and the Chernoff inequality for the second term. (b) holds because ","element":"span"},{"style":{"height":21.37},"width":1153.41,"height":53.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-3.png","element":"img","alt":" N^(1−δ) aN^d ∼ N^(1−δ−dβ), and 1 − δ − βd > 0, thus N^(1−δ−dβ) → ∞","inline":true},{"text":". 
Then ","element":"span"},{"href":"#id-91","text":"(91) ","element":"a"},{"text":"follows from ","element":"span"},{"href":"#id-92","text":"(90)","element":"a"},{"text":".","element":"span"}],[{"text":"Moreover, we have the following lemma:","element":"span"}],[{"id":"id-95","style":{"fontWeight":"bold"},"text":"Lemma 8.","element":"span"}],[{"style":{"width":"64%"},"width":1281,"height":73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-4.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Please see Appendix ","element":"span"},{"href":"#id-93","text":"C-A.","element":"a"}],[{"text":"Based on ","element":"span"},{"href":"#id-94","text":"(89)","element":"a"},{"text":", ","element":"span"},{"href":"#id-91","text":"(91) ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-95","text":"8, ","element":"a"},{"style":{"height":21.84},"width":316.12,"height":54.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-5.png","element":"img","alt":" E[ln²(ϵ/ρ)] → 0","inline":true},{"text":". Therefore, ","element":"span"},{"href":"#id-96","text":"(86) ","element":"a"},{"text":"becomes","element":"span"}],[{"id":"id-106","style":{"width":"78%"},"width":1575,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-6.png","element":"img"}],[{"text":"Similar results hold for ","element":"span"},{"style":{"height":20.54},"width":170.4,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-7.png","element":"img","alt":" E[(U′′)²]","inline":true},{"text":". 
Hence ","element":"span"},{"href":"#id-97","text":"(85) ","element":"a"},{"text":"becomes","element":"span"}],[{"style":{"width":"67%"},"width":1339,"height":144,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-8.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Bound on ","element":"span"},{"style":{"height":17.93},"width":217.12,"height":44.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-9.png","element":"img","alt":" I2. Let Y′1 ","inline":true,"padRight":true},{"text":"be a sample that is i.i.d. with ","element":"span"},{"style":{"height":18.33},"width":652.7,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-10.png","element":"img","alt":" Y1, . . . , YM. Define ν′i as the k","inline":true},{"text":"-th nearest ","element":"span"},{"text":"neighbor distance of ","element":"span"},{"style":{"height":19.2},"width":1095.2,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-11.png","element":"img","alt":" Xi among {Y′1, Y2, . . . , YM} for i = 1, . . . , N. Let X′1 ","inline":true,"padRight":true},{"text":"be a sample that is i.i.d. ","element":"span"},{"text":"with ","element":"span"},{"style":{"height":16.8},"width":235.06,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-12.png","element":"img","alt":" X1, . . . , XN","inline":true},{"text":", and define ","element":"span"},{"style":{"height":18.33},"width":224.62,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-13.png","element":"img","alt":" ν′′1 as the k","inline":true},{"text":"-th nearest neighbor distance of ","element":"span"},{"style":{"height":19.2},"width":529.08,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-14.png","element":"img","alt":" X′1 among {Y1, . . . 
, YM}.","inline":true,"padRight":true},{"text":"Then, by the Efron-Stein inequality,","element":"span"}],[{"id":"id-98","style":{"width":"89%"},"width":1790,"height":571,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/23-15.png","element":"img"}],[{"text":"To bound the right-hand side of ","element":"span"},{"href":"#id-98","text":"(95)","element":"a"},{"text":", we first make the following definitions:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Define two sets ","element":"span"},{"style":{"height":20.47},"width":375.58,"height":51.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-0.png","element":"img","alt":" S1 ⊂ Rd, S′1 ⊂ Rd:","inline":true}],[{"style":{"width":"83%"},"width":1662,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-1.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Definition 2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Define three events:","element":"span"}],[{"id":"id-103","style":{"width":"81%"},"width":1620,"height":364,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is the constant in Assumption ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"(d). 
Here, for any set ","element":"span"},{"style":{"height":23.68},"width":695.55,"height":59.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-3.png","element":"img","alt":" S, nX(S) = ∑_{i=1}^N 1(Xi ∈ S) is the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"number of points from ","element":"span"},{"style":{"height":16.8},"width":235.06,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-4.png","element":"img","alt":" X1, . . . , XN","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"that are in ","element":"span"},{"style":{"height":18},"width":191.03,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-5.png","element":"img","alt":" S, and γd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the same constant used in ","element":"span"},{"href":"#id-97","text":"(85)","element":"a"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"We also write ","element":"span"},{"style":{"height":16.07},"width":379.48,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-6.png","element":"img","alt":" E = E1 ∪ E2 ∪ E3.","inline":true,"padRight":true},{"text":"The following lemma shows that each of these three events occurs with low probability.","element":"span"}],[{"id":"id-101","style":{"fontWeight":"bold"},"text":"Lemma 9. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The probabilities of ","element":"span"},{"style":{"height":16.47},"width":292.74,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-7.png","element":"img","alt":" E1, E2 and E3","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are bounded by:","element":"span"}],[{"id":"id-107","style":{"width":"86%"},"width":1720,"height":383,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Please see Appendix ","element":"span"},{"href":"#id-99","text":"C-B.","element":"a"}],[{"text":"These three bounds show that P","element":"span"},{"style":{"height":20.54},"width":921.54,"height":51.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-9.png","element":"img","alt":"(E) ≲ (M + N)^{−4} as long as N ln M/M → ∞","inline":true},{"text":". Moreover, we show the following lemma:","element":"span"}],[{"id":"id-102","style":{"fontWeight":"bold"},"text":"Lemma 10. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists a constant ","element":"span"},{"style":{"height":16.47},"width":50.5,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-10.png","element":"img","alt":" C1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that, for sufficiently large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"we have","element":"span"}],[{"style":{"width":"61%"},"width":1236,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/24-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Please see Appendix ","element":"span"},{"href":"#id-100","text":"C-C.","element":"a"}],[{"text":"Based on Lemma ","element":"span"},{"href":"#id-101","text":"9 ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-102","text":"10,","element":"a"}],[{"id":"id-104","style":{"width":"85%"},"width":1714,"height":634,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-0.png","element":"img"}],[{"text":"If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E ","element":"span"},{"text":"does not occur, then ","element":"span"},{"style":{"height":19.2},"width":366.69,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-1.png","element":"img","alt":" ∥Xi∥, ∥Yi∥, ∥Y′1∥","inline":true,"padRight":true},{"text":"are all upper bounded by ","element":"span"},{"style":{"height":21.74},"width":512.16,"height":54.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-2.png","element":"img","alt":" (M + N + 1)^{5/s}. Thus νi","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":13.53},"width":42.15,"height":33.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-3.png","element":"img","alt":" ν′i ","inline":true,"padRight":true},{"text":"are all upper bounded by ","element":"span"},{"style":{"height":21.74},"width":362.27,"height":54.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-4.png","element":"img","alt":" 2(M + N + 1)^{5/s}","inline":true},{"text":". Moreover, by ","element":"span"},{"href":"#id-103","text":"(99)","element":"a"},{"text":", both are lower bounded ","element":"span"},{"text":"by ","element":"span"},{"style":{"height":24.44},"width":271.66,"height":61.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-5.png","element":"img","alt":" (M + N)^{−(k+5)/(dk)}","inline":true,"padRight":true},{"text":". 
There are at most ","element":"span"},{"style":{"height":19.6},"width":346.33,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-6.png","element":"img","alt":" nX(S1) + nX(S′1)","inline":true,"padRight":true},{"text":"points such that ","element":"span"},{"style":{"height":18.33},"width":287.58,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-7.png","element":"img","alt":" νi ≠ ν′i. Hence","inline":true}],[{"id":"id-105","style":{"width":"104%"},"width":2085,"height":290,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-8.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-104","text":"(105) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-105","text":"(106)","element":"a"},{"text":", we have","element":"span"}],[{"style":{"width":"64%"},"width":1291,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-9.png","element":"img"}],[{"text":"Then ","element":"span"},{"style":{"height":16.07},"width":54.58,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-10.png","element":"img","alt":" I22","inline":true,"padRight":true},{"text":"can be bounded by:","element":"span"}],[{"style":{"width":"75%"},"width":1515,"height":317,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-11.png","element":"img"}],[{"text":"Following the analysis from ","element":"span"},{"href":"#id-96","text":"(86) ","element":"a"},{"text":"to ","element":"span"},{"href":"#id-106","text":"(93)","element":"a"},{"text":", we can show that the limit of ","element":"span"},{"style":{"height":20.94},"width":324.12,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-12.png","element":"img","alt":" E[(ln(M c_d ν_1^d))²]","inline":true,"padRight":true},{"text":"can also be ","element":"span"},{"text":"bounded by the right-hand side of 
","element":"span"},{"href":"#id-106","text":"(93)","element":"a"},{"text":". Therefore","element":"span"}],[{"style":{"width":"72%"},"width":1454,"height":269,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-13.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"78%"},"width":1569,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/25-14.png","element":"img"}],[{"id":"id-93","style":{"width":"100%"},"width":1996,"height":278,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-0.png","element":"img"}],[{"text":"Then","element":"span"}],[{"style":{"width":"99%"},"width":1993,"height":718,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-1.png","element":"img"}],[{"text":"Define","element":"span"}],[{"style":{"width":"89%"},"width":1776,"height":207,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-2.png","element":"img"}],[{"text":"It can be shown that P","element":"span"},{"style":{"height":19.6},"width":132.79,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-3.png","element":"img","alt":"(ϵ > e","inline":true}],[{"text":"dominated convergence theorem,","element":"span"}],[{"id":"id-99","style":{"width":"99%"},"width":1994,"height":435,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-4.png","element":"img"}],[{"text":"A similar bound holds for ","element":"span"},{"style":{"height":21.74},"width":1085.4,"height":54.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-5.png","element":"img","alt":" ∥X′1∥ and Yi, i = 1, . . . , M. 
Let t = (M + N + 1)^{5/s}","inline":true},{"text":". Applying the union ","element":"span"},{"text":"bound then yields ","element":"span"},{"href":"#id-107","text":"(101)","element":"a"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-107","text":"(102)","element":"a"},{"text":". Since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is bounded by ","element":"span"},{"style":{"height":21.01},"width":1179.83,"height":52.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-6.png","element":"img","alt":" U_g, we have P_g(B(x, r)) ≤ U_g c_d r^d for any x and r > 0. Let","inline":true},{"style":{"height":24.44},"width":374.68,"height":61.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-7.png","element":"img","alt":"r_0 = (M + N)^{−(k+5)/(dk)}","inline":true,"padRight":true},{"text":". Then, for sufficiently large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"58%"},"width":1167,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/26-8.png","element":"img"}],[{"text":"Hence, by the Chernoff inequality,","element":"span"}],[{"style":{"width":"84%"},"width":1679,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-0.png","element":"img"}],[{"text":"Then ","element":"span"},{"href":"#id-107","text":"(102) ","element":"a"},{"text":"follows from the union bound. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-107","text":"(103)","element":"a"},{"text":". 
We first prove ","element":"span"},{"href":"#id-107","text":"(103) ","element":"a"},{"text":"for the ","element":"span"},{"style":{"height":16.47},"width":36.65,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-1.png","element":"img","alt":" ℓ2","inline":true,"padRight":true},{"text":"norm, and then generalize the result to an arbitrary norm. Define","element":"span"}],[{"style":{"width":"83%"},"width":1668,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-2.png","element":"img"}],[{"text":"then ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"style":{"height":19.6},"width":488.43,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-3.png","element":"img","alt":"1 = S(Y1), S′1 = S(Y′1).","inline":true}],[{"text":"Recall that ","element":"span"},{"style":{"height":13.2},"width":42.28,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-4.png","element":"img","alt":" γd","inline":true,"padRight":true},{"text":"is defined as the minimum number of cones with angle ","element":"span"},{"style":{"height":19.6},"width":74.69,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-5.png","element":"img","alt":" π/6","inline":true,"padRight":true},{"text":"that can cover ","element":"span"},{"style":{"height":16.14},"width":174.41,"height":40.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-6.png","element":"img","alt":" Rd. 
Now","inline":true,"padRight":true},{"text":"we pick any ","element":"span"},{"style":{"height":19.74},"width":140.12,"height":49.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-7.png","element":"img","alt":" y ∈ Rd","inline":true},{"text":", and divide ","element":"span"},{"style":{"height":20.14},"width":203.75,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-8.png","element":"img","alt":" Rd into γd","inline":true,"padRight":true},{"text":"cones with angle ","element":"span"},{"style":{"height":19.2},"width":74.69,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-9.png","element":"img","alt":" π/6","inline":true},{"text":", such that ","element":"span"},{"style":{"fontWeight":"bold"},"text":"y ","element":"span"},{"text":"is the common vertex of all the cones. These cones are denoted by ","element":"span"},{"style":{"height":23.04},"width":1216.12,"height":57.61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-10.png","element":"img","alt":" Cj, j = 1, . . . , γd, so that ∪_{j=1}^{γd} Cj = Rd. Define rj such that","inline":true}],[{"style":{"width":"74%"},"width":1481,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-11.png","element":"img"}],[{"text":"Define ","element":"span"},{"style":{"height":23.68},"width":509.95,"height":59.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-12.png","element":"img","alt":" nY(S) = ∑_{i=1}^M 1(Yi ∈ S)","inline":true,"padRight":true},{"text":"as the number of points from ","element":"span"},{"style":{"height":19.2},"width":290.89,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-13.png","element":"img","alt":" {Y1, . . . , YM}","inline":true,"padRight":true},{"text":"that are in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S","element":"span"},{"text":". 
Moreover, ","element":"span"},{"text":"define","element":"span"}],[{"style":{"width":"65%"},"width":1309,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-14.png","element":"img"}],[{"text":"Then, by the Chernoff inequality,","element":"span"}],[{"style":{"width":"75%"},"width":1512,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-15.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"79%"},"width":1588,"height":68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-16.png","element":"img"}],[{"text":"This shows that, with probability at least ","element":"span"},{"style":{"height":23.46},"width":484.38,"height":58.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-17.png","element":"img","alt":" 1 − γd e^{−k ln² M}(e ln² M)^k","inline":true},{"text":", there are at least ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"points in ","element":"span"},{"style":{"height":19.67},"width":625.6,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-18.png","element":"img","alt":" B(y, rj) ∩ Cj for j = 1, . . . , γd.","inline":true}],[{"style":{"width":"97%"},"width":1940,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-19.png","element":"img"}],[{"style":{"height":19.67},"width":965.94,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-20.png","element":"img","alt":"x ∈ Cj for some j ∈ {1, . . . , γd}. In B(y, rj)∩Cj","inline":true},{"text":", there are already at least ","element":"span"},{"style":{"height":18.72},"width":516.91,"height":46.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-21.png","element":"img","alt":" k points, Yil, l = 1, . . . 
, k,","inline":true,"padRight":true},{"text":"among ","element":"span"},{"style":{"height":19.67},"width":1607.7,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-22.png","element":"img","alt":" Y1, . . . , YM. Then ∥Yil − y∥ < rj for l = 1, . . . , k, while ∥x − y∥ ≥ rj. Denote by θ","inline":true,"padRight":true},{"text":"the angle between the vectors ","element":"span"},{"style":{"height":19.67},"width":1501.49,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-23.png","element":"img","alt":" Yil − y and x − y. Since Yil ∈ Cj and x ∈ Cj, we have θ < π/3, and thus","inline":true}],[{"style":{"width":"83%"},"width":1670,"height":195,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-24.png","element":"img"}],[{"text":"which implies that ","element":"span"},{"style":{"height":19.52},"width":1111.11,"height":48.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-25.png","element":"img","alt":" ∥y − x∥ > ∥Yil − x∥ for l = 1, . . . , k. Yil, l = 1, . . . , k","inline":true,"padRight":true},{"text":"are all closer to ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"than ","element":"span"},{"style":{"fontWeight":"bold"},"text":"y","element":"span"},{"text":"; therefore, ","element":"span"},{"style":{"fontWeight":"bold"},"text":"y ","element":"span"},{"text":"cannot be one of the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"nearest neighbors of ","element":"span"},{"style":{"height":19.6},"width":310.73,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-26.png","element":"img","alt":" x, i.e., x ∉ S(y)","inline":true},{"text":". 
Recall that ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"is an arbitrary point outside ","element":"span"},{"style":{"height":19.6},"width":507.21,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-27.png","element":"img","alt":" S0(y); thus S(y) ⊂ S0(y)","inline":true},{"text":". Therefore, with probability at least ","element":"span"},{"style":{"height":23.46},"width":491.29,"height":58.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-28.png","element":"img","alt":" 1 − γd e^{−k ln² M}(e ln² M)^k,","inline":true}],[{"style":{"width":"83%"},"width":1659,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/27-29.png","element":"img"}],[{"text":"Using Assumption ","element":"span"},{"href":"#id-43","text":"3 ","element":"a"},{"text":"(d), ","element":"span"},{"style":{"height":21.01},"width":1447.02,"height":52.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-0.png","element":"img","alt":" Pf(S) ≤ C Pg(S) for any S ⊂ Rd. 
If both S(Y1) ⊂ S0(Y1) and S(Y′1) ⊂","inline":true},{"style":{"height":19.6},"width":143.49,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-1.png","element":"img","alt":"S0(Y′1)","inline":true,"padRight":true},{"text":"hold, then","element":"span"}],[{"style":{"width":"70%"},"width":1407,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-2.png","element":"img"}],[{"text":"Applying the Chernoff inequality again,","element":"span"}],[{"style":{"width":"97%"},"width":1954,"height":305,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-3.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"style":{"width":"91%"},"width":1829,"height":386,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-4.png","element":"img"}],[{"text":"The proof is complete.","element":"span"}],[{"id":"id-100","style":{"width":"99%"},"width":1990,"height":251,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-5.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"63%"},"width":1259,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-6.png","element":"img"}],[{"text":"then","element":"span"}],[{"style":{"width":"81%"},"width":1624,"height":234,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/28-7.png","element":"img"}],[{"style":{"width":"96%"},"width":1928,"height":960,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/29-0.png","element":"img"}],[{"text":"In (a), we use ","element":"span"},{"style":{"height":19.74},"width":180.64,"height":49.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/29-1.png","element":"img","alt":" ∥x∥ < et","inline":true},{"text":" according to ","element":"span"},{"href":"#id-100","text":"(130)","element":"a"},{"text":", 
","element":"span"},{"style":{"height":35.97},"width":658,"height":89.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/29-2.png","element":"img","alt":" (21+seK)12 M exp(− 14sMt141) < 1.","inline":true}],[{"style":{"width":"97%"},"width":1945,"height":415,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/29-3.png","element":"img"}],[{"text":"Thus","element":"span"}],[{"style":{"width":"73%"},"width":1471,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/29-4.png","element":"img"}],[{"text":"where the last step uses ","element":"span"},{"href":"#id-108","text":"(66)","element":"a"},{"text":".","element":"span"}],[{"id":"id-49","style":{"width":"23%"},"width":459,"height":89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/29-5.png","element":"img"}],[{"text":"In this section, we establish the minimax convergence rate of KL divergence estimation for distributions with bounded support and densities bounded away from zero. The proof is divided into the following three bounds, which are proved separately:","element":"span"}],[{"id":"id-123","style":{"width":"76%"},"width":1534,"height":255,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/29-6.png","element":"img"}],[{"style":{"width":"97%"},"width":1945,"height":248,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-0.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") = 1","element":"span"},{"text":". 
Then","element":"span"}],[{"style":{"width":"81%"},"width":1626,"height":422,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-1.png","element":"img"}],[{"text":"Therefore, for sufficiently small ","element":"span"},{"style":{"height":19.6},"width":687.82,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-2.png","element":"img","alt":" δ, D(f2||g) − D(f1||g) ≥ (ln 3)δ/4","inline":true},{"text":". Moreover,","element":"span"}],[{"style":{"width":"74%"},"width":1483,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-3.png","element":"img"}],[{"text":"By Taylor expansion, it can be shown that ","element":"span"},{"style":{"height":20.54},"width":1113.06,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-4.png","element":"img","alt":" ln(1+2δ/3) ≥ 2δ/3−δ2/9, and ln(1−2δ) ≥ −2δ +2δ2,","inline":true,"padRight":true},{"text":"thus","element":"span"}],[{"style":{"width":"60%"},"width":1203,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-5.png","element":"img"}],[{"text":"Therefore, from Le Cam’s lemma ","element":"span"},{"href":"#id-47","referenceIndex":29,"text":"[29]","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"79%"},"width":1587,"height":234,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-6.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":21.8},"width":325.82,"height":54.49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-7.png","element":"img","alt":" δ = 1/√N, then","inline":true}],[{"style":{"width":"60%"},"width":1200,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-8.png","element":"img"}],[{"text":"Similarly, 
let","element":"span"}],[{"style":{"width":"90%"},"width":1801,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-9.png","element":"img"}],[{"text":"for ","element":"span"},{"style":{"height":20.54},"width":198.66,"height":51.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-10.png","element":"img","alt":" x ∈ [0, 1]^d","inline":true},{"text":". Then it can be shown that","element":"span"}],[{"style":{"width":"99%"},"width":1993,"height":226,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/30-11.png","element":"img"}],[{"text":"The proof follows an idea similar to that of ","element":"span"},{"href":"#id-19","referenceIndex":26,"text":"[26] ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":". To begin with, define","element":"span"}],[{"id":"id-110","style":{"width":"86%"},"width":1726,"height":510,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-0.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.6},"width":684.05,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-1.png","element":"img","alt":" Qa(x) = 1/vd for x ∈ B(0, 1), and vd","inline":true,"padRight":true},{"text":"is the volume of the unit ball, so that ","element":"span"},{"style":{"height":21.6},"width":517.87,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-2.png","element":"img","alt":"∫Qa(x)dx = 1. C1 and c","inline":true,"padRight":true},{"text":"are constants. 
","element":"span"},{"style":{"height":19.6},"width":333.52,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-3.png","element":"img","alt":" α ∈ (0, 1) and D","inline":true,"padRight":true},{"text":"decrease with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":", while ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"increases with ","element":"span"},{"style":{"height":16.8},"width":448.5,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-4.png","element":"img","alt":" N. ai, i = 1, . . . , n are","inline":true,"padRight":true},{"text":"selected such that ","element":"span"},{"style":{"height":19.67},"width":1042.39,"height":49.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-5.png","element":"img","alt":" ∥ai − aj∥ > 2D for all i, j ∈ {1, . . . , m} and i ≠ j","inline":true},{"text":". It can be checked that both ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"integrate to ","element":"span"},{"text":"1","element":"span"},{"text":". The condition ","element":"span"},{"style":{"height":20.94},"width":496.58,"height":52.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-6.png","element":"img","alt":" ui/(mD^d) ∈ {0} ∪ (c, 1)","inline":true,"padRight":true},{"text":"ensures that the density is bounded away from zero on its support, i.e., if ","element":"span"},{"style":{"height":19.6},"width":499.61,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-7.png","element":"img","alt":" f(x) > 0, then f(x) ≥ c","inline":true},{"text":". 
Moreover, the surface area of the support is ","element":"span"},{"style":{"height":20.94},"width":574.25,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-8.png","element":"img","alt":" sd(1 + mDd−1), in which sd","inline":true,"padRight":true},{"text":"is the surface area of the unit ball, and ","element":"span"},{"style":{"height":16.47},"width":309.26,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-9.png","element":"img","alt":" sd = dvd. With","inline":true,"padRight":true},{"text":"the condition ","element":"span"},{"style":{"height":18.61},"width":348.71,"height":46.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-10.png","element":"img","alt":" 1 < mDd−1 < C1","inline":true},{"text":", the surface areas of the supports of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"are both upper bounded by ","element":"span"},{"style":{"height":16.47},"width":91.98,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-11.png","element":"img","alt":" sdC1","inline":true},{"text":". 
Therefore, for sufficiently large ","element":"span"},{"style":{"height":18.47},"width":306.51,"height":46.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-12.png","element":"img","alt":" Hf, Hg, Uf, Ug","inline":true,"padRight":true},{"text":"and sufficiently small ","element":"span"},{"style":{"height":18.87},"width":410.66,"height":47.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-13.png","element":"img","alt":" Lf and Lg, Fa ∈ Sa.","inline":true,"padRight":true},{"text":"Define","element":"span"}],[{"style":{"width":"77%"},"width":1553,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-14.png","element":"img"}],[{"text":"Recall that ","element":"span"},{"style":{"height":19.6},"width":202.99,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-15.png","element":"img","alt":" Ra(N, M)","inline":true,"padRight":true},{"text":"is defined as the minimax mean square error over ","element":"span"},{"style":{"height":16.47},"width":188.92,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-16.png","element":"img","alt":" Sa, hence","inline":true}],[{"style":{"width":"97%"},"width":1945,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-17.png","element":"img"}],[{"text":"Define","element":"span"}],[{"id":"id-118","style":{"width":"74%"},"width":1487,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-18.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":19.6},"width":259.92,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-19.png","element":"img","alt":" N ′ ∼ Poi(N)","inline":true},{"text":" and Poi denotes the Poisson distribution. 
Then we have the following lemma:","element":"span"}],[{"id":"id-112","style":{"fontWeight":"bold"},"text":"Lemma 11.","element":"span"}],[{"id":"id-119","style":{"width":"77%"},"width":1537,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-20.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Please refer to Appendix ","element":"span"},{"href":"#id-109","text":"D-A ","element":"a"},{"text":"for details.","element":"span"}],[{"text":"Furthermore, define","element":"span"}],[{"id":"id-129","style":{"width":"94%"},"width":1885,"height":510,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/31-21.png","element":"img"}],[{"text":"Compared with the definition of ","element":"span"},{"style":{"height":16.47},"width":52.37,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-0.png","element":"img","alt":" Fa","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-110","text":"(148)","element":"a"},{"text":", the only difference is that we now allow ","element":"span"},{"style":{"height":20.48},"width":286.08,"height":51.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-1.png","element":"img","alt":" (1/m) Σmi=1 ui","inline":true,"padRight":true},{"text":"to deviate slightly from ","element":"span"},{"style":{"height":9.2},"width":30,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-2.png","element":"img","alt":" α","inline":true},{"text":". As a result, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is not necessarily a pdf, since it need not be normalized. 
However, we extend the definition of KL divergence ","element":"span"},{"style":{"height":21.6},"width":692.42,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-3.png","element":"img","alt":" D(f||g) = ∫f(x) ln(f(x)/g(x))dx","inline":true,"padRight":true},{"text":"to such unnormalized functions. Define","element":"span"}],[{"style":{"width":"78%"},"width":1561,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-4.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":21.6},"width":462.7,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-5.png","element":"img","alt":" N ′ ∼ Poi(N∫f(x)dx)","inline":true},{"text":". Then the numbers of samples falling in any two disjoint regions are mutually independent. ","element":"span"},{"style":{"height":16.07},"width":70.63,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-6.png","element":"img","alt":" Ra2","inline":true,"padRight":true},{"text":"can be lower bounded by ","element":"span"},{"style":{"height":16.07},"width":70.63,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-7.png","element":"img","alt":" Ra3","inline":true,"padRight":true},{"text":"with the following lemma:","element":"span"}],[{"id":"id-113","style":{"fontWeight":"bold"},"text":"Lemma 12. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"If ","element":"span"},{"style":{"height":19.2},"width":269.93,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-8.png","element":"img","alt":" ϵ < α/2, then","inline":true}],[{"style":{"width":"85%"},"width":1712,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Please refer to Appendix ","element":"span"},{"href":"#id-111","text":"D-B ","element":"a"},{"text":"for details.","element":"span"}],[{"text":"With Lemma ","element":"span"},{"href":"#id-112","text":"11 ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-113","text":"12, ","element":"a"},{"text":"the problem of bounding ","element":"span"},{"style":{"height":19.6},"width":203,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-10.png","element":"img","alt":" Ra(N, M)","inline":true,"padRight":true},{"text":"can be reduced to bounding ","element":"span"},{"style":{"height":19.6},"width":257.21,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-11.png","element":"img","alt":"Ra3(M, N, ϵ)","inline":true},{"text":". We then show the following lemma, which is a slight modification of Lemma 11 in ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":".","element":"span"}],[{"id":"id-114","style":{"width":"99%"},"width":1993,"height":361,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Let","element":"span"}],[{"id":"id-116","style":{"width":"58%"},"width":1159,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"then","element":"span"}],[{"id":"id-115","style":{"width":"94%"},"width":1889,"height":275,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"in which ","element":"span"},{"style":{"height":19.6},"width":271.06,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-15.png","element":"img","alt":" h(Qa) = ln vd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the differential entropy of 
","element":"span"},{"style":{"height":17.2},"width":69.07,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-16.png","element":"img","alt":" Qa.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"The proof is identical to that of Lemma 11 in ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":". Condition (1) differs from the corresponding condition in ","element":"span"},{"href":"#id-16","referenceIndex":23,"text":"[23]","element":"a"},{"text":", but this difference does not affect the proof.","element":"span"}],[{"style":{"width":"97%"},"width":1947,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-17.png","element":"img"}],[{"text":"let","element":"span"}],[{"style":{"width":"75%"},"width":1506,"height":190,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/32-18.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":16.47},"width":37.74,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-0.png","element":"img","alt":" δ0","inline":true,"padRight":true},{"text":"denotes the distribution that puts all the mass on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"= 0","element":"span"},{"text":". 
Now we assume ","element":"span"},{"style":{"height":20.94},"width":354.49,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-1.png","element":"img","alt":" α ≤ (1−ϵ)mDdη.","inline":true,"padRight":true},{"text":"Let ","element":"span"},{"style":{"height":19.2},"width":397.1,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-2.png","element":"img","alt":" λ = α/η, then U, U ′ ","inline":true,"padRight":true},{"text":"are supported in ","element":"span"},{"style":{"height":19.2},"width":97.71,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-3.png","element":"img","alt":" [0, λ]","inline":true},{"text":", and condition (1) in Lemma ","element":"span"},{"href":"#id-114","text":"13 ","element":"a"},{"text":"is satisfied. Then from Lemma 4 in ","element":"span"},{"href":"#id-19","referenceIndex":26,"text":"[26]","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"68%"},"width":1361,"height":244,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-4.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":20.14},"width":635.83,"height":50.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-5.png","element":"img","alt":" E[U j] = E[U ′j] for j = 1, . . . , L","inline":true},{"text":". In particular, ","element":"span"},{"style":{"height":19.2},"width":725.65,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-6.png","element":"img","alt":" E[U] = E[U ′] = α. 
When X and X′ ","inline":true,"padRight":true},{"text":"are properly selected, according to eq.(34) in ","element":"span"},{"href":"#id-19","referenceIndex":26,"text":"[26]","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"99%"},"width":1993,"height":406,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-7.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":19.6},"width":973.57,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-8.png","element":"img","alt":" x = 1 − (t + 1)/(a + 1), and η = (a − 1)/(a + 1)","inline":true},{"text":"; then the above equation can be transformed into the following:","element":"span"}],[{"style":{"width":"75%"},"width":1500,"height":159,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-9.png","element":"img"}],[{"text":"i.e., there exist two constants ","element":"span"},{"style":{"height":19.6},"width":302.96,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-10.png","element":"img","alt":" c1(η) and c2(η)","inline":true,"padRight":true},{"text":"that depend on ","element":"span"},{"style":{"height":13.2},"width":24,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-11.png","element":"img","alt":" η","inline":true},{"text":", such that","element":"span"}],[{"style":{"width":"68%"},"width":1371,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-12.png","element":"img"}],[{"text":"Hence","element":"span"}],[{"style":{"width":"58%"},"width":1174,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-13.png","element":"img"}],[{"text":"To bound the total variation term in ","element":"span"},{"href":"#id-115","text":"(159)","element":"a"},{"text":", we use the following lemma.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma 14. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"(","element":"span"},{"href":"#id-19","referenceIndex":26,"style":{"fontStyle":"italic"},"text":"[26]","element":"a"},{"style":{"fontStyle":"italic"},"text":", Lemma 3) Let ","element":"span"},{"style":{"height":16.8},"width":105.34,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-14.png","element":"img","alt":" Z, Z′ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be random variables on ","element":"span"},{"style":{"height":20.14},"width":675.87,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-15.png","element":"img","alt":" [0, A]. If E[V j] = E[V ′j] for j =","inline":true}],[{"style":{"width":"99%"},"width":1989,"height":197,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-16.png","element":"img"}],[{"text":"Substituting ","element":"span"},{"style":{"height":19.2},"width":1189.07,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-17.png","element":"img","alt":" Z, Z′ with NU/m and NU ′/m, and letting A = Nλ/m, we get","inline":true}],[{"id":"id-137","style":{"width":"88%"},"width":1759,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-18.png","element":"img"}],[{"text":"in which the last step holds because ","element":"span"},{"style":{"height":20.94},"width":340.82,"height":52.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/33-19.png","element":"img","alt":" λ ≤ (1 − ϵ)mDd.","inline":true}],[{"id":"id-124","style":{"width":"97%"},"width":1945,"height":334,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-0.png","element":"img"}],[{"text":"and from 
","element":"span"},{"href":"#id-110","text":"(148)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"70%"},"width":1415,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-1.png","element":"img"}],[{"text":"and","element":"span"}],[{"id":"id-125","style":{"width":"65%"},"width":1304,"height":140,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-2.png","element":"img"}],[{"text":"Then","element":"span"}],[{"style":{"width":"66%"},"width":1327,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-3.png","element":"img"}],[{"text":"Note that the second, third and fourth terms in the bracket on the right-hand side of ","element":"span"},{"href":"#id-115","text":"(159) ","element":"a"},{"text":"converge to zero. In particular, for the second term,","element":"span"}],[{"style":{"width":"65%"},"width":1303,"height":119,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-4.png","element":"img"}],[{"text":"For the third term,","element":"span"}],[{"style":{"width":"88%"},"width":1765,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-5.png","element":"img"}],[{"text":"and it is straightforward to show that the fourth term also converges to zero. Therefore, from Lemma ","element":"span"},{"href":"#id-114","text":"13,","element":"a"}],[{"style":{"width":"76%"},"width":1535,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-6.png","element":"img"}],[{"text":"Pick ","element":"span"},{"style":{"height":20.54},"width":424.96,"height":51.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-7.png","element":"img","alt":" η such that c2(η) = e2","inline":true},{"text":". 
According to condition (1) in the statement of Lemma ","element":"span"},{"href":"#id-114","text":"13, ","element":"a"},{"text":"this is possible if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"is sufficiently small. Then","element":"span"}],[{"id":"id-117","style":{"width":"78%"},"width":1561,"height":65,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-8.png","element":"img"}],[{"text":"From Lemma ","element":"span"},{"href":"#id-113","text":"12, ","element":"a"},{"text":"and noting that, by ","element":"span"},{"href":"#id-116","text":"(158)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"66%"},"width":1331,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-9.png","element":"img"}],[{"text":"which converges sufficiently fast; thus ","element":"span"},{"style":{"height":19.6},"width":288.68,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-10.png","element":"img","alt":" Ra2(N(1 − ϵ))","inline":true,"padRight":true},{"text":"can also be lower bounded by the right-hand side of ","element":"span"},{"href":"#id-117","text":"(179)","element":"a"},{"text":". 
From ","element":"span"},{"href":"#id-118","text":"(150) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-119","text":"(152)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"97%"},"width":1945,"height":149,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/34-11.png","element":"img"}],[{"text":"Define","element":"span"}],[{"style":{"width":"82%"},"width":1655,"height":509,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-0.png","element":"img"}],[{"text":"Then for any ","element":"span"},{"style":{"height":19.6},"width":226,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-1.png","element":"img","alt":" (f, g) ∈ Ga,","inline":true}],[{"id":"id-132","style":{"width":"74%"},"width":1492,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-2.png","element":"img"}],[{"text":"Define","element":"span"}],[{"style":{"width":"76%"},"width":1534,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-3.png","element":"img"}],[{"text":"then for sufficiently large ","element":"span"},{"style":{"height":18.47},"width":47.8,"height":46.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-4.png","element":"img","alt":" Ug","inline":true,"padRight":true},{"text":"and sufficiently small ","element":"span"},{"style":{"height":19.67},"width":756.76,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-5.png","element":"img","alt":" Lg, we have Ra(N, M) ≥ Ra4(N, M).","inline":true,"padRight":true},{"text":"We use Poisson sampling again. 
Define","element":"span"}],[{"style":{"width":"77%"},"width":1539,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-6.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":19.6},"width":275.53,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-7.png","element":"img","alt":" M ′ ∼ Poi(M)","inline":true},{"text":". Then we have the following lemma.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma 15.","element":"span"}],[{"style":{"width":"80%"},"width":1612,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Please refer to Appendix ","element":"span"},{"href":"#id-120","text":"D-C.","element":"a"}],[{"text":"Define","element":"span"}],[{"style":{"width":"91%"},"width":1820,"height":510,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-9.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"77%"},"width":1539,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/35-10.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":23.2},"width":480.7,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-0.png","element":"img","alt":" M ′ ∼ Poi(M∫g(x)dx)","inline":true},{"text":". Then the following lemma lower bounds ","element":"span"},{"style":{"height":16.47},"width":276.62,"height":41.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-1.png","element":"img","alt":" Ra5 with Ra6:","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Lemma 16. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"If ","element":"span"},{"style":{"height":19.2},"width":269.94,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-2.png","element":"img","alt":" ϵ < α/2, then","inline":true}],[{"style":{"width":"72%"},"width":1438,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Please refer to Appendix ","element":"span"},{"href":"#id-121","text":"D-D.","element":"a"}],[{"text":"Now we bound ","element":"span"},{"style":{"height":19.6},"width":257.21,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-4.png","element":"img","alt":" Ra6(N, M, ϵ)","inline":true,"padRight":true},{"text":"with the following lemma.","element":"span"}],[{"id":"id-136","style":{"fontWeight":"bold"},"text":"Lemma 17. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":16.8},"width":104.58,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-5.png","element":"img","alt":" V, V ′ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be two random variables that satisfy the following conditions: (1) ","element":"span"},{"style":{"height":20.94},"width":1546.88,"height":52.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-6.png","element":"img","alt":" V, V ′ ∈ [ηλ, λ], in which λ ≤ (1 − ϵ)mDd, 0 < η < 1 and ηλ ≥ c(1 + ϵ)mDd;","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(2) ","element":"span"},{"style":{"height":19.2},"width":370.43,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-7.png","element":"img","alt":" E[V ] = E[V ′] = 
α.","inline":true}],[{"id":"id-135","style":{"width":"97%"},"width":1946,"height":370,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Please refer to Appendix ","element":"span"},{"href":"#id-122","text":"D-E.","element":"a"}],[{"text":"Now we use eq.(34) in ","element":"span"},{"href":"#id-19","referenceIndex":26,"text":"[26] ","element":"a"},{"text":"again, which shows that there exist ","element":"span"},{"style":{"height":19.2},"width":284.68,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-9.png","element":"img","alt":" V, V ′ ∈ [ηλ, λ]","inline":true,"padRight":true},{"text":"that have matching moments up to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"text":"-th order, such that","element":"span"}],[{"style":{"width":"74%"},"width":1489,"height":88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-10.png","element":"img"}],[{"text":"The remaining proof follows the proof of ","element":"span"},{"href":"#id-123","text":"(137)","element":"a"},{"text":". 
","element":"span"},{"style":{"height":17.2},"width":330.61,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-11.png","element":"img","alt":" L, D, m, λ and α","inline":true,"padRight":true},{"text":"take the same values as in equations ","element":"span"},{"href":"#id-124","text":"(170) ","element":"a"},{"text":"to ","element":"span"},{"href":"#id-125","text":"(174)","element":"a"},{"text":", and we then obtain a bound similar to ","element":"span"},{"href":"#id-123","text":"(137)","element":"a"},{"text":", with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"replaced by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":".","element":"span"}],[{"id":"id-109","style":{"width":"100%"},"width":1996,"height":770,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/36-12.png","element":"img"}],[{"text":"in which the inequality in the second step follows from Jensen’s inequality. Note that ","element":"span"},{"style":{"height":19.6},"width":308.1,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-0.png","element":"img","alt":" Ra1(N, M) is a","inline":true,"padRight":true},{"text":"nonincreasing function of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":", because if ","element":"span"},{"style":{"height":19.2},"width":866.24,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-1.png","element":"img","alt":" N1 < N2, given N2 samples {X1, . . . 
, XN2}","inline":true},{"text":", one can always use only ","element":"span"},{"style":{"height":16.07},"width":54.5,"height":40.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-2.png","element":"img","alt":" N1","inline":true,"padRight":true},{"text":"of them for the estimation; thus ","element":"span"},{"style":{"height":19.6},"width":536.9,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-3.png","element":"img","alt":" Ra1(N1, M) ≥ Ra1(N2, M)","inline":true,"padRight":true},{"text":"always holds. Therefore","element":"span"}],[{"id":"id-126","style":{"width":"71%"},"width":1417,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-4.png","element":"img"}],[{"text":"Moreover, since ","element":"span"},{"style":{"height":19.6},"width":283.34,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-5.png","element":"img","alt":" N ′ ∼ Poi(2N)","inline":true},{"text":", the Chernoff inequality gives","element":"span"}],[{"id":"id-127","style":{"width":"68%"},"width":1368,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-6.png","element":"img"}],[{"text":"Now it remains to bound ","element":"span"},{"style":{"height":19.6},"width":465.02,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-7.png","element":"img","alt":" E[Ra1(N ′, M)|N ′ ≤ N]","inline":true},{"text":". 
Note that we can always let the estimator be","element":"span"}],[{"style":{"width":"76%"},"width":1532,"height":144,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-8.png","element":"img"}],[{"text":"hence","element":"span"}],[{"style":{"width":"85%"},"width":1698,"height":152,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-9.png","element":"img"}],[{"text":"From the definition of ","element":"span"},{"style":{"height":16.47},"width":52.37,"height":41.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-10.png","element":"img","alt":" Fa","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-110","text":"(148)","element":"a"},{"text":", for all ","element":"span"},{"style":{"height":19.6},"width":231.92,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-11.png","element":"img","alt":" (f, g) ∈ Fa,","inline":true}],[{"style":{"width":"99%"},"width":1992,"height":728,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-12.png","element":"img"}],[{"text":"which is the same for all ","element":"span"},{"style":{"height":19.6},"width":217.94,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-13.png","element":"img","alt":" (f, g) ∈ Fa","inline":true},{"text":". 
In addition,","element":"span"}],[{"id":"id-130","style":{"width":"77%"},"width":1548,"height":407,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/37-14.png","element":"img"}],[{"text":"Hence,","element":"span"}],[{"id":"id-128","style":{"width":"97%"},"width":1953,"height":601,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/38-0.png","element":"img"}],[{"text":"From ","element":"span"},{"href":"#id-109","text":"(193)","element":"a"},{"text":", ","element":"span"},{"href":"#id-126","text":"(194)","element":"a"},{"text":", ","element":"span"},{"href":"#id-127","text":"(195) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-128","text":"(201)","element":"a"},{"text":",","element":"span"}],[{"id":"id-111","style":{"width":"99%"},"width":1994,"height":674,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/38-1.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"77%"},"width":1546,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/38-2.png","element":"img"}],[{"text":"Then from ","element":"span"},{"href":"#id-129","text":"(154)","element":"a"},{"text":", ","element":"span"},{"style":{"height":21.6},"width":1010.88,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/38-3.png","element":"img","alt":" |q − 1| < ϵ/α, ∫f ∗(x)dx = 1, and f ∗ ∈ Fa. 
Hence","inline":true}],[{"id":"id-131","style":{"width":"95%"},"width":1908,"height":340,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/38-4.png","element":"img"}],[{"text":"Now we bound the second term.","element":"span"}],[{"style":{"width":"91%"},"width":1826,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/38-5.png","element":"img"}],[{"text":"According to ","element":"span"},{"href":"#id-130","text":"(200)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"81%"},"width":1620,"height":795,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-0.png","element":"img"}],[{"text":"in which (a) is obtained by maximizing ","element":"span"},{"style":{"height":20.48},"width":433.49,"height":51.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-1.png","element":"img","alt":" | Σmi=1(ui/q) ln(ui/q)|","inline":true,"padRight":true},{"text":"under the constraint ","element":"span"},{"style":{"height":20.48},"width":412.24,"height":51.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-2.png","element":"img","alt":" (1/m) Σmi=1(ui/q) =","inline":true},{"style":{"height":9.2},"width":30,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-3.png","element":"img","alt":"α","inline":true},{"text":", (b) follows from ","element":"span"},{"style":{"height":19.2},"width":266.24,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-4.png","element":"img","alt":" |q − 1| < ϵ/α","inline":true},{"text":", and (c) uses ","element":"span"},{"style":{"height":19.2},"width":158.39,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-5.png","element":"img","alt":" ϵ < α/2","inline":true},{"text":". 
Moreover,","element":"span"}],[{"style":{"width":"87%"},"width":1742,"height":268,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-6.png","element":"img"}],[{"text":"Hence","element":"span"}],[{"style":{"width":"77%"},"width":1546,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-7.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"id":"id-120","style":{"width":"99%"},"width":1990,"height":339,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-8.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"93%"},"width":1862,"height":575,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/39-9.png","element":"img"}],[{"text":"The proof is complete.","element":"span"}],[{"id":"id-121","style":{"width":"99%"},"width":1994,"height":279,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-0.png","element":"img"}],[{"text":"define ","element":"span"},{"style":{"height":20.48},"width":502.01,"height":51.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-1.png","element":"img","alt":" q = (∑_{i=1}^m vi)/(mα), and","inline":true}],[{"style":{"width":"77%"},"width":1545,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-2.png","element":"img"}],[{"text":"Similarly to ","element":"span"},{"href":"#id-131","text":"(206)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"86%"},"width":1728,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-3.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"80%"},"width":1603,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-4.png","element":"img"}],[{"text":"in which the last step holds since 
","element":"span"},{"style":{"height":19.2},"width":527.21,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-5.png","element":"img","alt":" |q − 1| < ϵ/α and ϵ < α/2","inline":true},{"text":". The proof is complete.","element":"span"}],[{"id":"id-122","style":{"width":"99%"},"width":1994,"height":430,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-6.png","element":"img"}],[{"text":"Define two events:","element":"span"}],[{"id":"id-133","style":{"width":"83%"},"width":1657,"height":304,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-7.png","element":"img"}],[{"text":"then","element":"span"}],[{"style":{"width":"76%"},"width":1527,"height":146,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-8.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":19.6},"width":785.03,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-9.png","element":"img","alt":" | ln V | ∈ (ln(1/λ), ln(1/(ηλ))), we have","inline":true}],[{"style":{"width":"66%"},"width":1327,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/40-10.png","element":"img"}],[{"text":"hence for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"2","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"81%"},"width":1623,"height":346,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-0.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"style":{"width":"69%"},"width":1392,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-1.png","element":"img"}],[{"text":"According to 
","element":"span"},{"href":"#id-132","text":"(183)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"80%"},"width":1607,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-2.png","element":"img"}],[{"text":"From the definition of ","element":"span"},{"style":{"height":16},"width":119.62,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-3.png","element":"img","alt":" E, E′ ","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-133","text":"(220) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-133","text":"(221)","element":"a"},{"text":", if ","element":"span"},{"style":{"height":16.8},"width":111.91,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-4.png","element":"img","alt":" E, E′ ","inline":true,"padRight":true},{"text":"both occur, then","element":"span"}],[{"style":{"width":"66%"},"width":1335,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-5.png","element":"img"}],[{"text":"Denote ","element":"span"},{"style":{"height":17.67},"width":45.28,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-6.png","element":"img","alt":" π∗1 ","inline":true,"padRight":true},{"text":"as the distribution of samples according to ","element":"span"},{"style":{"height":12.8},"width":39.42,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-7.png","element":"img","alt":" g1 ","inline":true,"padRight":true},{"text":"conditional on ","element":"span"},{"style":{"height":18.33},"width":197.13,"height":45.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-8.png","element":"img","alt":" E, and π∗2 ","inline":true,"padRight":true},{"text":"as the distribution of samples ","element":"span"},{"text":"according to 
","element":"span"},{"style":{"height":12.8},"width":39.42,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-9.png","element":"img","alt":" g2","inline":true,"padRight":true},{"text":"conditional on ","element":"span"},{"style":{"height":13.2},"width":53.46,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-10.png","element":"img","alt":" E′","inline":true},{"text":". Then under ","element":"span"},{"style":{"height":17.67},"width":135.1,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-11.png","element":"img","alt":" π∗1, π∗2,","inline":true}],[{"style":{"width":"74%"},"width":1479,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-12.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"81%"},"width":1627,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-13.png","element":"img"}],[{"text":"Then according to Le Cam’s lemma,","element":"span"}],[{"style":{"width":"93%"},"width":1861,"height":312,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-14.png","element":"img"}],[{"text":"The proof is complete.","element":"span"}],[{"text":"APPENDIX E: PROOF OF THEOREM ","element":"span"},{"href":"#id-52","text":"5","element":"a"}],[{"text":"As for Theorem ","element":"span"},{"href":"#id-48","text":"4, ","element":"a"},{"text":"the proof reduces to establishing the following three 
bounds:","element":"span"}],[{"id":"id-53","style":{"width":"68%"},"width":1376,"height":254,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/41-15.png","element":"img"}],[{"style":{"width":"99%"},"width":1993,"height":236,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-0.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":11.67},"width":43.61,"height":29.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-1.png","element":"img","alt":" x1","inline":true,"padRight":true},{"text":"is the value of the first coordinate of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":", and","element":"span"}],[{"style":{"width":"99%"},"width":1993,"height":490,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-2.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"67%"},"width":1352,"height":290,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-3.png","element":"img"}],[{"text":"From Le Cam’s lemma,","element":"span"}],[{"style":{"width":"79%"},"width":1581,"height":371,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-4.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":21.79},"width":214.19,"height":54.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-5.png","element":"img","alt":" δ = 1/√N","inline":true},{"text":"; then, for sufficiently large ","element":"span"},{"style":{"height":19.6},"width":507,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-6.png","element":"img","alt":" N, Rb(N, M) ≥ 1/(32N)","inline":true},{"text":". 
Similarly, let","element":"span"}],[{"style":{"width":"65%"},"width":1300,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-7.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"70%"},"width":1410,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-8.png","element":"img"}],[{"text":"in which ","element":"span"},{"style":{"height":19.6},"width":292.18,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-9.png","element":"img","alt":" σ1 = (1 + δ)σ2","inline":true},{"text":"; then we obtain ","element":"span"},{"style":{"height":19.6},"width":508.58,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-10.png","element":"img","alt":" Rb(N, M) ≳ 1/M. Hence","inline":true}],[{"style":{"width":"97%"},"width":1945,"height":229,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/42-11.png","element":"img"}],[{"style":{"width":"97%"},"width":1945,"height":389,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/43-0.png","element":"img"}],[{"text":"Define","element":"span"}],[{"id":"id-134","style":{"width":"84%"},"width":1689,"height":510,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/43-1.png","element":"img"}],[{"text":"In ","element":"span"},{"href":"#id-134","text":"(245)","element":"a"},{"text":", two conditions differ from the definition of ","element":"span"},{"href":"#id-110","style":{"height":40.99},"width":1995.04,"height":102.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.11599/images/43-2.png","element":"img","alt":" Fa in (148): 1