1b:["$","$L29",null,{"isWhiteLabelled":false,"children":["$","$Lb",null,{"pt":{"compact":0,"expanded":3},"children":[["$","$L2a",null,{"noStar":true,"publisher":true,"task":true,"params":true,"size":"xl","product":{"id":"eyJwYXBlcklEIjoiMjAwMS4wNjc3NiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","updated":"2020-01-19T05:10:56.000Z","paperID":"2001.06776","published":"2020-01-19T05:10:56.000Z","authors":"[\"Akshay Krishnamurthy\",\"Arya Mazumdar\",\"Andrew McGregor\",\"Soumyabrata Pal\"]","title":"Algebraic and Analytic Approaches for Parameter Learning in Mixture Models","scoreTrending":null,"summary":"We present two different approaches for parameter learning in several mixture\nmodels in one dimension. Our first approach uses complex-analytic methods and\napplies to Gaussian mixtures with shared variance, binomial mixtures with\nshared success probability, and Poisson mixtures, among others. An example\nresult is that $\\exp(O(N^{1/3}))$ samples suffice to exactly learn a mixture of\n$k ","element":"span"},{"text":"1 ","element":"span"},{"text":"is any real number and ","element":"span"},{"style":{"height":15.89},"width":138.81,"height":39.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-3.png","element":"img","alt":" r ∈ R+","inline":true},{"text":". For any discrete random variable ","element":"span"},{"text":"X ","element":"span"},{"text":"with support ","element":"span"},{"text":"Z ","element":"span"},{"text":"and pmf ","element":"span"},{"text":"f","element":"span"},{"text":",","element":"span"}],[{"id":"id-30","style":{"width":"23%"},"width":400,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-4.png","element":"img"}],[{"text":"Proof Note that, ","element":"span"},{"style":{"height":19.54},"width":986.84,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-5.png","element":"img","alt":" Pr(X ≥ x) = Pr(a2X−2x ≥ 1) ≤ E[a2X−2x]. 
We have,","inline":true}],[{"style":{"width":"96%"},"width":1667,"height":202,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-6.png","element":"img"}],[{"id":"id-31","text":"Theorem 7 (TV Lower Bounds) ","element":"span"},{"text":"The following bounds hold on distance between two different mixtures assuming all ","element":"span"},{"text":"k ","element":"span"},{"text":"parameters are distinct for each mixture.","element":"span"}],[{"text":"• ","element":"span"},{"text":"Gaussian: ","element":"span"},{"style":{"height":22.82},"width":1372.24,"height":57.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-7.png","element":"img","alt":" M = 1k�ki=1 N(µi, σ) and M′ = 1k�ki=1 N(µ′i, σ) where µi, µ′i ∈ ǫZ. Then","inline":true},{"style":{"height":23.81},"width":767.52,"height":59.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-8.png","element":"img","alt":"��M′ − M��TV ≥ k−1 exp(−Ω((σ/ǫ)2/3)) .","inline":true}],[{"text":"• ","element":"span"},{"text":"Poisson: ","element":"span"},{"style":{"height":23.01},"width":1503.28,"height":57.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-9.png","element":"img","alt":" M = 1k�ki=1 Poi(λi) and M′ = 1k�ki=1 Poi(λ′i) where λi, λ′i ∈ {0, 1, . . . , N}. Then","inline":true},{"style":{"height":23.81},"width":707.04,"height":59.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-10.png","element":"img","alt":"��M′ − M��TV ≥ k−1 exp(−Ω(N 1/3)) .","inline":true}],[{"text":"• ","element":"span"},{"text":"Chi-Squared: ","element":"span"},{"style":{"height":38.96},"width":1687.64,"height":97.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-11.png","element":"img","alt":" M = 1k�ki=1 χ2(ℓi) and M′ = 1k�ki=1 χ2(ℓ′i) where ℓi, ℓ′i ∈ {1, 2, . . . 
, N}.Then","inline":true}],[{"style":{"width":"40%"},"width":698,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-12.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"Negative Binomial: ","element":"span"},{"style":{"height":38.96},"width":1798.52,"height":97.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-13.png","element":"img","alt":" M = 1k�ki=1 NB(ri, p) and M′ = 1k�ki=1 NB(r′i, p) where ri, r′i ∈ {1, 2, . . . , N}.Then","inline":true}],[{"style":{"width":"44%"},"width":772,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-14.png","element":"img"}],[{"text":"Proof As above we give the argument for Poisson random variables, deferring the others to the appendix. Let ","element":"span"},{"style":{"height":13.6},"width":412.48,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-15.png","element":"img","alt":" X ∼ M and X′ ∼ M′","inline":true},{"text":". Then, for ","element":"span"},{"text":"w ","element":"span"},{"text":"= 1 + ","element":"span"},{"text":"it","element":"span"},{"text":", from Lemma ","element":"span"},{"href":"#id-29","text":"4","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"42%"},"width":741,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/6-16.png","element":"img"}],[{"text":"Now we use Lemma ","element":"span"},{"href":"#id-27","text":"3 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":17.6},"width":931.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-0.png","element":"img","alt":" G(x) = wx, Ω′ = {0, 1, . . . 
, 2N} and t ≤ 1, to have,","inline":true}],[{"style":{"width":"85%"},"width":1469,"height":372,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-1.png","element":"img"}],[{"text":"Now using Lemma ","element":"span"},{"href":"#id-30","text":"6","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"101%"},"width":1752,"height":749,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-2.png","element":"img"}],[{"text":"2.3. Parameter Learning","element":"span"}],[{"text":"Union Bound Approach for Discrete Distributions ","element":"span"},{"text":"We begin with the following proposition which follows from Theorem 7.1 of ","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"Devroye and Lugosi ","element":"a"},{"text":"(","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"2012","element":"a"},{"id":"id-32","text":").","element":"span"}],[{"style":{"width":"103%"},"width":1784,"height":152,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-3.png","element":"img"}],[{"text":"For the mixture of Poissons, ","element":"span"},{"style":{"height":22.81},"width":882.72,"height":57.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-4.png","element":"img","alt":" M = 1k�ki=1 Poi(λi) where λi ∈ {0, 1, . . . , N},","inline":true,"padRight":true},{"text":"the number of ","element":"span"},{"text":"choices for parameters in the mixture is ","element":"span"},{"style":{"height":19.54},"width":171.12,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-5.png","element":"img","alt":" (N + 1)k","inline":true},{"text":". 
Now using Lemmas ","element":"span"},{"href":"#id-31","text":"7 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-32","text":"8","element":"a"},{"text":", ","element":"span"},{"style":{"height":20.34},"width":261.32,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-6.png","element":"img","alt":" exp(O(N 1/3))","inline":true,"padRight":true},{"text":"samples are sufficient to learn the parameters of the mixture.","element":"span"}],[{"text":"Exactly the same argument applies to mixtures of Chi-Squared and Negative-Binomial distributions, yielding ","element":"span"},{"style":{"height":20.33},"width":686.6,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/7-7.png","element":"img","alt":" exp(O(N 1/3)) and exp(O((N/p)1/3))","inline":true,"padRight":true},{"text":"samples suffice, respectively. However, for Gaussians we need a more intricate approach.","element":"span"}],[{"text":"VC Approach for Gaussians ","element":"span"},{"text":"To learn the parameters of a Gaussian mixture","element":"span"}],[{"style":{"width":"64%"},"width":1107,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-0.png","element":"img"}],[{"text":"we use the minimum distance estimator precisely defined in (","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"Devroye and Lugosi","element":"a"},{"text":", ","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"2012","element":"a"},{"text":", Section 6.8). 
Let ","element":"span"},{"style":{"height":17.6},"width":535.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-1.png","element":"img","alt":" 𝒜 ≡ {{x : M(x) ≥ M′(x)} :","inline":true,"padRight":true},{"text":"for any two mixtures ","element":"span"},{"style":{"height":17.6},"width":196.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-2.png","element":"img","alt":" M ≠ M′}","inline":true,"padRight":true},{"text":"be a collection of subsets. Let ","element":"span"},{"style":{"height":14.69},"width":57.84,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-3.png","element":"img","alt":"P_m","inline":true,"padRight":true},{"text":"denote the empirical probability measure induced by the ","element":"span"},{"text":"m ","element":"span"},{"text":"samples. Then, choose a mixture ","element":"span"},{"style":{"height":18.01},"width":53,"height":45.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-4.png","element":"img","alt":"M̂","inline":true,"padRight":true},{"text":"for which the quantity ","element":"span"},{"style":{"height":20.14},"width":546.24,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-5.png","element":"img","alt":" sup_{A∈𝒜} |Pr_{∼M̂}(A) − P_m(A)|","inline":true,"padRight":true},{"text":"is minimized (or within ","element":"span"},{"text":"1","element":"span"},{"text":"/m ","element":"span"},{"text":"of the ","element":"span"},{"text":"infimum). This is the minimum distance estimator, whose performance is guaranteed by the following proposition (","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"Devroye and Lugosi","element":"a"},{"text":", ","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"2012","element":"a"},{"text":", Thm. 
6.4).","element":"span"}],[{"text":"Proposition 9 Given ","element":"span"},{"text":"m ","element":"span"},{"text":"samples from ","element":"span"},{"style":{"height":18.22},"width":1027.52,"height":45.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-6.png","element":"img","alt":" M and with ∆ = supA∈A | Pr∼M(A) − Pm(A)|, we have","inline":true},{"style":{"height":35.62},"width":475.2,"height":89.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-7.png","element":"img","alt":"�� ˆM − M��TV ≤ 4∆ + 3m.","inline":true}],[{"text":"We now upper bound the right-hand side of the above inequality. Via McDiarmid’s inequality and a standard symmetrization argument, ","element":"span"},{"style":{"height":12.8},"width":37,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-8.png","element":"img","alt":" ∆","inline":true,"padRight":true},{"text":"is concentrated around its mean which is a function of ","element":"span"},{"text":"V C","element":"span"},{"text":"(","element":"span"},{"text":"A","element":"span"},{"text":")","element":"span"},{"text":", the VC dimension of the class ","element":"span"},{"text":"A","element":"span"},{"text":", see (","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"Devroye and Lugosi","element":"a"},{"text":", ","element":"span"},{"href":"#id-23","referenceIndex":13,"text":"2012","element":"a"},{"text":", Section 4.3):","element":"span"}],[{"style":{"width":"75%"},"width":1297,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-9.png","element":"img"}],[{"text":"with high probability, for an absolute constant ","element":"span"},{"text":"c","element":"span"},{"text":". 
This latter term is bounded by the following.","element":"span"}],[{"text":"Lemma 10 For the class ","element":"span"},{"text":"A ","element":"span"},{"text":"defined above, the VC dimension is given by ","element":"span"},{"text":"V C","element":"span"},{"text":"(","element":"span"},{"text":"A","element":"span"},{"text":") = ","element":"span"},{"text":"O","element":"span"},{"text":"(","element":"span"},{"text":"k","element":"span"},{"text":")","element":"span"},{"text":".","element":"span"}],[{"text":"Proof First of all we show that any element of the set ","element":"span"},{"text":"A ","element":"span"},{"text":"can be written as union of at most ","element":"span"},{"style":{"height":12.8},"width":121.84,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-10.png","element":"img","alt":" 4k − 1","inline":true,"padRight":true},{"text":"intervals in ","element":"span"},{"text":"R","element":"span"},{"text":". For this we use the fact that a linear combination of ","element":"span"},{"text":"k ","element":"span"},{"text":"Gaussian pdfs ","element":"span"},{"text":"f","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") = ","element":"span"},{"style":{"height":22.05},"width":412.32,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-11.png","element":"img","alt":"�ki=1 αifi(x) where fi","inline":true},{"text":"s normal pdf ","element":"span"},{"style":{"height":19.73},"width":619.16,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-12.png","element":"img","alt":" N(µi, σ2i ) and αi ∈ R, 1 ≤ i ≤ k","inline":true,"padRight":true},{"text":"has at most ","element":"span"},{"style":{"height":12.8},"width":227.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-13.png","element":"img","alt":" 2k − 2 zero-","inline":true,"padRight":true},{"text":"crossings 
(","element":"span"},{"href":"#id-33","referenceIndex":21,"text":"Kalai et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-33","referenceIndex":21,"text":"2012","element":"a"},{"text":"). Therefore, for any two mixtures of interest ","element":"span"},{"style":{"height":17.6},"width":409.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-14.png","element":"img","alt":" M(x) − M′(x) has at","inline":true,"padRight":true},{"text":"most ","element":"span"},{"style":{"height":12.8},"width":123.76,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-15.png","element":"img","alt":" 4k − 2","inline":true,"padRight":true},{"text":"zero-crossings. Therefore any ","element":"span"},{"style":{"height":14},"width":128.16,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-16.png","element":"img","alt":" A ∈ A","inline":true,"padRight":true},{"text":"must be a union of at most ","element":"span"},{"style":{"height":12.8},"width":123.76,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-17.png","element":"img","alt":" 4k − 1","inline":true,"padRight":true},{"text":"contiguous regions in ","element":"span"},{"text":"R","element":"span"},{"text":". 
It is now an easy exercise to see that the VC dimension of such a class is ","element":"span"},{"style":{"height":17.6},"width":102.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-18.png","element":"img","alt":" Θ(k).","inline":true}],[{"style":{"width":"96%"},"width":1659,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-19.png","element":"img"}],[{"text":"from Theorem ","element":"span"},{"href":"#id-31","text":"7","element":"a"},{"text":", notice that for any other mixture ","element":"span"},{"style":{"height":13.6},"width":68.32,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-20.png","element":"img","alt":" M′ ","inline":true,"padRight":true},{"text":"we must have,","element":"span"}],[{"style":{"width":"43%"},"width":752,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-21.png","element":"img"}],[{"text":"As long as","element":"span"},{"style":{"height":32.86},"width":609.56,"height":82.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-22.png","element":"img","alt":"�� ˆM − M��TV ≤ 12 ∥M − M′∥TV ","inline":true,"padRight":true},{"text":"we will exactly identify the parameters. Therefore ","element":"span"},{"style":{"height":20.33},"width":468.68,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-23.png","element":"img","alt":"m = k3 exp(O((σ/ǫ)2/3))","inline":true,"padRight":true},{"text":"samples suffice to exactly learn the parameters with high probability.","element":"span"}],[{"text":"2.4. Extension to Non-Uniform Mixtures","element":"span"}],[{"text":"The above results extend to non-uniform mixtures, where the main change is that we require a generalization of Lemma ","element":"span"},{"href":"#id-30","text":"5","element":"a"},{"text":". 
The result, also proved by ","element":"span"},{"href":"#id-28","referenceIndex":5,"text":"Borwein and Erd´elyi ","element":"a"},{"text":"(","element":"span"},{"href":"#id-28","referenceIndex":5,"text":"1997","element":"a"},{"text":"), states that if ","element":"span"},{"style":{"height":17.6},"width":654.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-24.png","element":"img","alt":"a0, a1, a2, . . . ∈ [−1, 1] with poly(n)","inline":true,"padRight":true},{"text":"precision then ","element":"span"},{"style":{"height":21.98},"width":801.52,"height":54.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/8-25.png","element":"img","alt":" max−π/L≤θ≤π/L |A(eiθ)| ≥ e−cL log n, for an","inline":true,"padRight":true},{"text":"absolute constant ","element":"span"},{"text":"c","element":"span"},{"text":". This weaker bound yields an extra poly","element":"span"},{"text":"(","element":"span"},{"text":"n","element":"span"},{"text":") ","element":"span"},{"text":"factor in the sample complexity.","element":"span"}]]},{"heading":"3. Learning Mixtures via Moments","paragraphs":[[{"text":"There are some mixtures where the problem of learning parameters is not amenable to the approach in the previous section. ","element":"span"},{"text":"A simple motivating example is learning the parameters ","element":"span"},{"style":{"height":13.2},"width":89.48,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-0.png","element":"img","alt":" pi ∈","inline":true},{"style":{"height":19.14},"width":480.16,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-1.png","element":"img","alt":"{0, ǫ, 2ǫ, 3ǫ, . . . 
, 1} values3 ","inline":true,"padRight":true},{"text":"in the mixture ","element":"span"},{"style":{"height":23.01},"width":450.44,"height":57.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-2.png","element":"img","alt":" M = 1k�ki=1 Bin(n, pi)","inline":true},{"text":". In this section, we present ","element":"span"},{"text":"an alternative procedure for learning such mixtures. The basic idea is as follows:","element":"span"}],[{"text":"• ","element":"span"},{"text":"We compute moments ","element":"span"},{"style":{"height":12},"width":84.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-3.png","element":"img","alt":" EXℓ ","inline":true,"padRight":true},{"text":"exactly for ","element":"span"},{"style":{"height":15.6},"width":269.08,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-4.png","element":"img","alt":" ℓ = 0, 1, . . . , T","inline":true,"padRight":true},{"text":"by taking sufficiently many samples. The number of samples will depend on ","element":"span"},{"text":"T ","element":"span"},{"text":"and the precision of the parameters of the mixture.","element":"span"}],[{"text":"• ","element":"span"},{"text":"We argue that if ","element":"span"},{"text":"T ","element":"span"},{"text":"is sufficiently large, then these moments uniquely define the parameters of the mixture. 
To do this we use a combinatorial result due to ","element":"span"},{"href":"#id-34","referenceIndex":24,"text":"Krasikov and Roditty ","element":"a"},{"text":"(","element":"span"},{"href":"#id-34","referenceIndex":24,"text":"1997","element":"a"},{"text":").","element":"span"}],[{"text":"In this section, it will be convenient to define a function ","element":"span"},{"style":{"height":10.88},"width":54.4,"height":27.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-5.png","element":"img","alt":" mℓ","inline":true,"padRight":true},{"text":"on multi-sets where","element":"span"}],[{"id":"id-41","style":{"width":"18%"},"width":322,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-6.png","element":"img"}],[{"text":"Our main result is as follows:","element":"span"}],[{"text":"Theorem 11 (Learning Binomial mixtures) Let ","element":"span"},{"style":{"height":22.82},"width":449,"height":57.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-7.png","element":"img","alt":" M = 1k�ki=1 Bin(n, pi)","inline":true,"padRight":true},{"text":"be a uniform mixture ","element":"span"},{"text":"of ","element":"span"},{"text":"k ","element":"span"},{"text":"binomials, with known shared number of trials ","element":"span"},{"text":"n ","element":"span"},{"text":"and unknown probabilities ","element":"span"},{"style":{"height":13.2},"width":232.04,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-8.png","element":"img","alt":" p1, . . . , pk ∈","inline":true},{"style":{"height":17.6},"width":280.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-9.png","element":"img","alt":"{0, ǫ, 2ǫ, . . . , 1}","inline":true},{"text":". 
Then, provided ","element":"span"},{"style":{"height":17.87},"width":462.96,"height":44.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-10.png","element":"img","alt":" n ≥ 4/√ǫ, the first 4/√ǫ","inline":true,"padRight":true},{"text":"moments suffice to learn the parameters ","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-11.png","element":"img","alt":" pi","inline":true,"padRight":true},{"text":"and there exists an algorithm that, when given ","element":"span"},{"style":{"height":20.75},"width":288.68,"height":51.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-12.png","element":"img","alt":" O(k2(n/ǫ)8/√ǫ)","inline":true,"padRight":true},{"text":"samples from ","element":"span"},{"text":"M","element":"span"},{"text":", exactly identifies the parameters ","element":"span"},{"style":{"height":19.93},"width":134.12,"height":49.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-13.png","element":"img","alt":" {pi}ki=1 ","inline":true,"padRight":true},{"text":"with high probability.","element":"span"}],[{"text":"Computing the Moments ","element":"span"},{"text":"We compute the ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-14.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment as ","element":"span"},{"style":{"height":18.48},"width":677.72,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-15.png","element":"img","alt":" Sℓ,t = � Y ℓi /t where Y1, . . . 
, Yt ∼ X.","inline":true}],[{"text":"Lemma 12 ","element":"span"},{"style":{"height":31.6},"width":901.04,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-16.png","element":"img","alt":" Pr[|S_{ℓ,t} − E X^ℓ| ≥ γ] ≤ E X^{2ℓ}/(tγ^2) ≤ ((2ℓ)!/(γ^2 t)) inf_α (E e^{αX}/α^{2ℓ})","inline":true},{"text":"where the last inequality assumes that all the moments of ","element":"span"},{"text":"X ","element":"span"},{"text":"are non-negative.","element":"span"}],[{"text":"Proof By the Chebyshev bound,","element":"span"}],[{"style":{"width":"60%"},"width":1039,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-17.png","element":"img"}],[{"text":"We then use the moment generating function: for all ","element":"span"},{"style":{"height":19.54},"width":572.6,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-18.png","element":"img","alt":" α > 0, E X^{2ℓ} ≤ (2ℓ)! E e^{αX}/α^{2ℓ}.","inline":true}],[{"text":"The following corollary tailors the above lemma to a mixture of binomial distributions.","element":"span"}],[{"style":{"width":"95%"},"width":1649,"height":309,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-19.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"f ","element":"span"},{"text":"is a polynomial of degree at most ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-20.png","element":"img","alt":" ℓ","inline":true,"padRight":true},{"text":"with integer coefficients (","element":"span"},{"href":"#id-4","referenceIndex":4,"text":"Belkin and Sinha","element":"a"},{"text":", ","element":"span"},{"href":"#id-4","referenceIndex":4,"text":"2010","element":"a"},{"text":"). 
If ","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-21.png","element":"img","alt":"pi","inline":true,"padRight":true},{"text":"is an integer multiple of ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-22.png","element":"img","alt":" ǫ","inline":true,"padRight":true},{"text":"then this implies ","element":"span"},{"style":{"height":17.6},"width":197.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-23.png","element":"img","alt":" k(EXℓ)/ǫℓ ","inline":true,"padRight":true},{"text":"is integral and therefore any mixture with a different ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-24.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment differs by at least ","element":"span"},{"style":{"height":17.6},"width":78.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-25.png","element":"img","alt":" ǫℓ/k","inline":true},{"text":". 
Hence, learning the ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-26.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment up to ","element":"span"},{"style":{"height":17.6},"width":232.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/9-27.png","element":"img","alt":" γℓ < ǫℓ/(2k)","inline":true,"padRight":true},{"text":"implies learning the moment exactly.","element":"span"}],[{"text":"Lemma 14 For ","element":"span"},{"style":{"height":17.6},"width":368.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-0.png","element":"img","alt":" X ∼ Bin(n, p), EXℓ ","inline":true,"padRight":true},{"text":"is a polynomial in ","element":"span"},{"text":"p ","element":"span"},{"text":"of degree exactly ","element":"span"},{"style":{"height":16},"width":178.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-1.png","element":"img","alt":" ℓ if n ≥ ℓ.","inline":true}],[{"text":"The proof of the lemma is relegated to the appendix.","element":"span"}],[{"text":"Theorem 15 ","element":"span"},{"style":{"height":20.75},"width":288.68,"height":51.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-2.png","element":"img","alt":" O(k2(n/ǫ)8/√ǫ)","inline":true,"padRight":true},{"text":"samples are sufficient to exactly learn the first ","element":"span"},{"style":{"height":17.87},"width":97.68,"height":44.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-3.png","element":"img","alt":" 4/√ǫ","inline":true,"padRight":true},{"text":"moments of a uniform mixture of ","element":"span"},{"text":"k ","element":"span"},{"text":"binomial distributions ","element":"span"},{"style":{"height":22.05},"width":332.6,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-4.png","element":"img","alt":"�ki=1 Bin(n, 
pi)/k","inline":true,"padRight":true},{"text":"with probability at least ","element":"span"},{"text":"7","element":"span"},{"text":"/","element":"span"},{"text":"8 ","element":"span"},{"text":"where ","element":"span"},{"text":"each ","element":"span"},{"style":{"height":17.6},"width":379.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-5.png","element":"img","alt":" pi ∈ {0, ǫ, 2ǫ, . . . , 1}.","inline":true}],[{"text":"Proof Let ","element":"span"},{"style":{"height":17.68},"width":194.16,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-6.png","element":"img","alt":" T = 4/√ǫ","inline":true},{"text":". From Corollary ","element":"span"},{"href":"#id-35","text":"20 ","element":"a"},{"text":"and the preceding discussion, learning the ","element":"span"},{"style":{"height":12.8},"width":208.36,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-7.png","element":"img","alt":" ℓth moment","inline":true,"padRight":true},{"text":"exactly with failure probability ","element":"span"},{"style":{"height":19.54},"width":325.64,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-8.png","element":"img","alt":" 1/91+T−ℓ requires","inline":true}],[{"style":{"width":"65%"},"width":1135,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-9.png","element":"img"}],[{"text":"samples. 
And hence, we can compute all ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-10.png","element":"img","alt":" ℓ","inline":true},{"text":"th moments exactly for ","element":"span"},{"style":{"height":17.87},"width":358.96,"height":44.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-11.png","element":"img","alt":" 1 ≤ ℓ ≤ 4/√ǫ using","inline":true}],[{"style":{"width":"40%"},"width":692,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-12.png","element":"img"}],[{"text":"samples with failure probability ","element":"span"},{"style":{"height":21.86},"width":669.08,"height":54.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-13.png","element":"img","alt":"�Tℓ=1 1/91+T−ℓ < �∞i=1 1/9i = 1/8.","inline":true}],[{"text":"How many moments determine the parameters ","element":"span"},{"text":"It remains to show the first ","element":"span"},{"style":{"height":17.68},"width":273.8,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-14.png","element":"img","alt":" 4/√ǫ moments","inline":true,"padRight":true},{"text":"suffice to determine the ","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-15.png","element":"img","alt":" pi","inline":true,"padRight":true},{"text":"values in the mixture ","element":"span"},{"style":{"height":22.82},"width":867.28,"height":57.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-16.png","element":"img","alt":" X ∼ �ki=1 Bin(n, pi)/k provided n ≥ 4ǫ . 
To do","inline":true,"padRight":true},{"text":"this, suppose there exists another mixture ","element":"span"},{"style":{"height":22.05},"width":423.32,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-17.png","element":"img","alt":" Y ∼ ∑_{i=1}^{k} Bin(n, qi)/k","inline":true,"padRight":true},{"text":"and we will argue that","element":"span"}],[{"style":{"width":"38%"},"width":657,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-18.png","element":"img"}],[{"text":"implies ","element":"span"},{"style":{"height":20.05},"width":370.44,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-19.png","element":"img","alt":" {pi}i∈[k] = {qi}i∈[k]","inline":true},{"text":". To argue this, define integers ","element":"span"},{"style":{"height":17.6},"width":429.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-20.png","element":"img","alt":" αi, βi ∈ {0, 1, . . . , 1/ǫ}","inline":true,"padRight":true},{"text":"such that","element":"span"}],[{"style":{"width":"83%"},"width":1436,"height":541,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-21.png","element":"img"}],[{"text":"Hence, if the first ","element":"span"},{"text":"T ","element":"span"},{"text":"moments match, then ","element":"span"},{"style":{"height":17.6},"width":719.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-22.png","element":"img","alt":" mℓ(A) = mℓ(B) for all ℓ = 0, 1, . . . , T","inline":true},{"text":". 
But the following ","element":"span"},{"id":"id-36","text":"theorem establishes that if ","element":"span"},{"style":{"height":20.8},"width":216.24,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-23.png","element":"img","alt":" T = 4√(1/ǫ)","inline":true,"padRight":true},{"text":"then this implies ","element":"span"},{"text":"A ","element":"span"},{"text":"= ","element":"span"},{"text":"B","element":"span"},{"text":".","element":"span"}],[{"text":"Theorem 16 (","element":"span"},{"href":"#id-34","referenceIndex":24,"text":"Krasikov and Roditty ","element":"a"},{"text":"(","element":"span"},{"href":"#id-34","referenceIndex":24,"text":"1997","element":"a"},{"text":")) For any two subsets ","element":"span"},{"style":{"height":17.6},"width":535.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-24.png","element":"img","alt":" S, T of {0, 1, . . . , n − 1}, then","inline":true}],[{"style":{"width":"58%"},"width":1010,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-25.png","element":"img"}],[{"text":"We note that the above theorem is essentially tight. Specifically, there exists ","element":"span"},{"style":{"height":16.8},"width":236.56,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-26.png","element":"img","alt":" S ̸= T with","inline":true},{"style":{"height":17.6},"width":1054.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-27.png","element":"img","alt":"mk(S) = mk(T) for k = 0, 1, . . . , c√n/ log n for some c","inline":true},{"text":". 
As a consequence of this, we note that even the exact values of the first ","element":"span"},{"style":{"height":17.68},"width":200.24,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-28.png","element":"img","alt":" c√n/ log n","inline":true,"padRight":true},{"text":"moments are insufficient to learn the parameters of the distribution. ","element":"span"},{"text":"As an example in terms of Gaussian mixtures, even given the promise that the ","element":"span"},{"style":{"height":13.6},"width":92.84,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-29.png","element":"img","alt":" µi ∈","inline":true},{"style":{"height":17.6},"width":304.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-30.png","element":"img","alt":"{0, 1, . . . , n − 1}","inline":true,"padRight":true},{"text":"are distinct, the first ","element":"span"},{"style":{"height":17.68},"width":200.24,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-31.png","element":"img","alt":" c√n/ log n","inline":true,"padRight":true},{"text":"moments of ","element":"span"},{"style":{"height":19.18},"width":223.4,"height":47.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-32.png","element":"img","alt":"∑_i N(µi, 1)","inline":true,"padRight":true},{"text":"are insufficient to ","element":"span"},{"text":"uniquely determine ","element":"span"},{"style":{"height":12},"width":38.4,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-33.png","element":"img","alt":" µi","inline":true,"padRight":true},{"text":"whereas the first ","element":"span"},{"style":{"height":17.6},"width":84.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/10-34.png","element":"img","alt":" 4√n","inline":true,"padRight":true},{"text":"moments are sufficient.","element":"span"}],[{"text":"3.1. 
Extension to Non-Uniform Distributions","element":"span"}],[{"text":"We now consider extending the framework to non-uniform distributions. In this case, the method of computing the moments is identical to the uniform case. However, when arguing that a small number of moments suffices, we can no longer appeal to Theorem ","element":"span"},{"href":"#id-36","text":"16","element":"a"},{"text":".","element":"span"}],[{"text":"To handle non-uniform distributions, we introduce a precision variable ","element":"span"},{"text":"q ","element":"span"},{"text":"and assume that the weights of the component distributions ","element":"span"},{"style":{"height":11.2},"width":253.68,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-0.png","element":"img","alt":" ω1, ω2, . . . , ωk","inline":true,"padRight":true},{"text":"are of the form:","element":"span"}],[{"style":{"width":"14%"},"width":259,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":407.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-2.png","element":"img","alt":" wi ∈ {0, 1, . . . , q − 1}","inline":true},{"text":". Then, in the above framework, if we are trying to learn parameters ","element":"span"},{"style":{"height":11.2},"width":189.36,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-3.png","element":"img","alt":"α1, . . . 
, αk","inline":true,"padRight":true},{"text":"then the moments define a multi-set consisting of ","element":"span"},{"style":{"height":16.4},"width":423.28,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-4.png","element":"img","alt":" wi copies of αi for each","inline":true},{"style":{"height":17.6},"width":116.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-5.png","element":"img","alt":"i ∈ [k]","inline":true},{"text":". To quantify how many moments suffice in this case, we need to prove a variant of Theorem ","element":"span"},{"href":"#id-36","text":"16","element":"a"},{"text":". The proof is a relatively straightforward generalization of the proof by ","element":"span"},{"href":"#id-37","referenceIndex":29,"text":"Scott ","element":"a"},{"text":"(","element":"span"},{"href":"#id-37","referenceIndex":29,"text":"1997","element":"a"},{"text":") and can be found in the appendix.","element":"span"}],[{"id":"id-38","text":"Theorem 17 ","element":"span"},{"text":"For any two multi-sets ","element":"span"},{"text":"S, T ","element":"span"},{"text":"where each element is in ","element":"span"},{"style":{"height":17.6},"width":305.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-6.png","element":"img","alt":" {0, 1, . . . 
, n − 1}","inline":true,"padRight":true},{"text":"and the multiplicity of each element is at most ","element":"span"},{"style":{"height":16},"width":337.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-7.png","element":"img","alt":" q − 1, then S = T","inline":true,"padRight":true},{"text":"if and only if ","element":"span"},{"style":{"height":17.6},"width":515.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-8.png","element":"img","alt":" mk(S) = mk(T) for all k =","inline":true},{"style":{"height":18.05},"width":394.52,"height":45.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/11-9.png","element":"img","alt":"0, 1, . . . , 2√qn log qn.","inline":true}],[{"text":"Acknowledgements ","element":"span"},{"text":"This work was partially supported by NSF grants CCF-1909046, CCF-1934846, CCF-1908849, and CCF-1637536.","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-9","text":"Jayadev Acharya, Ilias Diakonikolas, Jerry Li, and Ludwig Schmidt. Sample-optimal density estimation ","element":"span"},{"text":"in nearly-linear time. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1278–1289. SIAM, 2017.","element":"span"}],[{"id":"id-2","text":"Dimitris Achlioptas and Frank McSherry. On spectral learning of mixtures of distributions. In ","element":"span"},{"text":"Conference on Learning Theory, 2005.","element":"span"}],[{"id":"id-5","text":"Sanjeev Arora and Ravi Kannan. Learning mixtures of arbitrary gaussians. In ","element":"span"},{"text":"Symposium on Theory of Computing, 2001.","element":"span"}],[{"id":"id-4","text":"Mikhail Belkin and Kaushik Sinha. Polynomial learning of distribution families. In ","element":"span"},{"text":"Foundations of Computer Science, 2010.","element":"span"}],[{"id":"id-28","text":"P. Borwein and T. Erdélyi. Littlewood-type problems on subarcs of the unit circle. 
","element":"span"},{"text":"Indiana University Mathematics Journal, 1997.","element":"span"}],[{"id":"id-25","text":"Peter Borwein. ","element":"span"},{"text":"The Prouhet–Tarry–Escott Problem, pages 85–95. ","element":"span"},{"text":"Springer New York, New York, NY, 2002. ","element":"span"},{"text":"ISBN 978-0-387-21652-2. ","element":"span"},{"text":"doi: 10.1007/978-0-387-21652-2_11. ","element":"span"},{"text":"URL ","element":"span"},{"href":"https://doi.org/10.1007/978-0-387-21652-2_11","text":"https://doi.org/10.1007/978-0-387-21652-2_11","element":"a"},{"text":".","element":"span"}],[{"id":"id-20","text":"Siu-On Chan, Ilias Diakonikolas, Rocco A Servedio, and Xiaorui Sun. Learning mixtures of structured ","element":"span"},{"text":"distributions over discrete domains. In Symposium on Discrete Algorithms, 2013.","element":"span"}],[{"id":"id-8","text":"Siu-On Chan, Ilias Diakonikolas, Rocco A Servedio, and Xiaorui Sun. Efficient density estimation ","element":"span"},{"text":"via piecewise polynomial approximation. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 604–613. ACM, 2014.","element":"span"}],[{"id":"id-1","text":"Sanjoy Dasgupta. Learning mixtures of gaussians. In ","element":"span"},{"text":"Foundations of Computer Science, pages 634–644, 1999.","element":"span"}],[{"id":"id-21","text":"Constantinos Daskalakis and Gautam Kamath. ","element":"span"},{"text":"Faster and sample near-optimal algorithms for proper learning mixtures of gaussians. In Conference on Learning Theory, 2014.","element":"span"}],[{"id":"id-15","text":"Anindya De, Ryan O’Donnell, and Rocco A. Servedio. Optimal mean-based algorithms for trace ","element":"span"},{"text":"reconstruction. In Symposium on Theory of Computing, 2017a.","element":"span"}],[{"id":"id-17","text":"Anindya De, Ryan O’Donnell, and Rocco A. Servedio. Sharp bounds for population recovery. ","element":"span"},{"text":"CoRR, abs/1703.01474, 2017b. 
URL ","element":"span"},{"href":"http://arxiv.org/abs/1703.01474","text":"http://arxiv.org/abs/1703.01474","element":"a"},{"text":".","element":"span"}],[{"id":"id-23","text":"Luc Devroye and Gábor Lugosi. ","element":"span"},{"text":"Combinatorial methods in density estimation. Springer Science & Business Media, 2012.","element":"span"}],[{"id":"id-19","text":"Ilias Diakonikolas. Learning structured distributions. ","element":"span"},{"text":"Handbook of Big Data, 267, 2016.","element":"span"}],[{"id":"id-22","text":"Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Statistical query lower bounds for robust ","element":"span"},{"text":"estimation of high-dimensional gaussians and gaussian mixtures. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 73–84. IEEE, 2017.","element":"span"}],[{"id":"id-11","text":"Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. List-decodable robust mean estimation ","element":"span"},{"text":"and learning mixtures of spherical gaussians. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1047–1060. ACM, 2018.","element":"span"}],[{"id":"id-7","text":"Jon Feldman, Ryan O’Donnell, and Rocco A Servedio. Learning mixtures of product distributions ","element":"span"},{"text":"over discrete domains. SIAM Journal on Computing, 2008.","element":"span"}],[{"id":"id-13","text":"Moritz Hardt and Eric Price. Tight bounds for learning a mixture of two gaussians. In ","element":"span"},{"text":"Symposium on Theory of Computing, 2015.","element":"span"}],[{"id":"id-39","text":"Godfrey Harold Hardy, Edward Maitland Wright, et al. ","element":"span"},{"text":"An introduction to the theory of numbers. Oxford university press, 1979.","element":"span"}],[{"id":"id-10","text":"Samuel B Hopkins and Jerry Li. Mixture models, robustness, and sum of squares proofs. 
In ","element":"span"},{"text":"Symposium on Theory of Computing, 2018.","element":"span"}],[{"id":"id-33","text":"Adam Kalai, Ankur Moitra, and Gregory Valiant. Disentangling Gaussians. ","element":"span"},{"text":"Communications of the ACM, 55(2):113–120, 2012.","element":"span"}],[{"id":"id-3","text":"Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant. Efficiently learning mixtures of two ","element":"span"},{"text":"gaussians. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 553–562. ACM, 2010.","element":"span"}],[{"id":"id-12","text":"Pravesh K Kothari, Jacob Steinhardt, and David Steurer. Robust moment estimation and improved ","element":"span"},{"text":"clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1035–1046. ACM, 2018.","element":"span"}],[{"id":"id-34","text":"I. Krasikov and Y. Roditty. On a reconstruction problem for sequences. ","element":"span"},{"text":"Journal of Combinatorial Theory, Series A, 1997.","element":"span"}],[{"id":"id-26","text":"Akshay Krishnamurthy, Arya Mazumdar, Andrew McGregor, and Soumyabrata Pal. Trace reconstruction: ","element":"span"},{"text":"Generalized and parameterized. In 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, pages 68:1–68:25, 2019.","element":"span"}],[{"id":"id-18","text":"Ankur Moitra. ","element":"span"},{"text":"Algorithmic aspects of machine learning. Cambridge University Press, 2018.","element":"span"}],[{"id":"id-6","text":"Ankur Moitra and Gregory Valiant. Settling the polynomial learnability of mixtures of gaussians. ","element":"span"},{"text":"In Foundations of Computer Science, 2010.","element":"span"}],[{"id":"id-16","text":"Fedor Nazarov and Yuval Peres. 
Trace reconstruction with ","element":"span"},{"style":{"height":20.34},"width":231.08,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-0.png","element":"img","alt":" exp(O(n^{1/3}))","inline":true,"padRight":true},{"text":"samples. In Symposium on Theory of Computing, 2017.","element":"span"}],[{"id":"id-37","text":"Alex D. Scott. Reconstructing sequences. ","element":"span"},{"text":"Discrete Mathematics, 1997.","element":"span"}],[{"id":"id-24","text":"Ananda Theertha Suresh, Alon Orlitsky, Jayadev Acharya, and Ashkan Jafarpour. Near-optimal-","element":"span"},{"text":"sample estimators for spherical gaussian mixtures. In Advances in Neural Information Processing Systems, pages 1395–1403, 2014.","element":"span"}],[{"id":"id-0","text":"D Michael Titterington, Adrian FM Smith, and Udi E Makov. ","element":"span"},{"text":"Statistical analysis of finite mixture distributions. Wiley, 1985.","element":"span"}],[{"id":"id-42","text":"Eric W. Weisstein. Geometric distribution. ","element":"span"},{"text":"From MathWorld–A Wolfram Web Resource, 2019. URL ","element":"span"},{"href":"http://mathworld.wolfram.com/GeometricDistribution.html","text":"http://mathworld.wolfram.com/GeometricDistribution.html","element":"a"},{"text":".","element":"span"}]]},{"heading":"4. Omitted Proofs","paragraphs":[[{"text":"Additional calculations for Lemma ","element":"span"},{"href":"#id-29","text":"4","element":"a"},{"text":". ","element":"span"},{"text":"We consider each distribution in turn:","element":"span"}],[{"text":"• ","element":"span"},{"text":"Gaussian: Observe that ","element":"span"},{"style":{"height":17.6},"width":174.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-1.png","element":"img","alt":" E[Gt(X)]","inline":true,"padRight":true},{"text":"is precisely the characteristic function. 
Clearly we have ","element":"span"},{"style":{"height":17.6},"width":208.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-2.png","element":"img","alt":"∥Gt∥∞ = 1","inline":true,"padRight":true},{"text":"and further","element":"span"}],[{"style":{"width":"54%"},"width":938,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-3.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"Poisson: If ","element":"span"},{"style":{"height":19.14},"width":824.84,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-4.png","element":"img","alt":" Gt(x) = (1 + it)x then since |1 + it|2 = 1 + t2 ","inline":true,"padRight":true},{"text":"the second claim follows. For the first:","element":"span"}],[{"style":{"width":"95%"},"width":1658,"height":269,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-5.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"Negative Binomial: Let ","element":"span"},{"style":{"height":28.39},"width":1179.28,"height":70.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-6.png","element":"img","alt":" wt = 1/p − (1/p − 1)e−it then |wt|2 = 1+(1−p)2−2(1−p) cos tp2 =","inline":true}],[{"style":{"width":"62%"},"width":1086,"height":215,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/13-7.png","element":"img"}],[{"text":"4.1. 
Additional calculations for Theorem ","element":"span"},{"href":"#id-31","text":"7","element":"a"},{"text":".","element":"span"}],[{"text":"• ","element":"span"},{"text":"Gaussian: The characteristic function of a Gaussian ","element":"span"},{"style":{"height":19.14},"width":305,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/14-0.png","element":"img","alt":" X ∼ N(µ, σ2) is","inline":true}],[{"style":{"width":"76%"},"width":1314,"height":1120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/14-1.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"Chi-Squared: Let ","element":"span"},{"style":{"height":13.6},"width":469.12,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/14-2.png","element":"img","alt":" X ∼ M and X′ ∼ M′","inline":true},{"text":". Then, for ","element":"span"},{"style":{"height":19.13},"width":588.92,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/14-3.png","element":"img","alt":" w = exp(1/2 − e−2it/2), from","inline":true,"padRight":true},{"text":"Lemma ","element":"span"},{"href":"#id-29","text":"4","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"91%"},"width":1579,"height":635,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/14-4.png","element":"img"}],[{"text":"where we have used the pdf of chi-squared distribution and the tail bounds for chi-squared. 
Now using Lemma ","element":"span"},{"href":"#id-30","text":"5","element":"a"},{"text":", and taking ","element":"span"},{"style":{"height":19.15},"width":141.56,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/14-5.png","element":"img","alt":" |t| ≤ πL,","inline":true}],[{"style":{"width":"96%"},"width":1668,"height":224,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/14-6.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"Negative-Binomial: Let ","element":"span"},{"style":{"height":13.6},"width":421.6,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-0.png","element":"img","alt":" X ∼ M and X′ ∼ M′","inline":true},{"text":". Then, for ","element":"span"},{"style":{"height":19.14},"width":571.64,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-1.png","element":"img","alt":" w = 1/p − (1/p − 1)e−it, from","inline":true,"padRight":true},{"text":"Lemma ","element":"span"},{"href":"#id-29","text":"4","element":"a"},{"text":", taking ","element":"span"},{"style":{"height":17.6},"width":215.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-2.png","element":"img","alt":" G(x) = wx,","inline":true}],[{"style":{"width":"83%"},"width":1440,"height":410,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":23.1},"width":1498.04,"height":57.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-4.png","element":"img","alt":" u(x) =�x+N−1x �(1 − p)Npx. 
We have |w| ≤ ec(1−p)t2/p2 ≤ ec(1−p)/p2 for t < 1.","inline":true,"padRight":true},{"text":"Using Lemma ","element":"span"},{"href":"#id-30","text":"6","element":"a"},{"text":", with ","element":"span"},{"style":{"height":17.6},"width":460.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-5.png","element":"img","alt":" X ∼ NB(N, p), we have,","inline":true}],[{"style":{"width":"92%"},"width":1602,"height":619,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-6.png","element":"img"}],[{"text":"4.2. Proof of Theorem ","element":"span"},{"href":"#id-38","text":"17","element":"a"}],[{"text":"Let ","element":"span"},{"text":"a ","element":"span"},{"text":"be the characteristic vector of a subset ","element":"span"},{"style":{"height":17.6},"width":476.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-7.png","element":"img","alt":" S ⊂ U. Let sℓ = mℓ(S)","inline":true,"padRight":true},{"text":"on this set and let ","element":"span"},{"text":"s ","element":"span"},{"text":"= ","element":"span"},{"style":{"height":17.6},"width":313.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-8.png","element":"img","alt":"(s0, s1, . . . , sk−1)","inline":true},{"text":". We need to prove ","element":"span"},{"text":"a ","element":"span"},{"text":"is uniquely determined by ","element":"span"},{"text":"s","element":"span"},{"text":". 
Let us define","element":"span"}],[{"style":{"width":"30%"},"width":532,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-9.png","element":"img"}],[{"text":"Claim 1 For a prime number ","element":"span"},{"style":{"height":24.43},"width":1085.88,"height":61.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-10.png","element":"img","alt":" p and i ̸≡p 0, we have ni,p(a) ≡p s0 − �j�p−1j �sj(−i)p−1−j","inline":true}],[{"text":"Proof","element":"span"}],[{"style":{"width":"28%"},"width":496,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/15-11.png","element":"img"}],[{"text":"Recall that Fermat’s theorem (","element":"span"},{"href":"#id-39","referenceIndex":19,"text":"Hardy et al. ","element":"a"},{"text":"(","element":"span"},{"href":"#id-39","referenceIndex":19,"text":"1979","element":"a"},{"text":")) says that for any prime ","element":"span"},{"text":"p ","element":"span"},{"text":"and any number ","element":"span"},{"style":{"height":17.89},"width":138.2,"height":44.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-0.png","element":"img","alt":" α ̸≡p 0,","inline":true,"padRight":true},{"text":"we must have that ","element":"span"},{"style":{"height":19.82},"width":189.52,"height":49.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-1.png","element":"img","alt":" αp−1 ≡p 1","inline":true},{"text":". 
Hence, for a prime number ","element":"span"},{"text":"p ","element":"span"},{"text":"and some number ","element":"span"},{"style":{"height":17.89},"width":278.72,"height":44.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-2.png","element":"img","alt":" i ̸≡p 0, we have","inline":true}],[{"style":{"width":"91%"},"width":1576,"height":591,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-3.png","element":"img"}],[{"text":"Since the value of ","element":"span"},{"style":{"height":13.09},"width":65.52,"height":32.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-4.png","element":"img","alt":" ni,p","inline":true,"padRight":true},{"text":"is at most ","element":"span"},{"style":{"height":17.6},"width":130.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-5.png","element":"img","alt":" ⌈qn/p⌉","inline":true},{"text":", we can obtain the value of ","element":"span"},{"style":{"height":13.09},"width":65.52,"height":32.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-6.png","element":"img","alt":" ni,p","inline":true,"padRight":true},{"text":"exactly if ","element":"span"},{"text":"p ","element":"span"},{"text":"is chosen to be greater than ","element":"span"},{"style":{"height":17.6},"width":83.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-7.png","element":"img","alt":"√qn","inline":true},{"text":". 
Now, let us denote the vector ","element":"span"},{"style":{"height":19.22},"width":379.92,"height":48.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-8.png","element":"img","alt":" vi,p ∈ Fnq where the ℓ","inline":true},{"text":"th entry is","element":"span"}],[{"style":{"width":"28%"},"width":485,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-9.png","element":"img"}],[{"text":"Therefore, consider two different subsets ","element":"span"},{"style":{"height":15.6},"width":177.44,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-10.png","element":"img","alt":" S, S′ ⊂ U","inline":true,"padRight":true},{"text":"and assume that their characteristic vectors are ","element":"span"},{"text":"a ","element":"span"},{"text":"and ","element":"span"},{"text":"b ","element":"span"},{"text":"respectively. Therefore, if ","element":"span"},{"text":"a ","element":"span"},{"text":"and ","element":"span"},{"text":"b ","element":"span"},{"text":"both give rise to the same value of ","element":"span"},{"style":{"height":17.49},"width":401.71,"height":43.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-11.png","element":"img","alt":" s, then a.vi,p = b.vi,p.","inline":true,"padRight":true},{"text":"Hence, if the set of vectors","element":"span"}],[{"style":{"width":"51%"},"width":896,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-12.png","element":"img"}],[{"text":"spans ","element":"span"},{"style":{"height":18.62},"width":47.88,"height":46.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-13.png","element":"img","alt":" Fnq ","inline":true,"padRight":true},{"text":", then it must imply that ","element":"span"},{"text":"a ","element":"span"},{"text":"= ","element":"span"},{"text":"b ","element":"span"},{"text":"and our proof will be complete. 
Consider a subset ","element":"span"},{"style":{"height":14},"width":124.04,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-14.png","element":"img","alt":" T ⊂ S","inline":true,"padRight":true},{"text":"defined by","element":"span"}],[{"style":{"width":"52%"},"width":900,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-15.png","element":"img"}],[{"text":"Now, there are two possible cases. First, let us assume that the vectors in ","element":"span"},{"text":"T ","element":"span"},{"text":"are not all linearly independent in ","element":"span"},{"style":{"height":17.09},"width":42.88,"height":42.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-16.png","element":"img","alt":" Fq","inline":true},{"text":". In that case, we must have a set of tuples ","element":"span"},{"style":{"height":17.6},"width":620.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-17.png","element":"img","alt":" (i1, p1), (i2, p2), . . . , (im, pm) such","inline":true,"padRight":true},{"text":"that","element":"span"}],[{"id":"id-40","style":{"width":"59%"},"width":1030,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-18.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.89},"width":387.64,"height":44.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-19.png","element":"img","alt":" 0 ̸= αj ∈ Fq for all j","inline":true},{"text":". Now, by the Chinese Remainder Theorem, we can find an integer ","element":"span"},{"text":"r ","element":"span"},{"text":"such that ","element":"span"},{"style":{"height":19.41},"width":653,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-20.png","element":"img","alt":" r ≡p1 i1 and r ≡pj 0 for all pj ̸= p1","inline":true},{"text":". 
Define an infinite-dimensional vector ","element":"span"},{"style":{"height":12.8},"width":211.52,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-21.png","element":"img","alt":" ˜v where the","inline":true},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-22.png","element":"img","alt":"ℓ","inline":true},{"text":"th entry is","element":"span"}],[{"style":{"width":"25%"},"width":445,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/16-23.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":19.41},"width":153.52,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-0.png","element":"img","alt":" ij ̸≡pj 0","inline":true},{"text":", it is evident that ","element":"span"},{"style":{"height":18.29},"width":371.4,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-1.png","element":"img","alt":" ˜v[r] ̸≡q 0. Now, let s","inline":true,"padRight":true},{"text":"be the smallest number such that ","element":"span"},{"style":{"height":17.6},"width":158.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-2.png","element":"img","alt":" ˜v[s] ̸= 0","inline":true,"padRight":true},{"text":"and ","element":"span"},{"text":"s > n ","element":"span"},{"text":"because of our assumption in Eq. ","element":"span"},{"href":"#id-40","text":"1","element":"a"},{"text":". 
Now consider the vector ","element":"span"},{"style":{"height":15.09},"width":158.72,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-3.png","element":"img","alt":" vt where","inline":true}],[{"style":{"width":"22%"},"width":387,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-4.png","element":"img"}],[{"text":"Now, ","element":"span"},{"style":{"height":19.15},"width":540.88,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-5.png","element":"img","alt":" vit = 0 for all i < t and vtt ̸= 0","inline":true},{"text":". Hence, the set ","element":"span"},{"style":{"height":18},"width":139.4,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-6.png","element":"img","alt":" {vt}nt=1 ","inline":true,"padRight":true},{"text":"are in the span of ","element":"span"},{"text":"S ","element":"span"},{"text":"and also span ","element":"span"},{"style":{"height":18.81},"width":59.96,"height":47.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-7.png","element":"img","alt":" Fnq .","inline":true,"padRight":true},{"text":"For the second case, let us assume that the vectors in ","element":"span"},{"text":"T ","element":"span"},{"text":"are linearly independent. We require the size of ","element":"span"},{"text":"T ","element":"span"},{"text":"> n ","element":"span"},{"text":"so that the vectors in ","element":"span"},{"style":{"height":19.61},"width":184.68,"height":49.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-8.png","element":"img","alt":" T span Fnq ","inline":true,"padRight":true},{"text":". 
From the prime number theorem we know that","element":"span"}],[{"style":{"width":"22%"},"width":397,"height":126,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-9.png","element":"img"}],[{"text":"and hence we simply need that","element":"span"}],[{"style":{"width":"21%"},"width":379,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-10.png","element":"img"}],[{"text":"Therefore, ","element":"span"},{"style":{"height":18.45},"width":470,"height":46.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-11.png","element":"img","alt":" k > (1 + o(1))√qn log qn","inline":true,"padRight":true},{"text":"is sufficient.","element":"span"}],[{"text":"4.3. Algebraic method for Geometric distribution","element":"span"}],[{"text":"We will denote the Geometric distribution with success parameter ","element":"span"},{"text":"0 ","element":"span"},{"text":"< p < ","element":"span"},{"text":"1 ","element":"span"},{"text":"as Geo","element":"span"},{"text":"(","element":"span"},{"text":"p","element":"span"},{"text":") ","element":"span"},{"text":"and it has the following form: for a random variable ","element":"span"},{"text":"X ","element":"span"},{"text":"distributed according to Geo","element":"span"},{"text":"(","element":"span"},{"text":"p","element":"span"},{"text":")","element":"span"},{"text":", ","element":"span"},{"text":"Pr(","element":"span"},{"text":"X ","element":"span"},{"text":"= ","element":"span"},{"text":"x","element":"span"},{"text":") = ","element":"span"},{"style":{"height":17.6},"width":616.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-12.png","element":"img","alt":"(1 − p)xp where x ∈ {0, 1, 2, . . . 
}.","inline":true}],[{"text":"Theorem 18 (Learning mixtures of Geometric Distribution) Let ","element":"span"},{"style":{"height":22.82},"width":529.84,"height":57.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-13.png","element":"img","alt":" M = 1k�ki=1 Geo(pi) be a","inline":true,"padRight":true},{"text":"uniform mixture of ","element":"span"},{"text":"k ","element":"span"},{"text":"Geometric distributions, with unknown probabilities","element":"span"}],[{"style":{"width":"46%"},"width":803,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-14.png","element":"img"}],[{"text":"Then, the first ","element":"span"},{"style":{"height":17.6},"width":84.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-15.png","element":"img","alt":" 4√n","inline":true,"padRight":true},{"text":"moments suffice to learn the parameters ","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-16.png","element":"img","alt":" pi","inline":true,"padRight":true},{"text":"and there exists an algorithm that, when given ","element":"span"},{"style":{"height":36.4},"width":307.76,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-17.png","element":"img","alt":" O�k2� √nǫ �8√n�","inline":true},{"text":"samples from ","element":"span"},{"text":"M","element":"span"},{"text":", exactly identifies the parameters ","element":"span"},{"style":{"height":19.94},"width":309.52,"height":49.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-18.png","element":"img","alt":" {pi}ki=1 with high","inline":true,"padRight":true},{"text":"probability.","element":"span"}],[{"text":"Computing the moments. 
","element":"span"},{"text":"We compute the ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-19.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment in the natural way again. Let ","element":"span"},{"style":{"height":15.2},"width":226.96,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-20.png","element":"img","alt":" Y1, . . . , Yt ∼","inline":true}],[{"style":{"width":"57%"},"width":1001,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-21.png","element":"img"}],[{"text":"Lemma 19 (Restating Lemma ","element":"span"},{"href":"#id-41","text":"12","element":"a"},{"text":") ","element":"span"},{"style":{"height":31.6},"width":1083.68,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-22.png","element":"img","alt":" Pr[|Sℓ − EXℓ| ≥ γ] ≤ EX2ℓtγ2 ≤ (2ℓ)!γ2t infα�EeαXα2ℓ �where the","inline":true,"padRight":true},{"text":"last inequality assumes that all the moments of ","element":"span"},{"text":"X ","element":"span"},{"text":"are non-negative.","element":"span"}],[{"text":"The following corollary tailors the above lemma to a mixture of geometric distributions.","element":"span"}],[{"id":"id-35","style":{"width":"3%"},"width":65,"height":5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-23.png","element":"img"}],[{"text":"Corollary 20 If ","element":"span"},{"style":{"height":31.6},"width":1101.56,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/17-24.png","element":"img","alt":" X ∼ �ki=1 Geo(pi)/k then Pr[|Sℓ − EXℓ| ≥ γ] ≤ 2tγ2� 4ℓmini pi","inline":true}],[{"text":"Proof Given a random variable ","element":"span"},{"style":{"height":17.6},"width":224.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-0.png","element":"img","alt":" Z ∼ Geo(p)","inline":true},{"text":", we 
will show that ","element":"span"},{"style":{"height":35.37},"width":327.08,"height":88.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-1.png","element":"img","alt":" EZk ≤ 2�2kp�k+1","inline":true},{"text":"for all integer valued ","element":"span"},{"style":{"height":14.8},"width":104.08,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-2.png","element":"img","alt":" k ≥ 0","inline":true},{"text":". It is known that (","element":"span"},{"href":"#id-42","referenceIndex":32,"text":"Weisstein","element":"a"},{"text":", ","element":"span"},{"href":"#id-42","referenceIndex":32,"text":"2019","element":"a"},{"text":")","element":"span"}],[{"style":{"width":"21%"},"width":375,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-3.png","element":"img"}],[{"text":"where Li","element":"span"},{"style":{"height":17.6},"width":102.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-4.png","element":"img","alt":"−k(z)","inline":true,"padRight":true},{"text":"is the polylogarithmic function of order ","element":"span"},{"style":{"height":12.8},"width":57.08,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-5.png","element":"img","alt":" −k","inline":true,"padRight":true},{"text":"and argument ","element":"span"},{"text":"z","element":"span"},{"text":", defined explicitly as","element":"span"}],[{"style":{"width":"42%"},"width":727,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-6.png","element":"img"}],[{"text":"with","element":"span"},{"style":{"height":24.83},"width":59.88,"height":62.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-7.png","element":"img","alt":"�kj�","inline":true},{"text":"being the Eulerian numbers (see below). 
Hence, it can be observed that ","element":"span"},{"style":{"height":15.14},"width":80.4,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-8.png","element":"img","alt":" EZk ","inline":true,"padRight":true},{"text":"is a polynomial","element":"span"}],[{"text":"in ","element":"span"},{"style":{"height":25.03},"width":835.56,"height":62.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-9.png","element":"img","alt":"1p of degree k. Denoting Ck = max0≤j≤k−1�kj�","inline":true},{"text":"and substituting it, we get that","element":"span"}],[{"style":{"width":"85%"},"width":1479,"height":370,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-10.png","element":"img"}],[{"text":"Putting everything together and by appealing to Lemma ","element":"span"},{"href":"#id-41","text":"12","element":"a"},{"text":", we get the statement of the corollary.","element":"span"}],[{"text":"For the geometric distribution,","element":"span"}],[{"style":{"width":"22%"},"width":396,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-11.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"f ","element":"span"},{"text":"is a degree ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-12.png","element":"img","alt":" ℓ","inline":true,"padRight":true},{"text":"polynomial with integer coefficients. 
If ","element":"span"},{"style":{"height":17.6},"width":157.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-13.png","element":"img","alt":" 1/pi − 1","inline":true,"padRight":true},{"text":"is an integer multiple of ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-14.png","element":"img","alt":" ǫ","inline":true,"padRight":true},{"text":"then this implies ","element":"span"},{"style":{"height":17.6},"width":197.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-15.png","element":"img","alt":" k(EXℓ)/ǫℓ ","inline":true,"padRight":true},{"text":"is integral and therefore any mixture with a different ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-16.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment must differ by at least ","element":"span"},{"style":{"height":17.6},"width":77.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-17.png","element":"img","alt":" ǫℓ/k","inline":true},{"text":". 
Hence, learning the ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-18.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment up to ","element":"span"},{"style":{"height":17.6},"width":244.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-19.png","element":"img","alt":" γℓ < ǫℓ/(2k)","inline":true,"padRight":true},{"text":"implies learning the moment exactly.","element":"span"}],[{"text":"Lemma 21 ","element":"span"},{"style":{"height":36.4},"width":307.76,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-20.png","element":"img","alt":" O�k2�√nǫ �8√n�","inline":true},{"text":"samples are sufficient to exactly learn the first ","element":"span"},{"style":{"height":17.6},"width":84.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-21.png","element":"img","alt":" 4√n","inline":true,"padRight":true},{"text":"moments of a","element":"span"}],[{"text":"uniform mixture of ","element":"span"},{"text":"k ","element":"span"},{"text":"Geometric distributions ","element":"span"},{"style":{"height":22.05},"width":296.6,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-22.png","element":"img","alt":"�ki=1 Geo(pi)/k","inline":true,"padRight":true},{"text":"with probability at least ","element":"span"},{"text":"7","element":"span"},{"text":"/","element":"span"},{"text":"8 ","element":"span"},{"text":"where ","element":"span"},{"text":"each ","element":"span"},{"style":{"height":23.66},"width":626.83,"height":59.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-23.png","element":"img","alt":"1pi ∈ {1, 1 + ǫ, 1 + 2ǫ, . . . 
, 1 + nǫ}.","inline":true}],[{"text":"Proof Let ","element":"span"},{"style":{"height":17.6},"width":182.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-24.png","element":"img","alt":" T = 4√n","inline":true},{"text":". From Corollary ","element":"span"},{"href":"#id-35","text":"20 ","element":"a"},{"text":"and the preceding discussion, learning the ","element":"span"},{"style":{"height":12.8},"width":208.84,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-25.png","element":"img","alt":" ℓth moment","inline":true,"padRight":true},{"text":"exactly with failure probability ","element":"span"},{"style":{"height":19.54},"width":325.64,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-26.png","element":"img","alt":" 1/91+T−ℓ requires","inline":true}],[{"style":{"width":"73%"},"width":1271,"height":92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-27.png","element":"img"}],[{"text":"samples. Hence, we can compute all ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-28.png","element":"img","alt":" ℓ","inline":true},{"text":"th moments exactly for ","element":"span"},{"style":{"height":17.6},"width":345.51,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-29.png","element":"img","alt":" 1 ≤ ℓ ≤ 4√n using","inline":true}],[{"style":{"width":"40%"},"width":698,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-30.png","element":"img"}],[{"text":"samples with failure probability ","element":"span"},{"style":{"height":22.05},"width":669.08,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/18-31.png","element":"img","alt":"�Tℓ=1 1/91+T−ℓ < �∞i=1 1/9i = 1/8.","inline":true}],[{"text":"How many moments are needed to determine the parameters? 
","element":"span"},{"text":"It remains to show the first ","element":"span"},{"style":{"height":17.6},"width":84.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-0.png","element":"img","alt":" 4√n","inline":true,"padRight":true},{"text":"moments suffice to determine the ","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-1.png","element":"img","alt":" pi","inline":true,"padRight":true},{"text":"values in the mixture ","element":"span"},{"style":{"height":21.86},"width":394.52,"height":54.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-2.png","element":"img","alt":" X ∼ �ki=1 Geo(pi)/k","inline":true},{"text":". To do this suppose ","element":"span"},{"text":"there exists another mixture ","element":"span"},{"style":{"height":21.86},"width":387.32,"height":54.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-3.png","element":"img","alt":" Y ∼ �ki=1 Geo(qi)/k","inline":true,"padRight":true},{"text":"and we will argue that","element":"span"}],[{"style":{"width":"35%"},"width":617,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-4.png","element":"img"}],[{"text":"implies ","element":"span"},{"style":{"height":20.05},"width":366.6,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-5.png","element":"img","alt":" {pi}i∈[k] = {qi}i∈[k]","inline":true},{"text":". To argue this, define integers ","element":"span"},{"style":{"height":29.9},"width":1685.68,"height":74.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-6.png","element":"img","alt":" αi, βi ∈ {0, 1, . . . 
, n} such that pi =1","inline":true}],[{"style":{"width":"95%"},"width":1650,"height":472,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-7.png","element":"img"}],[{"text":"Hence, if the first ","element":"span"},{"text":"T ","element":"span"},{"text":"moments match, ","element":"span"},{"style":{"height":17.6},"width":696.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-8.png","element":"img","alt":" mℓ(A) = mℓ(B) for all ℓ = 0, 1, . . . , T","inline":true},{"text":". But, again Theorem ","element":"span"},{"href":"#id-36","text":"16 ","element":"a"},{"text":"establishes that if ","element":"span"},{"style":{"height":17.6},"width":173.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-9.png","element":"img","alt":" T = 4√n","inline":true,"padRight":true},{"text":"then this implies ","element":"span"},{"text":"A ","element":"span"},{"text":"= ","element":"span"},{"text":"B","element":"span"},{"text":".","element":"span"}],[{"text":"Alternative Technique. ","element":"span"},{"text":"In the previous analysis the parameters of the geometric distribution (","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-10.png","element":"img","alt":"pi","inline":true},{"text":"’s) had to belong to the set ","element":"span"},{"style":{"height":22.46},"width":438.64,"height":56.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-11.png","element":"img","alt":" {1, 11+ǫ, 11+2ǫ, . . . , 11+nǫ}","inline":true},{"text":". 
The reason we had to choose this set is ","element":"span"},{"text":"that the moments were polynomials in the inverses of the parameters ( ","element":"span"},{"style":{"height":23.66},"width":28.28,"height":59.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-12.png","element":"img","alt":"1pi ","inline":true,"padRight":true},{"text":"’s). However, it is also possible ","element":"span"},{"text":"to obtain a sample complexity bound when the parameters belong to the set ","element":"span"},{"style":{"height":17.6},"width":290.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-13.png","element":"img","alt":" {0, ǫ, 2ǫ, . . . , 1}.","inline":true,"padRight":true},{"text":"This can be done by estimating the probability mass function of the mixture at the discrete points ","element":"span"},{"text":"{","element":"span"},{"text":"0","element":"span"},{"text":", ","element":"span"},{"text":"1","element":"span"},{"text":", ","element":"span"},{"text":"2","element":"span"},{"text":", . . . ","element":"span"},{"text":"}","element":"span"},{"text":". 
We have the following theorem in this case.","element":"span"}],[{"style":{"width":"105%"},"width":1823,"height":288,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-14.png","element":"img"}],[{"text":"Recall that for a random variable ","element":"span"},{"style":{"height":13.6},"width":163.4,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-15.png","element":"img","alt":" X ∼ M","inline":true,"padRight":true},{"text":"distributed according to the mixture of geometric distributions, we have","element":"span"}],[{"style":{"width":"35%"},"width":611,"height":373,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-16.png","element":"img"}],[{"text":"and more generally,","element":"span"}],[{"style":{"width":"32%"},"width":561,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/19-17.png","element":"img"}],[{"text":"which is a polynomial of degree ","element":"span"},{"text":"k ","element":"span"},{"text":"+ 1","element":"span"},{"text":". Now, for the mixture ","element":"span"},{"style":{"height":21.86},"width":429.32,"height":54.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-0.png","element":"img","alt":" X ∼ 1/k �ki=1 Geo(pi)","inline":true},{"text":", we need to ","element":"span"},{"text":"argue that estimating the probabilities ","element":"span"},{"style":{"height":20.8},"width":646.8,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-1.png","element":"img","alt":" Pr(X = ℓ) for ℓ = 0, 1, . . . , 4�1/ǫ","inline":true,"padRight":true},{"text":"is sufficient to recover the parameters ","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-2.png","element":"img","alt":" pi","inline":true},{"text":". 
Again, suppose there exists another mixture ","element":"span"},{"style":{"height":21.86},"width":585.64,"height":54.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-3.png","element":"img","alt":" Y ∼ 1/k �ki=1 Geo(qi) such that","inline":true}],[{"style":{"width":"51%"},"width":882,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-4.png","element":"img"}],[{"text":"and we will argue that this implies ","element":"span"},{"style":{"height":20.05},"width":383.4,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-5.png","element":"img","alt":" {pi}i∈[k] = {qi}i∈[k]","inline":true},{"text":". As before, define integers ","element":"span"},{"style":{"height":16.4},"width":152.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-6.png","element":"img","alt":" αi, βi ∈","inline":true}],[{"style":{"width":"99%"},"width":1723,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-7.png","element":"img"}],[{"text":"and it can be shown after some algebraic manipulations that","element":"span"}],[{"style":{"width":"95%"},"width":1644,"height":222,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-8.png","element":"img"}],[{"text":"Notice that ","element":"span"},{"style":{"height":17.6},"width":305.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-9.png","element":"img","alt":" m0(A) = m0(B)","inline":true,"padRight":true},{"text":"trivially because both of them contain ","element":"span"},{"text":"k ","element":"span"},{"text":"components. Again, Theorem ","element":"span"},{"href":"#id-36","text":"16 ","element":"a"},{"text":"establishes that if ","element":"span"},{"style":{"height":20.8},"width":754.48,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-10.png","element":"img","alt":" mℓ(A) = mℓ(B) for ℓ ∈ {0, 1, . . . 
4�1/ǫ}","inline":true,"padRight":true},{"text":"then this implies ","element":"span"},{"text":"A ","element":"span"},{"text":"= ","element":"span"},{"text":"B","element":"span"},{"text":".","element":"span"}],[{"text":"Computing the probabilities. ","element":"span"},{"text":"Suppose ","element":"span"},{"style":{"height":15.2},"width":241.92,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-11.png","element":"img","alt":" Y1, Y2, . . . , Yt","inline":true,"padRight":true},{"text":"are i.i.d. with ","element":"span"},{"style":{"height":22.05},"width":451.16,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-12.png","element":"img","alt":" X ∼ 1/k �ki=1 Geo(pi).","inline":true,"padRight":true},{"text":"Let us denote ","element":"span"},{"style":{"height":15.28},"width":42.88,"height":38.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-13.png","element":"img","alt":" Sℓ","inline":true,"padRight":true},{"text":"as the empirical probability that we calculate as,","element":"span"}],[{"style":{"width":"22%"},"width":393,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-14.png","element":"img"}],[{"text":"It is obvious that ","element":"span"},{"style":{"height":17.6},"width":326.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-15.png","element":"img","alt":" ESℓ = Pr(X = ℓ)","inline":true},{"text":". 
Now, using the Chernoff bound, we have","element":"span"}],[{"style":{"width":"41%"},"width":712,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-16.png","element":"img"}],[{"text":"Again, recall that","element":"span"}],[{"style":{"width":"24%"},"width":425,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-17.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":72.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-18.png","element":"img","alt":" f(·)","inline":true,"padRight":true},{"text":"is a polynomial of degree ","element":"span"},{"style":{"height":14},"width":94.96,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-19.png","element":"img","alt":" ℓ + 1","inline":true,"padRight":true},{"text":"with integer coefficients. If ","element":"span"},{"style":{"height":11.6},"width":34.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-20.png","element":"img","alt":" pi","inline":true,"padRight":true},{"text":"is an integer multiple of ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-21.png","element":"img","alt":"ǫ","inline":true,"padRight":true},{"text":"then this implies ","element":"span"},{"style":{"height":19.14},"width":163.4,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-22.png","element":"img","alt":" kSℓ/ǫℓ+1 ","inline":true,"padRight":true},{"text":"is integral and therefore any mixture with a different ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-23.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment has a 
","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-24.png","element":"img","alt":" ℓ","inline":true,"padRight":true},{"text":"moment that differs by at least ","element":"span"},{"style":{"height":19.14},"width":121.88,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-25.png","element":"img","alt":" ǫℓ+1/k","inline":true},{"text":". Hence, learning the ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-26.png","element":"img","alt":" ℓ","inline":true},{"text":"th moment up to ","element":"span"},{"style":{"height":19.14},"width":280.52,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-27.png","element":"img","alt":" γℓ < ǫℓ+1/(2k)","inline":true,"padRight":true},{"text":"implies learning the moment exactly. We will use ","element":"span"},{"style":{"height":28.01},"width":332.32,"height":70.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-28.png","element":"img","alt":" t = 12k2ǫ8/√ǫ+2 log 64√ǫ ","inline":true,"padRight":true},{"text":"number of samples and we ","element":"span"},{"text":"will show it will be sufficient to succeed with a probability of at least ","element":"span"},{"style":{"height":21.26},"width":17,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-29.png","element":"img","alt":"78","inline":true},{"text":". 
We will estimate the ","element":"span"},{"text":"probabilities as mentioned above and therefore the failure probability can be calculated by using the Chernoff Bound and a union bound over ","element":"span"},{"style":{"height":25.6},"width":44.32,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-30.png","element":"img","alt":"4√ǫ ","inline":true,"padRight":true},{"text":"probabilities to be estimated. Therefore the probability ","element":"span"},{"text":"of failure is bounded above by,","element":"span"}],[{"style":{"width":"83%"},"width":1438,"height":231,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/20-31.png","element":"img"}],[{"text":"and hence the proof is complete.","element":"span"}],[{"text":"4.4. Proof of Lemma ","element":"span"},{"text":"14","element":"span"}],[{"text":"We will prove that for ","element":"span"},{"style":{"height":17.6},"width":262.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-0.png","element":"img","alt":" X ∼ Bin(n, p)","inline":true},{"text":", the leading term of ","element":"span"},{"style":{"height":21.45},"width":401.44,"height":53.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-1.png","element":"img","alt":" EXℓ is �ℓ−1i=0(n − i)pℓ","inline":true},{"text":". 
Since for ","element":"span"},{"style":{"height":15.2},"width":113.24,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-2.png","element":"img","alt":" n ≥ ℓ,","inline":true},{"style":{"height":21.46},"width":310.48,"height":53.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-3.png","element":"img","alt":"�ℓ−1i=0(n − i) ̸= 0","inline":true},{"text":", this implies that ","element":"span"},{"style":{"height":12},"width":84.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-4.png","element":"img","alt":" EXℓ","inline":true,"padRight":true},{"text":"is a polynomial of degree exactly ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-5.png","element":"img","alt":" ℓ","inline":true},{"text":". We will prove this by ","element":"span"},{"text":"induction. Since ","element":"span"},{"style":{"height":17.6},"width":265.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-6.png","element":"img","alt":" X ∼ Bin(n, p)","inline":true},{"text":", we know that ","element":"span"},{"text":"E","element":"span"},{"text":"X ","element":"span"},{"text":"= ","element":"span"},{"text":"np. ","element":"span"},{"text":"This verifies the base case. Now, in the induction step, let us assume that the leading term of ","element":"span"},{"style":{"height":21.86},"width":414,"height":54.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-7.png","element":"img","alt":" EXk is �k−1i=0 (n − i)pk","inline":true},{"text":". 
It is known that (see ","element":"span"},{"href":"#id-4","referenceIndex":4,"text":"Belkin and Sinha ","element":"a"},{"text":"(","element":"span"},{"href":"#id-4","referenceIndex":4,"text":"2010","element":"a"},{"text":"))","element":"span"}],[{"style":{"width":"38%"},"width":664,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-8.png","element":"img"}],[{"text":"Therefore it follows that the leading term of ","element":"span"},{"style":{"height":15.34},"width":172.04,"height":38.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-9.png","element":"img","alt":" EXk+1 is","inline":true}],[{"style":{"width":"59%"},"width":1026,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.06776/images/21-10.png","element":"img"}],[{"text":"This proves the induction step and the lemma.","element":"span"}]]}],"_version":"3.3.2"},"paperNode":"$1b:props:children:props:children:0:props:product"}]]]}]}]