36:[["$","audio",null,{"id":"tts"}],["$","$L3b",null,{"paperID":"2001.10631","publisher":"arxiv","paperJSON":{"title":"Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications","paperID":"2001.10631","avgLineHeight":14.88,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"Random linear mappings are widely used in modern signal processing, compressed sensing and machine learning. These mappings may be used to embed the data into a significantly lower dimension while at the same time preserving useful information. This is done by approximately preserving the distances between data points, which are assumed to belong to ","element":"span"},{"style":{"height":10.8},"width":48.8,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/0-0.png","element":"img","alt":" Rn","inline":true},{"text":". Thus, the performance of these mappings is usually captured by how close they are to an isometry on the data. Random Gaussian linear mappings have been the object of much study, while the sub-Gaussian setting is not yet fully understood. In the latter case, the performance depends on the sub-Gaussian norm of the rows. In many applications, e.g., compressed sensing, this norm may be large, or even growing with dimension, and thus it is important to characterize this dependence.","element":"span"}],[{"text":"We study when a sub-Gaussian matrix can become a near isometry on a set, show that the previously best known dependence on the sub-Gaussian norm was sub-optimal, and present the optimal dependence. Our result not only answers a remaining question posed by Liaw, Mehrabian, Plan and Vershynin in 2017, but also generalizes their work. ","element":"span"},{"text":"We also develop a new Bernstein type inequality for sub-exponential random variables, and a new Hanson-Wright inequality for quadratic forms of sub-Gaussian random variables, in both cases improving the bounds in the sub-Gaussian regime under moment constraints. 
Finally, we illustrate popular applications such as Johnson-Lindenstrauss embeddings, randomized sketches and blind demodulation, whose theoretical guarantees can be improved by our results in the sub-Gaussian case.","element":"span"}],[{"text":"Keywords: ","element":"span"},{"text":"compressed sensing, dimension reduction, random matrices, sub-Gaussian, Bernstein’s inequality, Hanson-Wright inequality, blind demodulation","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"Random linear mappings play a central role in dimension reduction, compressed sensing, and numerical linear algebra due to their propensity to preserve the geometry of a given set. The performance of a random linear mapping ","element":"span"},{"style":{"height":13.94},"width":194.76,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/0-1.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"is often determined by the uniform concentration bound of","element":"span"}],[{"style":{"height":23.49},"width":430.28,"height":58.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/0-2.png","element":"img","alt":"√m∥Ax∥2 around ∥x∥2","inline":true,"padRight":true},{"text":"for all vectors in a set of interest (in other words, how close the map ","element":"span"},{"style":{"height":25.6},"width":95.88,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/0-3.png","element":"img","alt":"1√mA","inline":true,"padRight":true},{"text":"is to being an isometry on the set). This is now well-understood by the standard techniques in the Gaussian random matrix case ","element":"span"},{"href":"#id-0","referenceIndex":8,"text":"[9,","element":"a"},{"href":"#id-1","referenceIndex":30,"text":"30,","element":"a"},{"href":"#id-2","referenceIndex":31,"text":"32]","element":"a"},{"text":". 
However, in many applications, non-Gaussian random mappings are more useful because of their computational/storage benefits or simply the difficulty of generating Gaussian matrices using sampling devices ","element":"span"},{"href":"#id-3","referenceIndex":16,"text":"[17]","element":"a"},{"text":". For example, sparse or structured random matrices are preferred in both dimension reduction ","element":"span"},{"href":"#id-4","referenceIndex":7,"text":"[8] ","element":"a"},{"text":"and random sketching in numerical linear algebra ","element":"span"},{"text":"[1,","element":"span"},{"href":"#id-5","referenceIndex":14,"text":"14,","element":"a"},{"href":"#id-6","referenceIndex":24,"text":"25,","element":"a"},{"href":"#id-7","referenceIndex":33,"text":"34] ","element":"a"},{"text":"since they provide more efficient matrix multiplications than dense and unstructured matrices such as Gaussian ones. ","element":"span"},{"text":"Certain formulations in compressed sensing also naturally require random matrices such as randomly subsampled Fourier measurements ","element":"span"},{"href":"#id-8","referenceIndex":17,"text":"[18] ","element":"a"},{"text":"or Bernoulli random matrices ","element":"span"},{"href":"#id-9","referenceIndex":28,"text":"[28]","element":"a"},{"text":".","element":"span"}],[{"text":"There has been a series of recent works ","element":"span"},{"href":"#id-4","referenceIndex":7,"text":"[8, ","element":"a"},{"href":"#id-10","referenceIndex":20,"text":"20, ","element":"a"},{"href":"#id-6","referenceIndex":24,"text":"24] ","element":"a"},{"text":"demonstrating the effectiveness of random mappings outside the Gaussian setup. ","element":"span"},{"text":"Unlike the Gaussian case, in which we have a rotation invariance property, non-Gaussian setups require more sophisticated arguments to address various new technical challenges. In this article, we focus on sub-Gaussian random mappings.","element":"span"}],[{"text":"Let us recall some definitions. 
For ","element":"span"},{"style":{"height":16.4},"width":271.12,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-0.png","element":"img","alt":" α ≥ 1, the ψα","inline":true},{"text":"-norm (which is the Orlicz norm taken with respect to function exp(","element":"span"},{"style":{"height":17.6},"width":109.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-1.png","element":"img","alt":"xα) −","inline":true,"padRight":true},{"text":"1) of a random variable ","element":"span"},{"text":"X ","element":"span"},{"text":"is defined as","element":"span"}],[{"style":{"width":"42%"},"width":796,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-2.png","element":"img"}],[{"text":"In particular, ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-3.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"= 2 gives the sub-Gaussian norm and ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-4.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"= 1 gives the sub-exponential norm. 
The random variable ","element":"span"},{"text":"X ","element":"span"},{"text":"is called sub-Gaussian if ","element":"span"},{"style":{"height":18.48},"width":225.92,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-5.png","element":"img","alt":" ∥X∥ψ2 < ∞","inline":true,"padRight":true},{"text":"and called sub-exponential if ","element":"span"},{"style":{"height":18.48},"width":238.08,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-6.png","element":"img","alt":" ∥X∥ψ1 < ∞.","inline":true}],[{"text":"For sub-Gaussian random variables, the ","element":"span"},{"style":{"height":16.4},"width":45.32,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-7.png","element":"img","alt":" ψ2","inline":true},{"text":"-norm roughly measures how fast the tail distribution decays: usually, the larger the ","element":"span"},{"style":{"height":16.4},"width":45.32,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-8.png","element":"img","alt":" ψ2","inline":true},{"text":"-norm, the heavier the tail. We will repeatedly use the fact that ","element":"span"},{"style":{"height":18.48},"width":223.84,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-9.png","element":"img","alt":"∥X∥ψ2 ≤ K","inline":true,"padRight":true},{"text":"if and only if the tail probability ","element":"span"},{"style":{"height":17.6},"width":183.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-10.png","element":"img","alt":" P(|X| ≥ t","inline":true},{"text":") is bounded by a Gaussian tail with standard deviation of order ","element":"span"},{"text":"K","element":"span"},{"text":". 
","element":"span"},{"text":"A precise statement of this, along with some other properties of ","element":"span"},{"style":{"height":16.4},"width":50.32,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-11.png","element":"img","alt":"ψα","inline":true},{"text":"-norm, can be found in Appendix ","element":"span"},{"text":"A.","element":"span"}],[{"text":"The sub-Gaussian norms for many random variables can be calculated by looking at the moment generating function of their squares. For example, the sub-Gaussian norm for ","element":"span"},{"style":{"height":19.14},"width":332.84,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-12.png","element":"img","alt":" Normal(0, σ2) is","inline":true},{"style":{"height":31.6},"width":442.57,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-13.png","element":"img","alt":"�83 σ; for Bernoulli(p","inline":true},{"text":") it is log","element":"span"},{"style":{"height":24.48},"width":242.24,"height":61.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-14.png","element":"img","alt":"− 12 �1 + p−1�","inline":true},{"text":"; for Rademacher random variable it is log","element":"span"},{"style":{"height":23.09},"width":193.92,"height":57.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-15.png","element":"img","alt":"− 12(2) and","inline":true}],[{"text":"for any bounded (by ","element":"span"},{"text":"M","element":"span"},{"text":") random variable it is no more than ","element":"span"},{"style":{"height":23.09},"width":706.28,"height":57.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-16.png","element":"img","alt":" M log− 12 (2). 
For Exponential(λ), it","inline":true,"padRight":true},{"text":"is not a sub-Gaussian random variable, but has sub-exponential norm ","element":"span"},{"style":{"height":21.26},"width":36.96,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-17.png","element":"img","alt":"2λ.","inline":true,"padRight":true},{"text":"For a random vector ","element":"span"},{"style":{"height":16},"width":307.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-18.png","element":"img","alt":" a ∈ Rn we say a","inline":true,"padRight":true},{"text":"is sub-Gaussian if","element":"span"}],[{"style":{"width":"65%"},"width":1233,"height":223,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-19.png","element":"img"}],[{"text":"We say a random matrix ","element":"span"},{"style":{"height":13.94},"width":210.12,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-20.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"is isotropic and sub-Gaussian if its rows are independent, isotropic and sub-Gaussian random vectors in ","element":"span"},{"style":{"height":12},"width":52.68,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-21.png","element":"img","alt":" Rn","inline":true},{"text":". 
The sub-Gaussian parameter of ","element":"span"},{"text":"A ","element":"span"},{"text":"is defined as","element":"span"}],[{"style":{"width":"47%"},"width":884,"height":74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-22.png","element":"img"}],[{"text":"For a random matrix ","element":"span"},{"style":{"height":13.94},"width":194.76,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-23.png","element":"img","alt":" A ∈ Rm×n","inline":true},{"text":", the isotropic condition guarantees that ","element":"span"},{"style":{"height":25.6},"width":95.88,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-24.png","element":"img","alt":"1√mA","inline":true,"padRight":true},{"text":"will preserve the Euclidean ","element":"span"},{"text":"norm in expectation. Some examples of isotropic and sub-Gaussian matrices are matrices whose entries ","element":"span"},{"style":{"height":17.89},"width":59.16,"height":44.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-25.png","element":"img","alt":" Aij","inline":true,"padRight":true},{"text":"are independent and sub-Gaussian with ","element":"span"},{"style":{"height":22.13},"width":88.44,"height":55.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/1-26.png","element":"img","alt":" EA2ij ","inline":true,"padRight":true},{"text":"= 1, and uniformly subsampled (with replacement ","element":"span"},{"text":"and after proper normalization) rows of orthonormal bases or tight frames ","element":"span"},{"href":"#id-2","referenceIndex":31,"text":"[32]","element":"a"},{"text":". 
In the cases of Bernoulli matrices or sparse ternary matrices, which is a generalization of the database-friendly mappings in ","element":"span"},{"text":"[1]","element":"span"},{"text":", the sub-Gaussian parameter can depend on the signal dimension ","element":"span"},{"text":"n ","element":"span"},{"text":"if the probability of an entry being nonzero is ","element":"span"},{"text":"n","element":"span"},{"text":"-dependent.","element":"span"}],[{"text":"In the line of research regarding sub-Gaussian random mappings, Liaw et al. ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20] ","element":"a"},{"text":"showed that for isotropic and sub-Gaussian mapping ","element":"span"},{"text":"A ","element":"span"},{"text":"with sub-Gaussian parameter ","element":"span"},{"style":{"height":15.6},"width":460.64,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-0.png","element":"img","alt":" K, let T ⊂ Rn, then we","inline":true,"padRight":true},{"text":"have with high probability,","element":"span"}],[{"id":"id-12","style":{"width":"75%"},"width":1411,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-1.png","element":"img"}],[{"text":"Here ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":") is the Gaussian width given by","element":"span"}],[{"style":{"width":"73%"},"width":1374,"height":252,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-2.png","element":"img"}],[{"text":"which is the radius when ","element":"span"},{"text":"T ","element":"span"},{"text":"is symmetric.","element":"span"}],[{"text":"Gaussian width measures the complexity of a set. 
In particular, denote cone(","element":"span"},{"style":{"height":17.6},"width":298,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-3.png","element":"img","alt":"T) := {tx : t ≥","inline":true,"padRight":true},{"text":"0","element":"span"},{"style":{"height":19.14},"width":630.44,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-4.png","element":"img","alt":", x ∈ T}, then w2(cone(T) ∩ Sn−1","inline":true},{"text":") is a meaningful approximation for dimension ","element":"span"},{"href":"#id-11","referenceIndex":6,"text":"[6,","element":"a"},{"href":"#id-6","referenceIndex":24,"text":"24]","element":"a"},{"text":". Generally rad(","element":"span"},{"text":"T","element":"span"},{"text":") is also dominated by ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":"). For example, if 0 ","element":"span"},{"style":{"height":12.8},"width":72.28,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-5.png","element":"img","alt":" ∈ T","inline":true},{"text":", then by Jensen’s inequality,","element":"span"}],[{"style":{"width":"70%"},"width":1312,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-6.png","element":"img"}],[{"text":"In such case, ","element":"span"},{"href":"#id-12","text":"(1) ","element":"a"},{"text":"implies that with high probability, ","element":"span"},{"style":{"height":25.6},"width":95.88,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-7.png","element":"img","alt":"1√mA","inline":true,"padRight":true},{"text":"is a near isometry on ","element":"span"},{"text":"T ","element":"span"},{"text":"whenever ","element":"span"},{"style":{"height":19.14},"width":289.24,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-8.png","element":"img","alt":"m ≥ CK4w2(T","inline":true},{"text":") for some 
constant ","element":"span"},{"text":"C","element":"span"},{"text":".","element":"span"}],[{"text":"The dependency on ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":") in ","element":"span"},{"href":"#id-12","text":"(1) ","element":"a"},{"text":"is optimal. This is easy to see when ","element":"span"},{"text":"m ","element":"span"},{"text":"= 1 and ","element":"span"},{"text":"A ","element":"span"},{"text":"has i.i.d. ","element":"span"},{"text":"Normal","element":"span"},{"text":"(0","element":"span"},{"text":", ","element":"span"},{"text":"1) entries. But when it comes to the dependency on the sub-Gaussian parameter ","element":"span"},{"text":"K","element":"span"},{"text":", whether the ","element":"span"},{"style":{"height":14.74},"width":57.32,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-9.png","element":"img","alt":" K2 ","inline":true,"padRight":true},{"text":"factor can be improved is a question raised but left unanswered in ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20]","element":"a"},{"text":". 
Other important works regarding this type of bounds are either not explicit ","element":"span"},{"href":"#id-13","referenceIndex":15,"text":"[15,","element":"a"},{"href":"#id-6","referenceIndex":24,"text":"24] ","element":"a"},{"text":"or at least of the same order ","element":"span"},{"href":"#id-11","referenceIndex":6,"style":{"height":19.14},"width":232.8,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-10.png","element":"img","alt":" K2 [7,8,23].","inline":true}],[{"text":"In this article, we refine this dependency on the sub-Gaussian parameter from ","element":"span"},{"style":{"height":14.73},"width":57.32,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-11.png","element":"img","alt":" K2 ","inline":true,"padRight":true},{"text":"to the optimal ","element":"span"},{"style":{"height":18.05},"width":180.16,"height":45.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-12.png","element":"img","alt":"K√log K","inline":true},{"text":". This enhances the concentration bound substantially when the sub-Gaussian mapping is not well-behaved, for example, when ","element":"span"},{"text":"K ","element":"span"},{"text":"increases together with the signal dimension. We also relax the row-independent requirement by considering random mappings in the form of ","element":"span"},{"text":"BA ","element":"span"},{"text":"where ","element":"span"},{"text":"B ","element":"span"},{"text":"is an arbitrary matrix and ","element":"span"},{"text":"A ","element":"span"},{"text":"is mean zero, isotropic and sub-Gaussian. The mean zero assumption is additional when comparing to the assumptions for ","element":"span"},{"href":"#id-12","text":"(1)","element":"a"},{"text":", and not needed when ","element":"span"},{"text":"B ","element":"span"},{"text":"is only diagonal. However, it is necessary for arbitrary ","element":"span"},{"text":"B","element":"span"},{"text":". 
Our bound is broadly applicable since it only requires these properties from the random matrix ","element":"span"},{"text":"A ","element":"span"},{"text":"without any other assumptions.","element":"span"}],[{"text":"Now we state our main theorem. In the following, ","element":"span"},{"style":{"height":17.6},"width":277.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-13.png","element":"img","alt":" ∥B∥F and ∥B∥","inline":true,"padRight":true},{"text":"denote the Frobenius and operator norms of ","element":"span"},{"text":"B ","element":"span"},{"text":"respectively. We say that the matrix ","element":"span"},{"style":{"height":15.94},"width":192.24,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-14.png","element":"img","alt":" B ∈ Rl×m ","inline":true,"padRight":true},{"text":"is diagonal if its only possible non-zero entries are ","element":"span"},{"style":{"height":17.6},"width":551.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-15.png","element":"img","alt":" Bii where 1 ≤ i ≤ min{l, m}.","inline":true}],[{"id":"id-14","style":{"height":15.94},"width":603.12,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-16.png","element":"img","alt":"Theorem 1.1. Let B ∈ Rl×m ","inline":true,"padRight":true},{"text":"be a fixed matrix, let ","element":"span"},{"style":{"height":13.94},"width":211.08,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-17.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"be a mean zero, isotropic and sub-Gaussian matrix with sub-Gaussian parameter ","element":"span"},{"style":{"height":13.2},"width":344.52,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-18.png","element":"img","alt":" K and let T ⊂ Rn ","inline":true,"padRight":true},{"text":"be a bounded set. 
Then","element":"span"}],[{"style":{"width":"65%"},"width":1232,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/2-19.png","element":"img"}],[{"style":{"width":"83%"},"width":1556,"height":169,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-0.png","element":"img"}],[{"text":"Here ","element":"span"},{"text":"C ","element":"span"},{"text":"is an absolute constant. Furthermore, when ","element":"span"},{"text":"B ","element":"span"},{"text":"is a diagonal matrix, the random matrix ","element":"span"},{"text":"A ","element":"span"},{"text":"only needs to be isotropic and sub-Gaussian with sub-Gaussian parameter ","element":"span"},{"text":"K ","element":"span"},{"text":"for the conclusions to hold.","element":"span"}],[{"id":"id-57","text":"When ","element":"span"},{"text":"B ","element":"span"},{"text":"is the identity matrix, we have the following corollary.","element":"span"}],[{"style":{"height":16.74},"width":587.4,"height":41.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-1.png","element":"img","alt":"Corollary 1.2. Let A ∈ Rm×n ","inline":true,"padRight":true},{"text":"be an isotropic and sub-Gaussian matrix with sub-Gaussian parameter ","element":"span"},{"style":{"height":15.6},"width":357.48,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-2.png","element":"img","alt":" K, and let T ⊂ Rn ","inline":true,"padRight":true},{"text":"be a bounded set. Then","element":"span"}],[{"style":{"width":"78%"},"width":1475,"height":284,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-3.png","element":"img"}],[{"text":"The bound appearing on the right-hand side of Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"is optimal in general. 
The ","element":"span"},{"style":{"height":17.6},"width":79.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-4.png","element":"img","alt":" ∥B∥","inline":true,"padRight":true},{"text":"factor is optimal and this is easy to see when ","element":"span"},{"text":"B ","element":"span"},{"text":"has non-zero singular values being all equal (because the statement should be invariant under scaling for ","element":"span"},{"text":"B","element":"span"},{"text":"). We also give another example below in which the singular values are not all equal. The dependency on ","element":"span"},{"text":"K ","element":"span"},{"text":"is optimal and this follows from Proposition ","element":"span"},{"href":"#id-15","text":"4.5 ","element":"a"},{"text":"in Section ","element":"span"},{"href":"#id-16","text":"4.4 ","element":"a"},{"text":"with ","element":"span"},{"text":"T ","element":"span"},{"text":"being a singleton. As mentioned before, rad(","element":"span"},{"text":"T","element":"span"},{"text":") is generally dominated by ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":") and the dependency on ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":") is also optimal.","element":"span"}],[{"text":"Assuming rad(","element":"span"},{"text":"T","element":"span"},{"text":") is dominated by ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":"), Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"then implies that with high probability, matrix ","element":"span"},{"style":{"height":24.85},"width":158.76,"height":62.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-5.png","element":"img","alt":"1∥B∥F BA","inline":true,"padRight":true},{"text":"is a near isometry on ","element":"span"},{"text":"T 
","element":"span"},{"text":"whenever the stable rank of ","element":"span"},{"text":"B","element":"span"}],[{"style":{"width":"36%"},"width":685,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-6.png","element":"img"}],[{"text":"This result recovers ","element":"span"},{"href":"#id-12","text":"(1) ","element":"a"},{"text":"with improved dependency on ","element":"span"},{"style":{"height":15.09},"width":325.44,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-7.png","element":"img","alt":" K when B = Im.","inline":true}],[{"style":{"width":"96%"},"width":1799,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-8.png","element":"img"}],[{"style":{"height":20.67},"width":1429.32,"height":51.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-9.png","element":"img","alt":"Bij = 1 for 1 ≤ i ≤ l and 1 ≤ j ≤ m, then ∥B∥F = ∥B∥ =√lm. Suppose A","inline":true,"padRight":true},{"text":"has i.i.d. entries where","element":"span"}],[{"style":{"width":"51%"},"width":972,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-10.png","element":"img"}],[{"text":"It is easy to verify that ","element":"span"},{"style":{"height":16.8},"width":273,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-11.png","element":"img","alt":" EA ̸= 0 and A","inline":true,"padRight":true},{"text":"has isotropic rows ","element":"span"},{"style":{"height":20.13},"width":56.64,"height":50.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-12.png","element":"img","alt":" ATi ","inline":true,"padRight":true},{"text":". Moreover, for any ","element":"span"},{"style":{"height":17.6},"width":337.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-13.png","element":"img","alt":" y = (y1, . . . 
, yn) ∈","inline":true},{"style":{"height":15.13},"width":88.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-14.png","element":"img","alt":"Sn−1","inline":true},{"text":", notice that","element":"span"}],[{"style":{"width":"99%"},"width":1870,"height":604,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/3-15.png","element":"img"}],[{"text":"On the other hand, ","element":"span"},{"style":{"height":20.18},"width":538.2,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-0.png","element":"img","alt":" ∥B∥ [w(T) + rad(T)] =√lm","inline":true},{"text":". So in this case, the bound in Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"does not hold when ","element":"span"},{"text":"m ","element":"span"},{"text":"is sufficiently large.","element":"span"}],[{"text":"As an example demonstrating that ","element":"span"},{"style":{"height":17.6},"width":79.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-1.png","element":"img","alt":" ∥B∥","inline":true,"padRight":true},{"text":"is optimal in general, consider the case when ","element":"span"},{"style":{"height":19.13},"width":318.72,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-2.png","element":"img","alt":" T = {x} ⊂ Sn−1,","inline":true,"padRight":true},{"text":"A ","element":"span"},{"text":"is standard Gaussian so that ","element":"span"},{"style":{"height":17.6},"width":1267.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-3.png","element":"img","alt":" g := Ax ∼ Normal(0, Im) and B = diag(τ, 1, . . . 
, 1) where τ > 0.","inline":true,"padRight":true},{"text":"Also let ","element":"span"},{"style":{"height":12},"width":32.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-4.png","element":"img","alt":" gi","inline":true,"padRight":true},{"text":"be the coordinates of ","element":"span"},{"text":"g","element":"span"},{"text":", then","element":"span"}],[{"style":{"width":"65%"},"width":1226,"height":603,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-5.png","element":"img"}],[{"text":"where we used Jensen’s inequality in the second last line. This estimate is in the order of ","element":"span"},{"style":{"height":17.6},"width":161.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-6.png","element":"img","alt":" τ = ∥B∥","inline":true,"padRight":true},{"text":"when ","element":"span"},{"style":{"height":17.6},"width":192.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-7.png","element":"img","alt":" τ > C√m","inline":true,"padRight":true},{"text":"with some constant ","element":"span"},{"text":"C ","element":"span"},{"text":"large enough.","element":"span"}],[{"style":{"width":"96%"},"width":1804,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-8.png","element":"img"}],[{"text":"isotropic and sub-Gaussian conditions of ","element":"span"},{"text":"A ","element":"span"},{"text":"guarantee that ","element":"span"},{"text":"K ","element":"span"},{"text":"is bounded below from 1. 
To see this, let ","element":"span"},{"style":{"height":17.94},"width":708.08,"height":44.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-9.png","element":"img","alt":" X := Ax for some x ∈ Sn−1, then X","inline":true,"padRight":true},{"text":"has independent coordinates ","element":"span"},{"style":{"height":14.69},"width":48,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-10.png","element":"img","alt":" Xi","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":19.73},"width":258.72,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-11.png","element":"img","alt":" EX2i = 1 and","inline":true}],[{"style":{"width":"77%"},"width":1445,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-12.png","element":"img"}],[{"text":"we can conclude that ","element":"span"},{"style":{"height":14.69},"width":152.36,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-13.png","element":"img","alt":" K ≥ K0","inline":true,"padRight":true},{"text":"and equality is achieved when ","element":"span"},{"style":{"height":14.69},"width":207.36,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-14.png","element":"img","alt":" Xi = 1 a.s.","inline":true}],[{"text":"The proof of Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"follows an approach analogous to that of Liaw et al. ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20]","element":"a"},{"text":". ","element":"span"},{"text":"One major difference is that we prove and apply two new concentration inequalities with improved parametric dependency in the sub-Gaussian regime. 
We believe these inequalities are of independent interest as application-oriented concentration inequalities.","element":"span"}],[{"text":"The first one is a new Bernstein type inequality under a bounded first absolute moment condition. ","element":"span"},{"id":"id-17","text":"This inequality provides a concentration bound for sums of sub-exponential random variables.","element":"span"}],[{"text":"Theorem 1.3 ","element":"span"},{"text":"(New Bernstein’s Inequality)","element":"span"},{"style":{"height":17.6},"width":427.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-15.png","element":"img","alt":". Let a = (a1, . . . , am)","inline":true,"padRight":true},{"text":"be a fixed non-zero vector and let ","element":"span"},{"style":{"height":15.2},"width":196.56,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-16.png","element":"img","alt":" Y1, . . . , Ym","inline":true,"padRight":true},{"text":"be independent, mean zero sub-exponential random variables satisfying ","element":"span"},{"style":{"height":17.6},"width":257.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-17.png","element":"img","alt":" E|Yi| ≤ 2 and","inline":true},{"style":{"height":21.26},"width":584.01,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-18.png","element":"img","alt":"∥Yi∥ψ1 ≤ K2i (assume Ki ≥ 65)","inline":true},{"text":". Then for every ","element":"span"},{"style":{"height":14.8},"width":261.48,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-19.png","element":"img","alt":" t ≥ 0 we have","inline":true}],[{"style":{"width":"86%"},"width":1625,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-20.png","element":"img"}],[{"text":"Remark 1.4. 
","element":"span"},{"text":"Theorem ","element":"span"},{"href":"#id-17","text":"1.3 ","element":"a"},{"text":"remains true (with a different absolute constant ","element":"span"},{"text":"c","element":"span"},{"text":") when the ","element":"span"},{"text":"2 ","element":"span"},{"text":"in ","element":"span"},{"style":{"height":17.6},"width":172.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/4-21.png","element":"img","alt":"E|Yi| ≤ 2","inline":true,"padRight":true},{"text":"is replaced with an arbitrary positive constant (see Remark ","element":"span"},{"href":"#id-18","text":"2.2 ","element":"a"},{"text":"for more detail).","element":"span"}],[{"text":"The second one is a new Hanson-Wright inequality under unit variance condition. This inequality provides a concentration bound for quadratic forms of independent random variables and is more general than the aforementioned Bernstein’s inequality. In the literature, results of similar flavor have been obtained ","element":"span"},{"href":"#id-19","referenceIndex":3,"text":"[3,","element":"a"},{"href":"#id-13","referenceIndex":15,"text":"16,","element":"a"},{"href":"#id-7","referenceIndex":33,"text":"33] ","element":"a"},{"text":"but under different assumptions. We will give a brief comparison between our result and a few notable ones in Section ","element":"span"},{"text":"3.","element":"span"}],[{"id":"id-20","text":"Theorem 1.5 ","element":"span"},{"text":"(New Hanson-Wright Inequality)","element":"span"},{"style":{"height":13.94},"width":295.08,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-0.png","element":"img","alt":". Let A ∈ Rn×n ","inline":true,"padRight":true},{"text":"be a fixed non-zero matrix and let ","element":"span"},{"style":{"height":17.6},"width":448.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-1.png","element":"img","alt":"X = (X1, . . . 
, Xn) ∈ Rn ","inline":true,"padRight":true},{"text":"be a random vector with independent, mean zero, sub-Gaussian coordinates satisfying ","element":"span"},{"style":{"height":20.02},"width":500.32,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-2.png","element":"img","alt":" EX2i = 1 and ∥Xi∥ψ2 ≤ K","inline":true},{"text":". Then for every ","element":"span"},{"style":{"height":14.8},"width":261.48,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-3.png","element":"img","alt":" t ≥ 0 we have","inline":true}],[{"style":{"width":"90%"},"width":1691,"height":183,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-4.png","element":"img"}],[{"text":"Remark 1.6. ","element":"span"},{"text":"If ","element":"span"},{"text":"A ","element":"span"},{"text":"is a diagonal matrix, then Theorem ","element":"span"},{"href":"#id-20","text":"1.5 ","element":"a"},{"text":"recovers Theorem ","element":"span"},{"href":"#id-17","text":"1.3 ","element":"a"},{"text":"(assuming all ","element":"span"},{"style":{"height":14.69},"width":48.96,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-5.png","element":"img","alt":"Ki","inline":true,"padRight":true},{"text":"are equal) with ","element":"span"},{"style":{"height":19.54},"width":307.4,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-6.png","element":"img","alt":" Yi = X2i − EX2i ","inline":true,"padRight":true},{"text":". 
Therefore this can be viewed as a generalization of the new ","element":"span"},{"text":"Bernstein’s inequality given in Theorem ","element":"span"},{"href":"#id-17","text":"1.3.","element":"a"}],[{"style":{"width":"12%"},"width":228,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-7.png","element":"img"}],[{"text":"We use ","element":"span"},{"style":{"height":17.6},"width":95.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-8.png","element":"img","alt":" ∥ · ∥2","inline":true,"padRight":true},{"text":"for Euclidean norm of vectors, ","element":"span"},{"style":{"height":17.6},"width":289.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-9.png","element":"img","alt":" ∥ · ∥F and ∥ · ∥","inline":true,"padRight":true},{"text":"for Frobenius and operator norm of matrices respectively. We use ","element":"span"},{"style":{"height":8},"width":22,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-10.png","element":"img","alt":" ◦","inline":true,"padRight":true},{"text":"for Hadamard (entrywise) product. We say ","element":"span"},{"style":{"height":16.8},"width":388.04,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-11.png","element":"img","alt":" f ≲ g if f ≤ Cg for","inline":true,"padRight":true},{"text":"some absolute constant ","element":"span"},{"style":{"height":16.8},"width":507.72,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-12.png","element":"img","alt":" C and say f ≳ g if f ≥ Cg","inline":true,"padRight":true},{"text":"for some absolute constant ","element":"span"},{"text":"C","element":"span"},{"text":". 
Typically, ","element":"span"},{"text":"c ","element":"span"},{"text":"and ","element":"span"},{"text":"C ","element":"span"},{"text":"denote absolute constants (often ","element":"span"},{"text":"c ","element":"span"},{"text":"for small ones and ","element":"span"},{"text":"C ","element":"span"},{"text":"for large ones) which may vary from line to line.","element":"span"}],[{"style":{"width":"16%"},"width":300,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-13.png","element":"img"}],[{"text":"The rest of this paper is organized as follows: In Section ","element":"span"},{"text":"2, ","element":"span"},{"text":"we discuss and prove the new Bernstein’s inequality (Theorem ","element":"span"},{"href":"#id-17","text":"1.3)","element":"a"},{"text":". ","element":"span"},{"text":"In Section ","element":"span"},{"text":"3, ","element":"span"},{"text":"we first discuss and compare the new Hanson-Wright inequality (Theorem ","element":"span"},{"href":"#id-20","text":"1.5) ","element":"a"},{"text":"to other known variants of Hanson-Wright inequalities and then prove Theorem ","element":"span"},{"href":"#id-20","text":"1.5. ","element":"a"},{"text":"In Section ","element":"span"},{"text":"4, ","element":"span"},{"text":"we prove our main theorem regarding sub-Gaussian matrices on sets (Theorem ","element":"span"},{"href":"#id-14","text":"1.1) ","element":"a"},{"text":"and show that our dependence on ","element":"span"},{"text":"K ","element":"span"},{"text":"is optimal. ","element":"span"},{"text":"In Section ","element":"span"},{"text":"5, ","element":"span"},{"text":"we demonstrate how our result can improve the theoretical guarantees of some popular applications such as Johnson-Lindenstrauss embeddings, randomized sketches and blind demodulation. 
In Section ","element":"span"},{"text":"6, ","element":"span"},{"text":"we briefly conclude the paper.","element":"span"}]]},{"heading":"2 New Bernstein’s Inequality","paragraphs":[[{"text":"In this section we prove the new Bernstein’s inequality Theorem ","element":"span"},{"href":"#id-17","text":"1.3. ","element":"a"},{"text":"Let us first recall the standard Bernstein’s inequality for sub-exponential random variables ","element":"span"},{"href":"#id-2","referenceIndex":31,"text":"[32, ","element":"a"},{"text":"Theorem 2.8.2], which states that for independent, mean zero, sub-exponential random variables ","element":"span"},{"style":{"height":15.2},"width":259.92,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-14.png","element":"img","alt":" Y1, Y2, . . . , Ym","inline":true,"padRight":true},{"text":"and a vector ","element":"span"},{"text":"a ","element":"span"},{"text":"= (","element":"span"},{"style":{"height":17.6},"width":505.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-15.png","element":"img","alt":"a1, . . . 
, am) ∈ Rm, we have","inline":true}],[{"id":"id-21","style":{"width":"79%"},"width":1482,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/5-16.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":20.02},"width":354.24,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-0.png","element":"img","alt":" K2 = maxi ∥Yi∥ψ1.","inline":true}],[{"text":"Compared to ","element":"span"},{"href":"#id-21","text":"(2)","element":"a"},{"text":", Theorem ","element":"span"},{"href":"#id-17","text":"1.3 ","element":"a"},{"text":"has an extra assumption on the first absolute moment of ","element":"span"},{"style":{"height":14.69},"width":77.2,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-1.png","element":"img","alt":" Yi –","inline":true,"padRight":true},{"text":"namely ","element":"span"},{"style":{"height":17.6},"width":142,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-2.png","element":"img","alt":" E|Yi| ≤","inline":true,"padRight":true},{"text":"2, but it improves the dependence on ","element":"span"},{"text":"K ","element":"span"},{"text":"in the sub-Gaussian regime from ","element":"span"},{"style":{"height":15.14},"width":114.64,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-3.png","element":"img","alt":" K4 to","inline":true},{"style":{"height":18.74},"width":170.08,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-4.png","element":"img","alt":"K2 log K","inline":true},{"text":". It is worth noting that such extra assumption comes naturally when considering isotropic random matrices/vectors. 
In fact, let ","element":"span"},{"text":"x ","element":"span"},{"text":"be a fixed point on the unit sphere and let ","element":"span"},{"style":{"height":10.69},"width":35.04,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-5.png","element":"img","alt":" ai","inline":true,"padRight":true},{"text":"be isotropic random vectors of the same dimension, then ","element":"span"},{"style":{"height":19.14},"width":327.76,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-6.png","element":"img","alt":" Yi := | ⟨ai, x⟩ |2 −","inline":true,"padRight":true},{"text":"1 is mean zero since ","element":"span"},{"style":{"height":10.69},"width":35.04,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-7.png","element":"img","alt":" ai","inline":true,"padRight":true},{"text":"is isotropic, and ","element":"span"},{"style":{"height":19.14},"width":350.12,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-8.png","element":"img","alt":" E|Yi| ≤ E| ⟨ai, x⟩ |2 ","inline":true,"padRight":true},{"text":"+ 1 = 2 by triangle inequality.","element":"span"}],[{"style":{"width":"26%"},"width":504,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-9.png","element":"img"}],[{"text":"We will first bound the moments of ","element":"span"},{"style":{"height":14.69},"width":37.44,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-10.png","element":"img","alt":" Yi","inline":true},{"text":", then bound their moment generating functions, and finally use Chernoff method to obtain the desired tail bound.","element":"span"}],[{"style":{"width":"36%"},"width":686,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-11.png","element":"img"}],[{"text":"The idea here is to write the moment as an integral and then estimate under the two constraints 
","element":"span"},{"id":"id-22","style":{"height":20.02},"width":524.16,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-12.png","element":"img","alt":"E|Yi| ≤ 2 and ∥Yi∥ψ1 ≤ K2.","inline":true}],[{"text":"Lemma 2.1 ","element":"span"},{"text":"(Moment Bounds)","element":"span"},{"text":". ","element":"span"},{"text":"Let ","element":"span"},{"text":"Y ","element":"span"},{"text":"be a sub-exponential random variable satisfying ","element":"span"},{"style":{"height":17.6},"width":174.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-13.png","element":"img","alt":" E|Y | ≤ 2","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":21.26},"width":604.89,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-14.png","element":"img","alt":" ∥Y ∥ψ1 ≤ K2 with K ≥ 65. Then","inline":true}],[{"style":{"width":"38%"},"width":713,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-15.png","element":"img"}],[{"style":{"height":21.55},"width":997.84,"height":53.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-16.png","element":"img","alt":"Proof. Define f(t) := P(|Y | ≥ t) et/K2. 
Since E|Y | ≤","inline":true,"padRight":true},{"text":"2, we have","element":"span"}],[{"id":"id-24","style":{"width":"70%"},"width":1324,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-17.png","element":"img"}],[{"text":"Also, since ","element":"span"},{"style":{"height":20.01},"width":234.92,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-18.png","element":"img","alt":" ∥Y ∥ψ1 ≤ K2","inline":true},{"text":", a change of variable ","element":"span"},{"style":{"height":21.15},"width":283.4,"height":52.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-19.png","element":"img","alt":" s = et/K2 gives","inline":true}],[{"style":{"width":"99%"},"width":1867,"height":369,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-20.png","element":"img"}],[{"text":"For the ","element":"span"},{"text":"p","element":"span"},{"text":"-th moment of ","element":"span"},{"text":"|","element":"span"},{"text":"Y ","element":"span"},{"text":"|","element":"span"},{"text":", with a change of variable ","element":"span"},{"style":{"height":15.6},"width":299.36,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-21.png","element":"img","alt":" s = up, we have","inline":true}],[{"style":{"width":"55%"},"width":1041,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-22.png","element":"img"}],[{"text":"We will split this integral into two parts. Set ","element":"span"},{"style":{"height":18.73},"width":556.52,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-23.png","element":"img","alt":" T = 6pK2 log K. 
Since pup−1 ","inline":true,"padRight":true},{"text":"monotonically increases on [0","element":"span"},{"text":", T","element":"span"},{"text":"], we have","element":"span"}],[{"style":{"width":"53%"},"width":997,"height":210,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/6-24.png","element":"img"}],[{"text":"On the other hand, since","element":"span"}],[{"style":{"width":"99%"},"width":1866,"height":562,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-0.png","element":"img"}],[{"text":"Combining these two parts completes the proof with ","element":"span"},{"style":{"height":14.8},"width":125.76,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-1.png","element":"img","alt":" C ≤ 6.","inline":true}],[{"style":{"width":"59%"},"width":1110,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-2.png","element":"img"}],[{"text":"Let ","element":"span"},{"text":"Y ","element":"span"},{"text":"be the random variable as in Lemma ","element":"span"},{"href":"#id-22","text":"2.1, ","element":"a"},{"text":"the moment generating function of ","element":"span"},{"text":"Y ","element":"span"},{"text":"can be estimated through Taylor series","element":"span"}],[{"style":{"width":"50%"},"width":953,"height":416,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-3.png","element":"img"}],[{"text":"Here the first inequality is by Lemma ","element":"span"},{"href":"#id-22","text":"2.1 ","element":"a"},{"text":"(with ","element":"span"},{"style":{"height":15.09},"width":96.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-4.png","element":"img","alt":" C1 ≤","inline":true,"padRight":true},{"text":"6) and the second inequality uses ","element":"span"},{"style":{"height":17.6},"width":221.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-5.png","element":"img","alt":" p! 
≥ (p/e)p.","inline":true,"padRight":true},{"text":"When ","element":"span"},{"style":{"height":20.1},"width":430.88,"height":50.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-6.png","element":"img","alt":" |λ|K2 log K ≤ 1/(2C1e","inline":true},{"text":"), the above summation converges and we have","element":"span"}],[{"style":{"width":"99%"},"width":1868,"height":304,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-7.png","element":"img"}],[{"text":"for absolute constants ","element":"span"},{"style":{"height":19.14},"width":589.44,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-8.png","element":"img","alt":" C0 = (C1e)2 and c = 1/(2C1e).","inline":true}],[{"id":"id-23","style":{"width":"99%"},"width":1866,"height":497,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/7-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.09},"width":188.84,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-0.png","element":"img","alt":" c0 and C0","inline":true,"padRight":true},{"text":"are absolute constants. 
When we minimize the above expression over ","element":"span"},{"style":{"height":16.4},"width":178.28,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-1.png","element":"img","alt":" λ, we get","inline":true,"padRight":true},{"text":"the optimal value","element":"span"}],[{"style":{"width":"51%"},"width":960,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-2.png","element":"img"}],[{"text":"Next we plug in ","element":"span"},{"style":{"height":17.49},"width":73.96,"height":43.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-3.png","element":"img","alt":" λopt","inline":true,"padRight":true},{"text":"into ","element":"span"},{"href":"#id-23","text":"(6) ","element":"a"},{"text":"to get","element":"span"}],[{"style":{"width":"99%"},"width":1866,"height":475,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-4.png","element":"img"}],[{"text":"The bound for ","element":"span"},{"style":{"height":17.74},"width":297.16,"height":44.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-5.png","element":"img","alt":" P (∑ aiYi < −u","inline":true},{"text":") is similarly obtained by considering ","element":"span"},{"style":{"height":14.69},"width":71.52,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-6.png","element":"img","alt":" −Yi","inline":true,"padRight":true},{"text":"instead of ","element":"span"},{"style":{"height":14.69},"width":37.44,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-7.png","element":"img","alt":" Yi","inline":true},{"text":". This ","element":"span"},{"id":"id-18","text":"completes the proof.","element":"span"}],[{"text":"Remark 2.2. 
","element":"span"},{"text":"If the random variables ","element":"span"},{"style":{"height":14.69},"width":37.44,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-8.png","element":"img","alt":" Yi","inline":true,"padRight":true},{"text":"have first absolute moment ","element":"span"},{"style":{"height":17.6},"width":178.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-9.png","element":"img","alt":" E|Yi| ≤ α","inline":true},{"text":", then the right hand side of Equation ","element":"span"},{"href":"#id-24","text":"(3) ","element":"a"},{"text":"becomes ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-10.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"and it is easy to see that Lemma ","element":"span"},{"href":"#id-22","text":"2.1 ","element":"a"},{"text":"still holds with ","element":"span"},{"style":{"height":14.8},"width":262.76,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-11.png","element":"img","alt":" C ≤ 6 + α. 
It","inline":true,"padRight":true},{"text":"follows that the ","element":"span"},{"style":{"height":15.09},"width":48.2,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-12.png","element":"img","alt":" C1","inline":true,"padRight":true},{"text":"in Step 2 will be no more than ","element":"span"},{"text":"6 + ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-13.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"and Theorem ","element":"span"},{"href":"#id-17","text":"1.3 ","element":"a"},{"text":"now holds with constant ","element":"span"},{"style":{"height":25.04},"width":441.16,"height":62.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-14.png","element":"img","alt":"c = 14(C1e)2 ≥ 14e2(6+α)2 .","inline":true}]]},{"heading":"3 New Hanson-Wright Inequality","paragraphs":[[{"text":"Hanson-Wright inequality gives a concentration bound for quadratic forms of random variables. The version in ","element":"span"},{"href":"#id-25","referenceIndex":26,"text":"[27] ","element":"a"},{"text":"states that for a random vector ","element":"span"},{"style":{"height":17.6},"width":476.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-15.png","element":"img","alt":" X = (X1, . . . 
, Xn) ∈ Rn ","inline":true,"padRight":true},{"text":"with independent, mean zero, sub-Gaussian coordinates, suppose max","element":"span"},{"style":{"height":18.48},"width":668.72,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-16.png","element":"img","alt":"i ∥Xi∥ψ2 ≤ K and let A be an n×n","inline":true,"padRight":true},{"text":"real matrix, then","element":"span"}],[{"id":"id-27","style":{"width":"84%"},"width":1582,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-17.png","element":"img"}],[{"text":"In the same spirit as the new Bernstein’s inequality, we can improve the tail dependency on ","element":"span"},{"text":"K ","element":"span"},{"text":"in the sub-Gaussian regime from ","element":"span"},{"style":{"height":18.74},"width":293.44,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-18.png","element":"img","alt":" K4 to K2 log K","inline":true,"padRight":true},{"text":"under a further assumption ","element":"span"},{"style":{"height":19.73},"width":85.64,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-19.png","element":"img","alt":" EX2i ","inline":true,"padRight":true},{"text":"= 1 for each ","element":"span"},{"style":{"height":14.69},"width":61.92,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-20.png","element":"img","alt":" Xi.","inline":true,"padRight":true},{"text":"This is the new Hanson-Wright inequality Theorem ","element":"span"},{"href":"#id-20","text":"1.5.","element":"a"}],[{"text":"It is not difficult to drop the requirement ","element":"span"},{"style":{"height":19.53},"width":85.64,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-21.png","element":"img","alt":" EX2i ","inline":true,"padRight":true},{"text":"= 1 in Theorem ","element":"span"},{"href":"#id-20","text":"1.5 ","element":"a"},{"text":"by a simple scaling, in 
which ","element":"span"},{"id":"id-26","text":"case we have the following corollary.","element":"span"}],[{"style":{"height":17.6},"width":886.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-22.png","element":"img","alt":"Corollary 3.1. Let X = (X1, . . . , Xn) ∈ Rn ","inline":true,"padRight":true},{"text":"be a random vector with independent, mean zero, sub-Gaussian coordinates satisfying ","element":"span"},{"text":"0 ","element":"span"},{"style":{"height":18.48},"width":278.08,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-23.png","element":"img","alt":" < ∥Xi∥ψ2 ≤ K","inline":true},{"text":", then for fixed square matrix ","element":"span"},{"text":"A","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"89%"},"width":1679,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-24.png","element":"img"}],[{"style":{"height":27.36},"width":1177.48,"height":68.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/8-25.png","element":"img","alt":"where α1 = mini�EX2i� 12 , α2 = maxi�EX2i� 12 and γ = α2/α1.","inline":true}],[{"style":{"height":23.3},"width":731.6,"height":58.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-0.png","element":"img","alt":"Proof. 
Let βi := (EX2i )12 for 1 ≤ i ≤ n","inline":true,"padRight":true},{"text":"and define diagonal matrices","element":"span"}],[{"style":{"width":"65%"},"width":1223,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-1.png","element":"img"}],[{"text":"Then ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":18.66},"width":244.4,"height":46.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-2.png","element":"img","alt":"X := D1/βX","inline":true,"padRight":true},{"text":"satisfies the assumption of Theorem ","element":"span"},{"href":"#id-20","text":"1.5 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":21.3},"width":615.76,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-3.png","element":"img","alt":" E ˜X2i = 1 and ∥ ˜Xi∥ψ2 ≤ K/βi ≤","inline":true},{"style":{"height":17.6},"width":103.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-4.png","element":"img","alt":"K/α1","inline":true},{"text":". 
Applying Theorem ","element":"span"},{"href":"#id-20","text":"1.5 ","element":"a"},{"text":"to ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":21.3},"width":389.6,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-5.png","element":"img","alt":"X and ˜A := DβADβ","inline":true,"padRight":true},{"text":"completes the proof.","element":"span"}],[{"style":{"width":"65%"},"width":1220,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-6.png","element":"img"}],[{"text":"Let us first compare Corollary ","element":"span"},{"href":"#id-26","text":"3.1 ","element":"a"},{"text":"to the standard Hanson-Wright inequality ","element":"span"},{"href":"#id-27","text":"(8) ","element":"a"},{"text":"in the case when ","element":"span"},{"style":{"height":11.6},"width":24,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-7.png","element":"img","alt":"γ","inline":true,"padRight":true},{"text":"= 1. 
The concentration bound in ","element":"span"},{"href":"#id-27","text":"(8) ","element":"a"},{"text":"implies that, with probability at least 1 ","element":"span"},{"style":{"height":17.53},"width":138.24,"height":43.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-8.png","element":"img","alt":" − 2e−t,","inline":true}],[{"id":"id-33","style":{"width":"72%"},"width":1364,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-9.png","element":"img"}],[{"text":"Meanwhile, Corollary ","element":"span"},{"href":"#id-26","text":"3.1 ","element":"a"},{"text":"implies that","element":"span"}],[{"id":"id-32","style":{"width":"83%"},"width":1565,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-10.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":22.9},"width":218.52,"height":57.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-11.png","element":"img","alt":" α = (EXi)12","inline":true,"padRight":true},{"text":". Note that ","element":"span"},{"style":{"height":18.48},"width":318.4,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-12.png","element":"img","alt":" α ≤ ∥Xi∥ψ2 ≤ K","inline":true},{"text":", so this bound improves the parameter dependence (up to a log factor) in the sub-Gaussian regime from ","element":"span"},{"style":{"height":15.14},"width":185.44,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-13.png","element":"img","alt":" K2 to αK","inline":true},{"text":". 
Such an improvement can be significant when ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-14.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"is far less than ","element":"span"},{"text":"K","element":"span"},{"text":".","element":"span"}],[{"text":"Other variants of the Hanson-Wright inequality have appeared in the literature with similar improvements ","element":"span"},{"href":"#id-19","referenceIndex":3,"text":"[3,","element":"a"},{"href":"#id-7","referenceIndex":33,"text":"33]","element":"a"},{"text":". In particular, one of the results by Adamczak ","element":"span"},{"href":"#id-19","referenceIndex":3,"text":"[3] ","element":"a"},{"text":"works under the assumption that ","element":"span"},{"text":"X ","element":"span"},{"text":"satisfies the convex concentration property with constant ","element":"span"},{"text":"˜","element":"span"},{"text":"K","element":"span"},{"text":", that is, for every 1-Lipschitz convex function ","element":"span"},{"style":{"height":16},"width":218.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-15.png","element":"img","alt":" ϕ : Rn → R","inline":true},{"text":", we always have ","element":"span"},{"style":{"height":17.6},"width":341.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-16.png","element":"img","alt":" E|ϕ(X)| < ∞ and","inline":true}],[{"style":{"width":"56%"},"width":1063,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-17.png","element":"img"}],[{"text":"Then, under this assumption,","element":"span"}],[{"id":"id-31","style":{"width":"78%"},"width":1466,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-18.png","element":"img"}],[{"text":"where Cov(","element":"span"},{"text":"X","element":"span"},{"text":") is the covariance matrix of 
","element":"span"},{"text":"X","element":"span"},{"text":". When ","element":"span"},{"text":"X ","element":"span"},{"text":"has independent and mean zero coordinates, ","element":"span"},{"style":{"height":19.54},"width":472.8,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-19.png","element":"img","alt":"∥Cov(X)∥ = maxi EX2i .","inline":true,"padRight":true},{"text":"However, the convex concentration property is not the same as sub-Gaussianity. More precisely, while it is true that ","element":"span"},{"text":"˜","element":"span"},{"text":"K ","element":"span"},{"text":"is independent of dimension when ","element":"span"},{"text":"X ","element":"span"},{"text":"has i.i.d. coordinates which are bounded almost surely ","element":"span"},{"href":"#id-28","referenceIndex":29,"text":"[29]","element":"a"},{"text":", this can fail when the boundedness assumption on ","element":"span"},{"style":{"height":14.69},"width":48,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-20.png","element":"img","alt":" Xi","inline":true,"padRight":true},{"text":"is replaced by sub-Gaussianity (i.e., ","element":"span"},{"text":"˜","element":"span"},{"text":"K ","element":"span"},{"text":"could depend on the dimension of ","element":"span"},{"style":{"height":15.09},"width":212.16,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-21.png","element":"img","alt":" X when Xi","inline":true,"padRight":true},{"text":"are i.i.d. and sub-Gaussian) ","element":"span"},{"href":"#id-29","referenceIndex":1,"text":"[2,","element":"a"},{"href":"#id-30","referenceIndex":12,"text":"13]","element":"a"},{"text":". 
Therefore, the bound in ","element":"span"},{"href":"#id-31","text":"(11) ","element":"a"},{"text":"implies neither ","element":"span"},{"href":"#id-32","text":"(10) ","element":"a"},{"text":"nor ","element":"span"},{"href":"#id-33","text":"(9) ","element":"a"},{"text":"in general.","element":"span"}],[{"text":"In a more recent paper by Klochkov and Zhivotovskiy ","element":"span"},{"href":"#id-13","referenceIndex":15,"text":"[16]","element":"a"},{"text":", the authors proved a uniform version of the Hanson-Wright inequality which, when applied to a single matrix under the same assumption as Corollary ","element":"span"},{"href":"#id-26","text":"3.1, ","element":"a"},{"text":"yields the following bound:","element":"span"}],[{"id":"id-34","style":{"width":"73%"},"width":1376,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-22.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":18.48},"width":375.96,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-23.png","element":"img","alt":" M = ∥ maxi |Xi|∥ψ2","inline":true},{"text":". This bound also improves ","element":"span"},{"href":"#id-33","text":"(9) ","element":"a"},{"text":"in some cases as demonstrated in ","element":"span"},{"href":"#id-13","referenceIndex":15,"text":"[16]","element":"a"},{"text":". We shall compare this bound to ","element":"span"},{"href":"#id-32","text":"(10) ","element":"a"},{"text":"in the sub-Gaussian regime. 
On the one hand, Jensen’s inequality tells us that the ","element":"span"},{"style":{"height":17.6},"width":161.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-24.png","element":"img","alt":" E∥AX∥2","inline":true,"padRight":true},{"text":"factor in ","element":"span"},{"href":"#id-34","text":"(12) ","element":"a"},{"text":"is bounded by the ","element":"span"},{"style":{"height":17.6},"width":129.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-25.png","element":"img","alt":" α∥A∥F","inline":true,"padRight":true},{"text":"factor in ","element":"span"},{"href":"#id-32","text":"(10)","element":"a"},{"text":". On the other hand, the factor ","element":"span"},{"text":"M ","element":"span"},{"text":"in ","element":"span"},{"href":"#id-34","text":"(12) ","element":"a"},{"text":"is only bounded by ","element":"span"},{"style":{"height":18.05},"width":271.28,"height":45.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/9-26.png","element":"img","alt":" M ≲ K√log n","inline":true},{"text":", which could depend on dimension ","element":"span"},{"text":"n","element":"span"},{"text":". Moreover, ","element":"span"},{"href":"#id-34","text":"(12) ","element":"a"},{"text":"only provides a one-sided bound instead of two-sided concentration bounds like Equations ","element":"span"},{"href":"#id-33","text":"(9) ","element":"a"},{"text":"to ","element":"span"},{"href":"#id-31","text":"(11)","element":"a"},{"text":".","element":"span"}],[{"style":{"width":"26%"},"width":503,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-0.png","element":"img"}],[{"text":"The main idea of the proof is similar to that of ","element":"span"},{"href":"#id-25","referenceIndex":26,"text":"[27]","element":"a"},{"text":": divide the sum into diagonal and off-diagonal parts, then bound the moment generating function of the latter through a decoupling and comparison argument. 
However, there are two significant differences. The first difference is the random variables used for comparison. We will use a scaled Bernoulli variable multiplied by a standard Gaussian in order to preserve the condition that the second moment equals 1. Using such random variables also leads to challenges in bounding the moment generating function, which is the second difference.","element":"span"}],[{"text":"Now we proceed with the proof. For any ","element":"span"},{"text":"t > ","element":"span"},{"text":"0, let","element":"span"}],[{"style":{"width":"33%"},"width":619,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-1.png","element":"img"}],[{"text":"be the tail probability we want to bound. Let ","element":"span"},{"style":{"height":17.6},"width":264.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-2.png","element":"img","alt":" A1 := diag(A","inline":true},{"text":") be the diagonal of ","element":"span"},{"text":"A ","element":"span"},{"text":"and let","element":"span"}],[{"style":{"width":"91%"},"width":1716,"height":196,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-3.png","element":"img"}],[{"text":"We will seek bounds for ","element":"span"},{"style":{"height":16},"width":192.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-4.png","element":"img","alt":" p1 and p2.","inline":true}],[{"style":{"width":"30%"},"width":565,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-5.png","element":"img"}],[{"text":"The bound for ","element":"span"},{"style":{"height":11.6},"width":39.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-6.png","element":"img","alt":" p1","inline":true,"padRight":true},{"text":"is given by our new Bernstein’s inequality. 
Notice that","element":"span"}],[{"style":{"width":"40%"},"width":750,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-7.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":20.01},"width":1020.68,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-8.png","element":"img","alt":" E|X2i − 1| ≤ 2 and ∥X2i − 1∥ψ1 ≤ C∥X2i ∥ψ1 ≤ CK2","inline":true},{"text":". So by Theorem ","element":"span"},{"href":"#id-17","text":"1.3 ","element":"a"},{"text":"and the simple ","element":"span"},{"text":"relationships between the norms of ","element":"span"},{"style":{"height":16},"width":360.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-9.png","element":"img","alt":" A1 and A, we have","inline":true}],[{"style":{"width":"77%"},"width":1446,"height":234,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-10.png","element":"img"}],[{"text":"To bound ","element":"span"},{"style":{"height":11.6},"width":39.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-11.png","element":"img","alt":" p2","inline":true},{"text":", we will derive a bound for the moment generating function of ","element":"span"},{"style":{"height":17.82},"width":328.48,"height":44.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-12.png","element":"img","alt":" XT A2X. 
Let X′","inline":true}],[{"text":"be an independent copy of ","element":"span"},{"text":"X","element":"span"},{"text":", then","element":"span"}],[{"style":{"width":"44%"},"width":825,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-13.png","element":"img"}],[{"id":"id-35","text":"The above follows directly from the following decoupling lemma.","element":"span"}],[{"text":"Lemma 3.2 ","element":"span"},{"text":"(Decoupling ","element":"span"},{"href":"#id-2","referenceIndex":31,"text":"[32]","element":"a"},{"text":")","element":"span"},{"style":{"height":18.29},"width":591.44,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-14.png","element":"img","alt":". Let A = (aij) be a fixed n × n","inline":true,"padRight":true},{"text":"matrix, and let ","element":"span"},{"style":{"height":17.6},"width":383.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-15.png","element":"img","alt":" X = (X1, . . . , Xn) ∈","inline":true},{"style":{"height":12},"width":52.68,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-16.png","element":"img","alt":"Rn ","inline":true,"padRight":true},{"text":"be a random vector with independent mean zero coordinates. 
Then for every convex function","element":"span"}],[{"style":{"width":"69%"},"width":1298,"height":282,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/10-17.png","element":"img"}],[{"text":"See Theorem 6.1.1 and Remark 6.1.3 in ","element":"span"},{"href":"#id-2","referenceIndex":31,"text":"[32] ","element":"a"},{"text":"for a proof of Lemma ","element":"span"},{"href":"#id-35","text":"3.2.","element":"a"}],[{"style":{"width":"23%"},"width":436,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-0.png","element":"img"}],[{"text":"We will compare ","element":"span"},{"style":{"height":17.6},"width":216.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-1.png","element":"img","alt":" X (and X′","inline":true},{"text":") to scaled Bernoulli multiplied entrywise by standard Gaussian. ","element":"span"},{"id":"id-36","text":"But first let us look at the case of a single variable through the following lemma.","element":"span"}],[{"style":{"width":"86%"},"width":1627,"height":258,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-2.png","element":"img"}],[{"text":"Proof. 
","element":"span"},{"text":"Using the inequality ","element":"span"},{"style":{"height":17.6},"width":333.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-3.png","element":"img","alt":" ex ≤ x + cosh(2x","inline":true},{"text":"), which is true for all ","element":"span"},{"style":{"height":12.8},"width":116.48,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-4.png","element":"img","alt":" x ∈ R","inline":true,"padRight":true},{"text":"(see Appendix ","element":"span"},{"text":"C)","element":"span"},{"text":", we have","element":"span"}],[{"style":{"width":"58%"},"width":1099,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-5.png","element":"img"}],[{"text":"By Lemma ","element":"span"},{"href":"#id-22","text":"2.1 ","element":"a"},{"text":"we know that ","element":"span"},{"style":{"height":19.73},"width":371.24,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-6.png","element":"img","alt":" EZ2q ≤ Cq0qqL2q−2","inline":true,"padRight":true},{"text":"for any positive integer ","element":"span"},{"text":"q ","element":"span"},{"text":"and some absolute ","element":"span"},{"text":"constant ","element":"span"},{"style":{"height":15.6},"width":184.16,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-7.png","element":"img","alt":" C0, hence","inline":true}],[{"style":{"width":"63%"},"width":1195,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-8.png","element":"img"}],[{"text":"On the other hand, a direct calculation gives","element":"span"}],[{"style":{"width":"89%"},"width":1674,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-9.png","element":"img"}],[{"text":"Choosing any 
","element":"span"},{"style":{"height":17.42},"width":423.08,"height":43.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-10.png","element":"img","alt":" C such that C2 ≥ 8C0","inline":true,"padRight":true},{"text":"completes the proof.","element":"span"}],[{"text":"Now let ","element":"span"},{"style":{"height":16},"width":179.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-11.png","element":"img","alt":" g, r ∈ Rn ","inline":true,"padRight":true},{"text":"be random vectors such that ","element":"span"},{"style":{"height":17.6},"width":499.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-12.png","element":"img","alt":" g ∼ Normal(0, In) and r","inline":true,"padRight":true},{"text":"has entries ","element":"span"},{"style":{"height":23.58},"width":117.36,"height":58.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-13.png","element":"img","alt":" r2i i.i.d∼","inline":true},{"style":{"height":19.14},"width":1205.44,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-14.png","element":"img","alt":"L2 · Bernoulli(L−2) where L2 = K2 log K. Also let g′ and r′ ","inline":true,"padRight":true},{"text":"be independent copies of ","element":"span"},{"text":"g ","element":"span"},{"text":"and ","element":"span"},{"text":"r","element":"span"},{"text":". 
Let ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-15.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"be any vector in ","element":"span"},{"style":{"height":12},"width":52.68,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-16.png","element":"img","alt":" Rn","inline":true},{"text":", by Lemma ","element":"span"},{"href":"#id-36","text":"3.3 ","element":"a"},{"text":"and independence we have","element":"span"}],[{"id":"id-37","style":{"width":"95%"},"width":1779,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-17.png","element":"img"}],[{"text":"Note the above also holds for ","element":"span"},{"style":{"height":19.54},"width":272.8,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-18.png","element":"img","alt":" EX′ exp(αT X′","inline":true},{"text":"). Therefore","element":"span"}],[{"style":{"width":"61%"},"width":1146,"height":266,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-19.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":629.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-20.png","element":"img","alt":" R := diag(r) and R′ := diag(r′","inline":true},{"text":"). 
Here the two inequalities are repeated applications of Equation ","element":"span"},{"href":"#id-37","text":"(13)","element":"a"},{"text":".","element":"span"}],[{"style":{"width":"57%"},"width":1067,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/11-21.png","element":"img"}],[{"text":"Denote ","element":"span"},{"style":{"height":17.6},"width":271.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-0.png","element":"img","alt":" σi = σi(RAR′","inline":true},{"text":") the singular values of matrix ","element":"span"},{"style":{"height":13.2},"width":115.84,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-1.png","element":"img","alt":" RAR′","inline":true},{"text":". From the rotation invariance of ","element":"span"},{"text":"g ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":16.4},"width":198.56,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-2.png","element":"img","alt":" g′ we have","inline":true}],[{"style":{"width":"48%"},"width":900,"height":123,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-3.png","element":"img"}],[{"text":"For standard normal random variables ","element":"span"},{"style":{"height":17.2},"width":180,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-4.png","element":"img","alt":" gi and g′i,","inline":true}],[{"style":{"width":"78%"},"width":1473,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-5.png","element":"img"}],[{"text":"where the inequality uses (1 ","element":"span"},{"style":{"height":24.45},"width":519.56,"height":61.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-6.png","element":"img","alt":" − x)− 12 ≤ ex when x ∈ [0, 12","inline":true},{"text":") (see Appendix 
","element":"span"},{"text":"C)","element":"span"},{"text":".","element":"span"}],[{"text":"Also note that ","element":"span"},{"style":{"height":39.81},"width":997.28,"height":99.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-7.png","element":"img","alt":" σi ≤ ∥RAR′∥ ≤ L2∥A∥, so if λ2 < 12L4∥A∥2 we have","inline":true}],[{"style":{"width":"66%"},"width":1247,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-8.png","element":"img"}],[{"text":"Next, use the following Lemma ","element":"span"},{"href":"#id-38","text":"3.4 ","element":"a"},{"text":"(with ","element":"span"},{"style":{"height":18.74},"width":425,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-9.png","element":"img","alt":" η = λ2L4 and p = L−2","inline":true},{"text":") to bound the moment generating function of ","element":"span"},{"style":{"height":19.92},"width":180.08,"height":49.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-10.png","element":"img","alt":" ∥RAR′∥2F ","inline":true,"padRight":true},{"text":"and we obtain","element":"span"}],[{"style":{"width":"62%"},"width":1173,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-11.png","element":"img"}],[{"id":"id-38","text":"Lemma 3.4. ","element":"span"},{"text":"Let ","element":"span"},{"text":"D ","element":"span"},{"text":"be a diagonal random matrix with i.i.d. 
entries ","element":"span"},{"style":{"height":17.6},"width":571.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-12.png","element":"img","alt":" Dii = di ∼ Bernoulli(p), and","inline":true,"padRight":true},{"text":"let ","element":"span"},{"style":{"height":12},"width":53.44,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-13.png","element":"img","alt":" D′ ","inline":true,"padRight":true},{"text":"be an independent copy of ","element":"span"},{"text":"D","element":"span"},{"text":". Given a fixed matrix ","element":"span"},{"text":"A","element":"span"},{"text":", then","element":"span"}],[{"style":{"width":"59%"},"width":1107,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-14.png","element":"img"}],[{"style":{"height":16.8},"width":674.28,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-15.png","element":"img","alt":"Proof. Denote Ai the i-th row of A","inline":true},{"text":". Notice that","element":"span"}],[{"style":{"width":"35%"},"width":668,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-16.png","element":"img"}],[{"text":"so for ","element":"span"},{"style":{"height":32.4},"width":427.04,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-17.png","element":"img","alt":" η ∈�0, 1∥A∥2�, we have","inline":true}],[{"style":{"width":"43%"},"width":818,"height":409,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-18.png","element":"img"}],[{"text":"Here the second last inequality uses ","element":"span"},{"style":{"height":19.34},"width":952.8,"height":48.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-19.png","element":"img","alt":" η∥Ai∥22 ≤ η∥A∥2 ≤ 1 and ex ≤ 1 + 2x when x ∈ [0,","inline":true,"padRight":true},{"text":"1]. 
The last ","element":"span"},{"text":"inequality uses 1 + ","element":"span"},{"style":{"height":14.34},"width":136.8,"height":35.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-20.png","element":"img","alt":" x ≤ ex.","inline":true}],[{"style":{"width":"27%"},"width":518,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/12-21.png","element":"img"}],[{"text":"From the previous steps we get","element":"span"}],[{"style":{"width":"82%"},"width":1541,"height":472,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-0.png","element":"img"}],[{"text":"Optimizing this over ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-1.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"(similar to the proof of Theorem ","element":"span"},{"href":"#id-17","text":"1.3) ","element":"a"},{"text":"yields a one-sided bound for ","element":"span"},{"style":{"height":11.6},"width":39.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-2.png","element":"img","alt":" p2","inline":true},{"text":". The other side can then be obtained by considering ","element":"span"},{"style":{"height":17.6},"width":269.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-3.png","element":"img","alt":" −A2 (and −A","inline":true},{"text":") instead of ","element":"span"},{"style":{"height":15.49},"width":49.64,"height":38.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-4.png","element":"img","alt":" A2","inline":true},{"text":". 
Together they give","element":"span"}],[{"style":{"width":"77%"},"width":1446,"height":226,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-5.png","element":"img"}],[{"text":"Lastly, since ","element":"span"},{"style":{"height":17.6},"width":369.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-6.png","element":"img","alt":" p ≤ min{1, p1 + p2}","inline":true},{"text":", combining the bounds for ","element":"span"},{"style":{"height":11.6},"width":99.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-7.png","element":"img","alt":" p1, p2","inline":true,"padRight":true},{"text":"and then applying the inequality min","element":"span"},{"style":{"height":20.34},"width":353.48,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-8.png","element":"img","alt":"{1, 4e−x} ≤ 2e−x/2 ","inline":true,"padRight":true},{"text":"(see Appendix ","element":"span"},{"text":"C) ","element":"span"},{"text":"completes the proof of Theorem ","element":"span"},{"href":"#id-20","text":"1.5.","element":"a"}]]},{"heading":"4 Sub-Gaussian Matrices on Sets","paragraphs":[[{"text":"In this section we prove Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"and show that the ","element":"span"},{"style":{"height":17.86},"width":180.16,"height":44.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-9.png","element":"img","alt":" K√log K","inline":true,"padRight":true},{"text":"dependence on ","element":"span"},{"text":"K ","element":"span"},{"text":"is optimal. Section ","element":"span"},{"href":"#id-39","text":"4.1 ","element":"a"},{"text":"studies the simple case when ","element":"span"},{"text":"T ","element":"span"},{"text":"consists of only a single point. 
Section ","element":"span"},{"href":"#id-40","text":"4.2 ","element":"a"},{"text":"establishes the technical sub-Gaussian increments lemmas and Section ","element":"span"},{"href":"#id-41","text":"4.3 ","element":"a"},{"text":"proves Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"through these lemmas and Talagrand’s Majorizing Measure Theorem. Section ","element":"span"},{"href":"#id-16","text":"4.4 ","element":"a"},{"text":"provides an example through scaled Bernoulli random variables that shows ","element":"span"},{"style":{"height":18.05},"width":340.8,"height":45.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-10.png","element":"img","alt":" K√log K is tight.","inline":true}],[{"id":"id-39","text":"4.1 ","element":"span"},{"text":"Concentration of Random Vectors","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":15.54},"width":571.4,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-11.png","element":"img","alt":" X := Ax ∈ Rm with x ∈ Sn−1","inline":true},{"text":". 
The isotropic and sub-Gaussian assumption on ","element":"span"},{"text":"A ","element":"span"},{"text":"now implies ","element":"span"},{"text":"X ","element":"span"},{"text":"has independent coordinates satisfying ","element":"span"},{"style":{"height":20.01},"width":511.68,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-12.png","element":"img","alt":" EX2i = 1 and ∥Xi∥ψ2 ≤ K.","inline":true,"padRight":true},{"text":"Lemma 5.3 in ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20] ","element":"a"},{"text":"states that","element":"span"}],[{"style":{"width":"23%"},"width":436,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-13.png","element":"img"}],[{"text":"In other words, ","element":"span"},{"style":{"height":17.6},"width":117.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-14.png","element":"img","alt":" ∥Ax∥2","inline":true,"padRight":true},{"text":"has a sub-Gaussian concentration around ","element":"span"},{"style":{"height":17.6},"width":75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-15.png","element":"img","alt":"√m","inline":true},{"text":". It is worth noting that this concentration is independent of the ambient dimension ","element":"span"},{"text":"m","element":"span"},{"text":". We will follow a similar proof idea, but use the new inequalities (Theorem ","element":"span"},{"href":"#id-17","text":"1.3 ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-20","text":"1.5) ","element":"a"},{"text":"to generalize and refine this result.","element":"span"}],[{"id":"id-45","style":{"height":16.4},"width":743.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-16.png","element":"img","alt":"Theorem 4.1. 
Let B be a fixed m × n","inline":true,"padRight":true},{"text":"matrix and let ","element":"span"},{"style":{"height":17.6},"width":448.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-17.png","element":"img","alt":" X = (X1, . . . , Xn) ∈ Rn ","inline":true,"padRight":true},{"text":"be a random vector with independent sub-Gaussian coordinates satisfying ","element":"span"},{"style":{"height":20.02},"width":511.36,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-18.png","element":"img","alt":" EX2i = 1 and ∥Xi∥ψ2 ≤ K","inline":true},{"text":". If either one of ","element":"span"},{"text":"the following conditions further holds:","element":"span"}],[{"style":{"width":"20%"},"width":381,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/13-19.png","element":"img"}],[{"style":{"width":"69%"},"width":1302,"height":202,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-0.png","element":"img"}],[{"text":"Proof. 
","element":"span"},{"text":"The conclusion is trivial if ","element":"span"},{"style":{"height":17.6},"width":79.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-1.png","element":"img","alt":" ∥B∥","inline":true,"padRight":true},{"text":"= 0, so we will assume ","element":"span"},{"text":"B ","element":"span"},{"text":"is non-zero.","element":"span"}],[{"id":"id-42","style":{"width":"99%"},"width":1866,"height":484,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-2.png","element":"img"}],[{"text":"Note that for ","element":"span"},{"style":{"height":16.4},"width":205.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-3.png","element":"img","alt":" α, β, s ≥ 0,","inline":true}],[{"style":{"width":"44%"},"width":829,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-4.png","element":"img"}],[{"text":"(This readily comes from the inequalities ","element":"span"},{"style":{"height":19.14},"width":1070.12,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-5.png","element":"img","alt":" |α2 − β2| ≥ |α − β|2 and |α2 − β2| ≥ |α − β|β whenever","inline":true}],[{"style":{"width":"9%"},"width":178,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-6.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":17.6},"width":512.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-7.png","element":"img","alt":" Z := ∥BX∥2 − ∥B∥F , then","inline":true}],[{"style":{"width":"41%"},"width":773,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-8.png","element":"img"}],[{"text":"To bound this probability, we observe 
that","element":"span"}],[{"style":{"width":"85%"},"width":1601,"height":253,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-9.png","element":"img"}],[{"text":"Combining these two bounds and then using property (b) in Appendix ","element":"span"},{"text":"A ","element":"span"},{"text":"completes the proof.","element":"span"}],[{"text":"(b) We will first use Bernstein’s inequality to obtain ","element":"span"},{"href":"#id-42","text":"(14)","element":"a"},{"text":". Denote by ","element":"span"},{"style":{"height":15.09},"width":159.36,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-10.png","element":"img","alt":" bi := Bii","inline":true,"padRight":true},{"text":"the diagonal entries of","element":"span"}],[{"style":{"width":"70%"},"width":1316,"height":169,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-11.png","element":"img"}],[{"text":"For random variables ","element":"span"},{"style":{"height":19.54},"width":102.16,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-12.png","element":"img","alt":" X2i −","inline":true,"padRight":true},{"text":"1, notice that","element":"span"}],[{"style":{"width":"57%"},"width":1067,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-13.png","element":"img"}],[{"text":"where the ","element":"span"},{"style":{"height":16.4},"width":45.32,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-14.png","element":"img","alt":" ψ1","inline":true},{"text":"-norm estimate is from property (f) in Appendix ","element":"span"},{"text":"A. 
","element":"span"},{"text":"So by Theorem ","element":"span"},{"href":"#id-17","text":"1.3 ","element":"a"},{"text":"and using the inequality ","element":"span"},{"style":{"height":20.8},"width":891.68,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-15.png","element":"img","alt":"� b4i ≤�maxi b2i�· � b2i = ∥B∥2∥B∥2F , we have","inline":true}],[{"style":{"width":"68%"},"width":1274,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/14-16.png","element":"img"}],[{"text":"The rest of the proof is the same as in (a).","element":"span"}],[{"id":"id-40","text":"4.2 ","element":"span"},{"text":"Sub-Gaussian Increments Lemma","element":"span"}],[{"text":"A key lemma for Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"is to show that the random process ","element":"span"},{"style":{"height":17.6},"width":600.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-0.png","element":"img","alt":" Zx := ∥BAx∥2 − ∥B∥F ∥x∥2 has","inline":true,"padRight":true},{"text":"sub-Gaussian increments. That is, ","element":"span"},{"style":{"height":18.48},"width":761.68,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-1.png","element":"img","alt":" ∥Zx − Zy∥ψ2 ≤ M∥x − y∥2 for some M","inline":true,"padRight":true},{"text":"and for all ","element":"span"},{"style":{"height":16},"width":193.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-2.png","element":"img","alt":" x, y ∈ Rn.","inline":true,"padRight":true},{"text":"Theorem 1.3 in ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20] ","element":"a"},{"text":"showed sub-Gaussian increments for ","element":"span"},{"style":{"height":17.42},"width":450.92,"height":43.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-3.png","element":"img","alt":" B = Im with M = CK2","inline":true},{"text":". 
Here we improve and generalize this result to any ","element":"span"},{"style":{"height":18.26},"width":895.36,"height":45.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-4.png","element":"img","alt":" B with M = CK√log K ∥B∥. The K√log K","inline":true,"padRight":true},{"text":"factor is in fact optimal, as suggested by Proposition ","element":"span"},{"href":"#id-15","text":"4.5 ","element":"a"},{"text":"in Section ","element":"span"},{"href":"#id-16","text":"4.4.","element":"a"}],[{"text":"We will prove two versions of the sub-Gaussian increment lemma. The first one (Lemma ","element":"span"},{"href":"#id-43","text":"4.2) ","element":"a"},{"text":"is for arbitrary ","element":"span"},{"text":"B","element":"span"},{"text":", but requires the random matrix ","element":"span"},{"text":"A ","element":"span"},{"text":"to be mean zero. The second one (Lemma ","element":"span"},{"href":"#id-44","text":"4.3) ","element":"a"},{"text":"is only for diagonal ","element":"span"},{"text":"B","element":"span"},{"text":", but does not require zero mean from ","element":"span"},{"text":"A","element":"span"},{"text":".","element":"span"}],[{"text":"For Lemma ","element":"span"},{"href":"#id-43","text":"4.2 ","element":"a"},{"text":"below, the beginning of the proof follows the argument in ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20]","element":"a"},{"text":", except we will use Theorem ","element":"span"},{"href":"#id-45","text":"4.1 ","element":"a"},{"text":"for better tail bounds. Later on in the proof, we will use a different approach to bound one of the tail probabilities (i.e. 
","element":"span"},{"style":{"height":11.6},"width":39.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-5.png","element":"img","alt":" p3","inline":true},{"text":") through the new Hanson-Wright inequality (Theorem ","element":"span"},{"href":"#id-20","text":"1.5)","element":"a"},{"text":".","element":"span"}],[{"id":"id-43","style":{"height":15.94},"width":557.04,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-6.png","element":"img","alt":"Lemma 4.2. Let B ∈ Rl×m ","inline":true,"padRight":true},{"text":"be a fixed matrix and let ","element":"span"},{"style":{"height":13.93},"width":204.84,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-7.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"be a mean zero, isotropic and sub-Gaussian matrix with sub-Gaussian parameter ","element":"span"},{"text":"K","element":"span"},{"text":". Then the random process","element":"span"}],[{"style":{"width":"76%"},"width":1441,"height":238,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-8.png","element":"img"}],[{"text":"Proof. ","element":"span"},{"text":"The statement is invariant under scaling for ","element":"span"},{"text":"B","element":"span"},{"text":". 
So without loss of generality, we will assume ","element":"span"},{"text":"B ","element":"span"},{"text":"has operator norm ","element":"span"},{"style":{"height":17.6},"width":170.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-9.png","element":"img","alt":" ∥B∥ = 1.","inline":true}],[{"style":{"width":"84%"},"width":1578,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-10.png","element":"img"}],[{"text":"Without loss of generality, assume ","element":"span"},{"style":{"height":16.8},"width":318.08,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-11.png","element":"img","alt":" x ̸= y and define","inline":true}],[{"style":{"width":"59%"},"width":1105,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-12.png","element":"img"}],[{"text":"We need to bound this tail probability by a Gaussian whose standard deviation is the order of ","element":"span"},{"style":{"height":17.85},"width":180.16,"height":44.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-13.png","element":"img","alt":"K√log K","inline":true},{"text":". Consider the following two cases:","element":"span"}],[{"style":{"width":"76%"},"width":1427,"height":203,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-14.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"0 ","element":"span"},{"style":{"height":17.6},"width":481.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-15.png","element":"img","alt":" < s < 2∥B∥F . 
Write p as","inline":true}],[{"style":{"width":"88%"},"width":1658,"height":401,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/15-16.png","element":"img"}],[{"text":"Next we derive bounds for ","element":"span"},{"style":{"height":16},"width":260.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-0.png","element":"img","alt":" p1, p2 and p3.","inline":true}],[{"style":{"width":"14%"},"width":275,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-1.png","element":"img"}],[{"text":"From ","element":"span"},{"style":{"height":17.6},"width":372.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-2.png","element":"img","alt":" s ≥ 2∥B∥F we have","inline":true}],[{"style":{"width":"68%"},"width":1289,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-3.png","element":"img"}],[{"text":"Applying Theorem ","element":"span"},{"href":"#id-45","text":"4.1 ","element":"a"},{"text":"to the random vector ","element":"span"},{"text":"Au ","element":"span"},{"text":"we get","element":"span"}],[{"style":{"width":"62%"},"width":1179,"height":241,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-4.png","element":"img"}],[{"text":"Applying Theorem ","element":"span"},{"href":"#id-45","text":"4.1 ","element":"a"},{"text":"to the random vector ","element":"span"},{"text":"Ax ","element":"span"},{"text":"and noting that ","element":"span"},{"style":{"height":21.26},"width":361.16,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-5.png","element":"img","alt":" ∥B∥F > s/2, we get","inline":true}],[{"style":{"width":"98%"},"width":1852,"height":516,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-6.png","element":"img"}],[{"text":"Notice 
that","element":"span"}],[{"style":{"width":"73%"},"width":1378,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-7.png","element":"img"}],[{"text":"Let us also denote ","element":"span"},{"style":{"height":15.49},"width":418.44,"height":38.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-8.png","element":"img","alt":" Xw := Aw for w ∈ Rn","inline":true},{"text":", then from ","element":"span"},{"style":{"height":19.94},"width":525.44,"height":49.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-9.png","element":"img","alt":" EXwXTw = ∥w∥22 In we have","inline":true}],[{"style":{"width":"68%"},"width":1283,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-10.png","element":"img"}],[{"text":"Thus we can further write ","element":"span"},{"text":"Z ","element":"span"},{"text":"as","element":"span"}],[{"style":{"width":"62%"},"width":1168,"height":401,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-11.png","element":"img"}],[{"text":"where the second equality uses the fact that ","element":"span"},{"text":"Z ","element":"span"},{"text":"is mean zero and in the last equality ","element":"span"},{"style":{"height":19.53},"width":320.08,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/16-12.png","element":"img","alt":" Yw := ∥BXw∥22−","inline":true}],[{"style":{"width":"99%"},"width":1871,"height":958,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-0.png","element":"img"}],[{"text":"so by Theorem ","element":"span"},{"href":"#id-20","text":"1.5 ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"width":"64%"},"width":1205,"height":239,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-1.png","element":"img"}],[{"text":"Hence for 0 
","element":"span"},{"style":{"height":19.92},"width":334.08,"height":49.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-2.png","element":"img","alt":" ≤ t ≤ ∥w∥22∥B∥2F ,","inline":true}],[{"id":"id-46","style":{"width":"81%"},"width":1525,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-3.png","element":"img"}],[{"text":"Now we apply Equation ","element":"span"},{"href":"#id-46","text":"(15) ","element":"a"},{"text":"to ","element":"span"},{"style":{"height":16},"width":260.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-4.png","element":"img","alt":" p4, p5 and p6.","inline":true}],[{"text":"• ","element":"span"},{"text":"For ","element":"span"},{"style":{"height":20.96},"width":1063.28,"height":52.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-5.png","element":"img","alt":" p4. Since s < 2∥B∥F and ∥u + v∥2 =�1 + ∥v∥22 ∈ [1,√","inline":true},{"text":"5), we can conclude that","element":"span"}],[{"style":{"width":"74%"},"width":1396,"height":268,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-6.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"For ","element":"span"},{"style":{"height":11.6},"width":39.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-7.png","element":"img","alt":" p5","inline":true},{"text":". 
Notice that ","element":"span"},{"style":{"height":21.27},"width":669.52,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-8.png","element":"img","alt":" ∥u∥₂ = 1 and 1 − (1/8)∥v∥₂² ∈ (1/2, 1], so","inline":true}],[{"style":{"width":"96%"},"width":1803,"height":372,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/17-9.png","element":"img"}],[{"style":{"width":"54%"},"width":1017,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-0.png","element":"img"}],[{"text":"So far we have shown that","element":"span"}],[{"style":{"width":"49%"},"width":926,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":26.66},"width":361.68,"height":66.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-2.png","element":"img","alt":" p_i ≤ 2 exp(−cs²/(K² log K)","inline":true,"padRight":true},{"text":") for some absolute constant ","element":"span"},{"style":{"height":16},"width":790.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-3.png","element":"img","alt":" c and 1 ≤ i ≤ 6. 
Note p ≤ 1 and the","inline":true,"padRight":true},{"text":"inequality min","element":"span"},{"style":{"height":20.34},"width":353.96,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-4.png","element":"img","alt":"{1, 8e^{−x}} ≤ 2e^{−x/3} ","inline":true,"padRight":true},{"text":"(see Appendix ","element":"span"},{"text":"C)","element":"span"},{"text":", we get","element":"span"}],[{"style":{"width":"75%"},"width":1419,"height":214,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-5.png","element":"img"}],[{"text":"Without loss of generality, we can assume ","element":"span"},{"style":{"height":23.01},"width":708.6,"height":57.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-6.png","element":"img","alt":" ∥x∥2 = 1 and ∥y∥2 ≥ 1. Let ¯y := y/∥y∥2 ","inline":true,"padRight":true},{"text":"be the projection ","element":"span"},{"text":"of ","element":"span"},{"text":"y ","element":"span"},{"text":"onto the unit ball, then by the triangle inequality,","element":"span"}],[{"style":{"width":"56%"},"width":1056,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-7.png","element":"img"}],[{"text":"Here ","element":"span"},{"style":{"height":14.69},"width":50.12,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-8.png","element":"img","alt":" R1","inline":true,"padRight":true},{"text":"is bounded by ","element":"span"},{"style":{"height":19.13},"width":807.36,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-9.png","element":"img","alt":" CK√log K∥x − ¯y∥2 since x, ¯y ∈ Sn−1, and","inline":true}],[{"style":{"width":"67%"},"width":1257,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-10.png","element":"img"}],[{"text":"where the first equality uses 
","element":"span"},{"style":{"height":18.29},"width":249.8,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-11.png","element":"img","alt":" Zy = ∥y∥2Z¯y","inline":true},{"text":", the second equality is true since ","element":"span"},{"style":{"height":17.6},"width":391.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-12.png","element":"img","alt":" ∥y∥2 − 1 = ∥y − ¯y∥2","inline":true,"padRight":true},{"text":"and the last inequality follows from Theorem ","element":"span"},{"href":"#id-45","text":"4.1. ","element":"a"},{"text":"Combining these bounds we get","element":"span"}],[{"style":{"width":"50%"},"width":950,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-13.png","element":"img"}],[{"text":"Finally, note that ","element":"span"},{"style":{"height":17.6},"width":85.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-14.png","element":"img","alt":" ∥x∥2","inline":true,"padRight":true},{"text":"= 1, so by non-expansiveness of projection, ","element":"span"},{"style":{"height":17.6},"width":559.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-15.png","element":"img","alt":" ∥x − ¯y∥2 ≤ ∥x − y∥2, and by","inline":true,"padRight":true},{"text":"definition of projection, ","element":"span"},{"style":{"height":17.6},"width":381.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-16.png","element":"img","alt":" ∥y − ¯y∥2 ≤ ∥y − x∥2","inline":true},{"text":". This completes the proof.","element":"span"}],[{"text":"Next we show the second version of sub-Gaussian increment lemma, which requires ","element":"span"},{"text":"B ","element":"span"},{"text":"to be diagonal and does not need ","element":"span"},{"text":"A ","element":"span"},{"text":"to be mean zero. 
The proof is mostly the same as that of Lemma ","element":"span"},{"href":"#id-43","text":"4.2, ","element":"a"},{"text":"so we will only highlight the differences.","element":"span"}],[{"id":"id-44","style":{"height":15.94},"width":565.2,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-17.png","element":"img","alt":"Lemma 4.3. Let B ∈ Rl×m ","inline":true,"padRight":true},{"text":"be a fixed diagonal matrix and let ","element":"span"},{"style":{"height":13.94},"width":210.12,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-18.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"be an isotropic, sub-Gaussian matrix with sub-Gaussian parameter ","element":"span"},{"text":"K","element":"span"},{"text":", then the random process","element":"span"}],[{"style":{"width":"76%"},"width":1441,"height":255,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-19.png","element":"img"}],[{"text":"Proof. ","element":"span"},{"text":"If ","element":"span"},{"text":"B ","element":"span"},{"text":"is not a square matrix, we can always add ","element":"span"},{"style":{"height":12.8},"width":106.12,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-20.png","element":"img","alt":" m − l","inline":true,"padRight":true},{"text":"rows of zeros to ","element":"span"},{"text":"B ","element":"span"},{"text":"(when ","element":"span"},{"text":"l < m","element":"span"},{"text":") or remove the last ","element":"span"},{"style":{"height":12.8},"width":107.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-21.png","element":"img","alt":" l − m","inline":true,"padRight":true},{"text":"rows of zeros from ","element":"span"},{"text":"B ","element":"span"},{"text":"(when ","element":"span"},{"text":"l > m","element":"span"},{"text":"). 
This will turn ","element":"span"},{"style":{"height":15.6},"width":447.2,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-22.png","element":"img","alt":" B into an m × m square","inline":true,"padRight":true},{"text":"matrix without changing the values of ","element":"span"},{"style":{"height":17.6},"width":470.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-23.png","element":"img","alt":" ∥BAx∥2, ∥B∥F and ∥B∥","inline":true},{"text":". So without loss of generality, we can assume ","element":"span"},{"text":"B ","element":"span"},{"text":"is a square matrix. Also without loss of generality, we can further assume ","element":"span"},{"style":{"height":17.6},"width":161.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/18-24.png","element":"img","alt":" ∥B∥ = 1","inline":true,"padRight":true},{"text":"since the conclusion is invariant under scaling of ","element":"span"},{"text":"B","element":"span"},{"text":".","element":"span"}],[{"text":"The rest of the proof of Lemma ","element":"span"},{"href":"#id-44","text":"4.3 ","element":"a"},{"text":"is the same as the proof of Lemma ","element":"span"},{"href":"#id-43","text":"4.2 ","element":"a"},{"text":"except for bounding ","element":"span"},{"style":{"height":15.2},"width":87.36,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-0.png","element":"img","alt":" p3 in","inline":true,"padRight":true},{"text":"Step 1. 
A bound for ","element":"span"},{"style":{"height":11.6},"width":39.08,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-1.png","element":"img","alt":" p3","inline":true,"padRight":true},{"text":"here can be obtained through the new Bernstein’s inequality (Theorem ","element":"span"},{"href":"#id-17","text":"1.3) ","element":"a"},{"text":"as detailed below.","element":"span"}],[{"text":"Recall that","element":"span"}],[{"style":{"width":"52%"},"width":980,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.49},"width":781.44,"height":38.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-3.png","element":"img","alt":" bi := Bii and Ai is the i-th row of A.","inline":true,"padRight":true},{"text":"The random variables ","element":"span"},{"style":{"height":17.6},"width":466.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-4.png","element":"img","alt":" Yi := ⟨Ai, u⟩ ⟨Ai, v⟩ are","inline":true,"padRight":true},{"text":"independent, with","element":"span"}],[{"style":{"width":"73%"},"width":1379,"height":254,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-5.png","element":"img"}],[{"text":"Here we used ","element":"span"},{"style":{"height":17.6},"width":661.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-6.png","element":"img","alt":" ∥x∥2 = ∥y∥2 = ∥u∥2 = 1, ∥v∥2 ≤","inline":true,"padRight":true},{"text":"2, and that ","element":"span"},{"style":{"height":15.49},"width":44.64,"height":38.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-7.png","element":"img","alt":" Ai","inline":true,"padRight":true},{"text":"is isotropic. 
Furthermore, from property (d) in Appendix ","element":"span"},{"text":"A ","element":"span"},{"text":"we have","element":"span"}],[{"style":{"width":"84%"},"width":1581,"height":305,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-8.png","element":"img"}],[{"text":"Since 0 ","element":"span"},{"style":{"height":17.6},"width":402.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-9.png","element":"img","alt":" < s < 2∥B∥F , we get","inline":true}],[{"style":{"width":"73%"},"width":1384,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-10.png","element":"img"}],[{"id":"id-41","text":"4.3 ","element":"span"},{"text":"Proof of Theorem ","element":"span"},{"href":"#id-14","text":"1.1","element":"a"}],[{"text":"Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"follows from the sub-Gaussian increments lemmas and Talagrand’s Majorizing Measure ","element":"span"},{"id":"id-47","text":"Theorem. Let us first recall the Majorizing Measure Theorem. The following statement is from ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20]","element":"a"},{"text":".","element":"span"}],[{"text":"Theorem 4.4 ","element":"span"},{"text":"(Majorizing Measure Theorem)","element":"span"},{"style":{"height":17.6},"width":257.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-11.png","element":"img","alt":". Let (Zx)x∈T","inline":true,"padRight":true},{"text":"be a random process on a bounded set ","element":"span"},{"style":{"height":12.8},"width":150.6,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-12.png","element":"img","alt":"T ⊂ Rn","inline":true},{"text":". 
Assume that the process has sub-Gaussian increments, that is there exists ","element":"span"},{"style":{"height":14.8},"width":238.56,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-13.png","element":"img","alt":" M ≥ 0 such","inline":true,"padRight":true},{"text":"that","element":"span"}],[{"style":{"width":"76%"},"width":1436,"height":546,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/19-14.png","element":"img"}],[{"text":"The first part of Theorem ","element":"span"},{"href":"#id-47","text":"4.4 ","element":"a"},{"text":"can be found in ","element":"span"},{"href":"#id-1","referenceIndex":30,"text":"[31, ","element":"a"},{"text":"Theorem 2.4.12] and the second part can be found in ","element":"span"},{"href":"#id-11","referenceIndex":6,"text":"[7, ","element":"a"},{"text":"Theorem 3.2].","element":"span"}],[{"style":{"width":"95%"},"width":1792,"height":239,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-0.png","element":"img"}],[{"text":"Using Lemma ","element":"span"},{"href":"#id-43","text":"4.2 ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-47","text":"4.4 ","element":"a"},{"text":"(Majorizing Measure Theorem), we get","element":"span"}],[{"style":{"width":"58%"},"width":1086,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-1.png","element":"img"}],[{"text":"Using property (e) in Appendix ","element":"span"},{"text":"A ","element":"span"},{"text":"and Lemma ","element":"span"},{"href":"#id-43","text":"4.2, ","element":"a"},{"text":"we get","element":"span"}],[{"style":{"width":"99%"},"width":1866,"height":357,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-2.png","element":"img"}],[{"text":"Since diam(","element":"span"},{"style":{"height":17.6},"width":247,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-3.png","element":"img","alt":"T) ≤ 2 
rad(T","inline":true},{"text":"), applying Lemma ","element":"span"},{"href":"#id-43","text":"4.2 ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-47","text":"4.4 ","element":"a"},{"text":"we know that the event","element":"span"}],[{"style":{"width":"98%"},"width":1840,"height":361,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-4.png","element":"img"}],[{"text":"holds with probability at least 1 ","element":"span"},{"style":{"height":17.75},"width":165.6,"height":44.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-5.png","element":"img","alt":" − 2e^{−u²}.","inline":true,"padRight":true},{"text":"Combining these yields the desired high probability bound.","element":"span"}],[{"text":"Finally, when ","element":"span"},{"text":"B ","element":"span"},{"text":"is a diagonal matrix and ","element":"span"},{"text":"A ","element":"span"},{"text":"is not necessarily mean zero, we can repeat the above argument with Lemma ","element":"span"},{"href":"#id-44","text":"4.3 ","element":"a"},{"text":"instead of Lemma ","element":"span"},{"href":"#id-43","text":"4.2. ","element":"a"},{"text":"This completes the proof.","element":"span"}],[{"id":"id-16","text":"4.4 ","element":"span"},{"text":"An Example for Lower Bound","element":"span"}],[{"text":"Here we use scaled Bernoulli random variables to demonstrate that the ","element":"span"},{"style":{"height":17.86},"width":180.16,"height":44.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-6.png","element":"img","alt":" K√log K","inline":true,"padRight":true},{"text":"factor in ","element":"span"},{"id":"id-15","text":"Theorem ","element":"span"},{"href":"#id-14","text":"1.1 ","element":"a"},{"text":"is optimal in general.","element":"span"}],[{"style":{"height":17.6},"width":1147.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-7.png","element":"img","alt":"Proposition 4.5. Let K ≥ 3 and X = (X1, . 
. . , Xm) ∈ Rm ","inline":true,"padRight":true},{"text":"be a random vector with independent coordinates such that ","element":"span"},{"style":{"height":31.6},"width":1463.08,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-8.png","element":"img","alt":"1K2 log K X2i ∼ Bernoulli� 1K2 log K�, then ∥Xi∥ψ2 ≤ K, and for m ≥ K2 log K,","inline":true}],[{"id":"id-49","style":{"width":"99%"},"width":1869,"height":163,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/20-9.png","element":"img"}],[{"text":"Note that the expected number of non-zero coordinates for ","element":"span"},{"style":{"height":23.66},"width":621.76,"height":59.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-0.png","element":"img","alt":" X is mK2 log K , so m ≥ K2 log K","inline":true,"padRight":true},{"text":"essentially says ","element":"span"},{"text":"X ","element":"span"},{"text":"is non-zero in expectation, which is a mild assumption. For the proof of Proposition ","element":"span"},{"href":"#id-15","text":"4.5, ","element":"a"},{"text":"we will need the following lower bound (see ","element":"span"},{"href":"#id-48","referenceIndex":25,"text":"[26, ","element":"a"},{"text":"Lemma 4.7.2]) about Binomial distributions.","element":"span"}],[{"id":"id-50","style":{"width":"93%"},"width":1759,"height":140,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-1.png","element":"img"}],[{"text":"Here ","element":"span"},{"style":{"height":17.6},"width":122.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-2.png","element":"img","alt":" D(x∥y","inline":true},{"text":") is the Kullback-Leibler divergence between two Bernoulli distributions with parameters ","element":"span"},{"text":"x ","element":"span"},{"text":"and ","element":"span"},{"text":"y ","element":"span"},{"text":"respectively given 
by","element":"span"}],[{"style":{"width":"37%"},"width":704,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-3.png","element":"img"}],[{"text":"Moreover, for 0 ","element":"span"},{"text":"< y < x < ","element":"span"},{"text":"1,","element":"span"}],[{"style":{"width":"67%"},"width":1265,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-4.png","element":"img"}],[{"href":"#id-15","style":{"height":18.48},"width":799.36,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-5.png","element":"img","alt":"Proof of Proposition 4.5. ∥Xi∥ψ2 ≤ K","inline":true,"padRight":true},{"text":"follows directly from definition since","element":"span"}],[{"style":{"width":"60%"},"width":1124,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-6.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":19.14},"width":853.6,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-7.png","element":"img","alt":" λ > 0, Z := ∥X∥2 − √m and L2 := K2 log K","inline":true},{"text":", with a change of variable ","element":"span"},{"style":{"height":19.14},"width":355.04,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-8.png","element":"img","alt":" s = λt/L2 we have","inline":true}],[{"style":{"width":"52%"},"width":979,"height":341,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-9.png","element":"img"}],[{"text":"To show ","element":"span"},{"href":"#id-49","text":"(16)","element":"a"},{"text":", we need to find a ","element":"span"},{"style":{"height":19.14},"width":643.68,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-10.png","element":"img","alt":" λ such that E exp(λZ2/L2) > 2.","inline":true,"padRight":true},{"text":"So by a change of variable 
","element":"span"},{"style":{"height":15.13},"width":162.44,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-11.png","element":"img","alt":"t = v2L2","inline":true},{"text":", it suffices to show","element":"span"}],[{"style":{"width":"53%"},"width":1010,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-12.png","element":"img"}],[{"text":"Let","element":"span"}],[{"style":{"width":"67%"},"width":1266,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-13.png","element":"img"}],[{"text":"then","element":"span"}],[{"style":{"width":"59%"},"width":1113,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-14.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":21.84},"width":555.68,"height":54.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-15.png","element":"img","alt":"1L2∥X∥22 ∼ Binomial�m, 1L2�","inline":true},{"text":". So by ","element":"span"},{"href":"#id-50","text":"(17) ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"width":"54%"},"width":1021,"height":360,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/21-16.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":15.09},"width":42.44,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-0.png","element":"img","alt":" λ0","inline":true,"padRight":true},{"text":":= 9 log 9. 
Here the second inequality uses 2","element":"span"},{"style":{"height":17.6},"width":138.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-1.png","element":"img","alt":"v/βv ≥","inline":true,"padRight":true},{"text":"1 on the interval of integration, and the last inequality holds because","element":"span"}],[{"style":{"width":"74%"},"width":1398,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-2.png","element":"img"}],[{"text":"Take ","element":"span"},{"style":{"height":16.4},"width":264.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-3.png","element":"img","alt":" λ = λ0 we get","inline":true}],[{"style":{"width":"28%"},"width":536,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-4.png","element":"img"}],[{"text":"This proves ","element":"span"},{"href":"#id-49","text":"(16) ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":18.83},"width":348.48,"height":47.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-5.png","element":"img","alt":" c = 1/√λ0 ≈ 0.22.","inline":true}]]},{"heading":"5 Applications","paragraphs":[[{"text":"5.1 ","element":"span"},{"text":"Johnson-Lindenstrauss Lemma","element":"span"}],[{"text":"One immediate application of our result is a guarantee for all isotropic and sub-Gaussian matrices as Johnson-Lindenstrauss (JL) embeddings for dimension reduction. We state this JL lemma below. It follows directly from Theorem ","element":"span"},{"href":"#id-45","text":"4.1.","element":"a"}],[{"id":"id-51","style":{"height":13.94},"width":538.44,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-6.png","element":"img","alt":"Lemma 5.1. 
Let A ∈ Rm×n ","inline":true,"padRight":true},{"text":"be an isotropic and sub-Gaussian matrix with sub-Gaussian parameter","element":"span"}],[{"style":{"width":"99%"},"width":1866,"height":332,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-7.png","element":"img"}],[{"text":"Proof. ","element":"span"},{"text":"By scaling we can assume ","element":"span"},{"style":{"height":17.6},"width":161.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-8.png","element":"img","alt":" ∥x − y∥2","inline":true,"padRight":true},{"text":"= 1. By Theorem ","element":"span"},{"href":"#id-45","text":"4.1 ","element":"a"},{"text":"(with ","element":"span"},{"style":{"height":17.6},"width":326.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-9.png","element":"img","alt":" B = Im) we have","inline":true}],[{"style":{"width":"45%"},"width":856,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-10.png","element":"img"}],[{"text":"the result then follows from property (a) in Appendix ","element":"span"},{"text":"A.","element":"span"}],[{"style":{"width":"96%"},"width":1802,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-11.png","element":"img"}],[{"text":"on the example in Proposition ","element":"span"},{"href":"#id-15","text":"4.5, ","element":"a"},{"text":"we can further show that (see Appendix ","element":"span"},{"text":"B) ","element":"span"},{"text":"the dependence on sub-Gaussian parameter ","element":"span"},{"text":"K ","element":"span"},{"text":"here is also optimal for small ","element":"span"},{"style":{"height":15.6},"width":59.36,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-12.png","element":"img","alt":" ε, δ","inline":true},{"text":". 
Similar results to Lemma ","element":"span"},{"href":"#id-51","text":"5.1 ","element":"a"},{"text":"have appeared in ","element":"span"},{"href":"#id-4","referenceIndex":7,"text":"[8, ","element":"a"},{"href":"#id-52","referenceIndex":22,"text":"22]","element":"a"},{"text":", but to the best of our knowledge, the previously known dependence on ","element":"span"},{"text":"K ","element":"span"},{"text":"was ","element":"span"},{"style":{"height":14.73},"width":71.04,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-13.png","element":"img","alt":"K4.","inline":true}],[{"text":"5.2 ","element":"span"},{"text":"Randomized Sketches","element":"span"}],[{"text":"Randomized sketching provides a method for approximating convex programs ","element":"span"},{"href":"#id-6","referenceIndex":24,"text":"[25, ","element":"a"},{"href":"#id-53","referenceIndex":34,"text":"35]","element":"a"},{"text":". In essence, a randomized sketch reduces the dimension of the original optimization problem through random projections, which can be beneficial in both computational time and memory storage. Following the problem formulation and ideas in ","element":"span"},{"href":"#id-6","referenceIndex":24,"text":"[25]","element":"a"},{"text":", consider a convex program of the form","element":"span"}],[{"id":"id-54","style":{"width":"61%"},"width":1150,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/22-14.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.14},"width":626.64,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-0.png","element":"img","alt":" B ∈ Rn×d, y ∈ Rd and C ⊂ Rd ","inline":true,"padRight":true},{"text":"is some convex set. 
Let ","element":"span"},{"style":{"height":13.94},"width":208.2,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-1.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"be an isotropic and sub-Gaussian matrix and solve instead the convex program","element":"span"}],[{"id":"id-55","style":{"width":"63%"},"width":1181,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-2.png","element":"img"}],[{"text":"This is called the “sketched problem”. It reduces the dimension from ","element":"span"},{"text":"n ","element":"span"},{"text":"to ","element":"span"},{"text":"m ","element":"span"},{"text":"and can be viewed as an approximation to the original problem ","element":"span"},{"href":"#id-54","text":"(20)","element":"a"},{"text":". Moreover, we say that a solution ˆ","element":"span"},{"text":"x ","element":"span"},{"text":"to the sketched problem ","element":"span"},{"href":"#id-55","text":"(21) ","element":"a"},{"text":"is ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-3.png","element":"img","alt":" δ","inline":true},{"text":"-optimal to the original optimal solution ","element":"span"},{"href":"#id-54","style":{"height":17.6},"width":232,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-4.png","element":"img","alt":" x∗ of (20) if","inline":true}],[{"style":{"width":"21%"},"width":403,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-5.png","element":"img"}],[{"text":"Pilanci and Wainwright ","element":"span"},{"href":"#id-6","referenceIndex":24,"text":"[25] ","element":"a"},{"text":"gave a high-probability guarantee for ˆ","element":"span"},{"style":{"height":16.4},"width":181.28,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-6.png","element":"img","alt":"x being 
δ","inline":true},{"text":"-optimal when ","element":"span"},{"text":"m ","element":"span"},{"text":"is sufficiently large. The following Theorem ","element":"span"},{"href":"#id-56","text":"5.2 ","element":"a"},{"text":"improves the dependence on ","element":"span"},{"text":"K ","element":"span"},{"text":"in their guarantee from ","element":"span"},{"style":{"height":18.74},"width":298.72,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-7.png","element":"img","alt":" K4 to K2 log K","inline":true},{"text":". The proof of Theorem ","element":"span"},{"href":"#id-56","text":"5.2 ","element":"a"},{"text":"is also more concise thanks to the tools we have developed.","element":"span"}],[{"id":"id-56","style":{"height":17.6},"width":339.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-8.png","element":"img","alt":"Theorem 5.2 (δ","inline":true},{"text":"-optimal guarantee)","element":"span"},{"text":". ","element":"span"},{"text":"Let ","element":"span"},{"text":"A ","element":"span"},{"text":"be an isotropic and sub-Gaussian matrix with sub-Gaussian parameter ","element":"span"},{"style":{"height":17.6},"width":474.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-9.png","element":"img","alt":" K. 
For any δ ∈ (0, 1), if","inline":true}],[{"style":{"width":"32%"},"width":616,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-10.png","element":"img"}],[{"text":"then a solution ","element":"span"},{"text":"ˆ","element":"span"},{"text":"x ","element":"span"},{"text":"to the sketched problem as given in ","element":"span"},{"href":"#id-55","text":"(21) ","element":"a"},{"text":"is ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-11.png","element":"img","alt":" δ","inline":true},{"text":"-optimal with probability at least ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":20.35},"width":704.84,"height":50.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-12.png","element":"img","alt":" − c1e−c2mδ2/(K2 log K). Here c0, c1, c2","inline":true,"padRight":true},{"text":"are absolute constants and ","element":"span"},{"text":"T ","element":"span"},{"text":"is the tangent cone of ","element":"span"},{"text":"C ","element":"span"},{"text":"at the optimum ","element":"span"},{"style":{"height":16.4},"width":229.88,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-13.png","element":"img","alt":" x∗, given by","inline":true}],[{"style":{"width":"70%"},"width":1325,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-14.png","element":"img"}],[{"text":"We will use an argument similar to ","element":"span"},{"href":"#id-6","referenceIndex":24,"text":"[25] ","element":"a"},{"text":"to prove Theorem ","element":"span"},{"href":"#id-56","text":"5.2. 
","element":"a"},{"text":"First, let us state a deterministic ","element":"span"},{"id":"id-58","text":"result that says ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-15.png","element":"img","alt":" δ","inline":true},{"text":"-optimality can be obtained by controlling two quantities.","element":"span"}],[{"style":{"width":"90%"},"width":1685,"height":534,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-16.png","element":"img"}],[{"id":"id-59","text":"Next we show a technical lemma that will be helpful when estimating ","element":"span"},{"style":{"height":15.09},"width":208.8,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-17.png","element":"img","alt":" Z1 and Z2.","inline":true}],[{"text":"Lemma 5.4. ","element":"span"},{"text":"Let ","element":"span"},{"text":"A ","element":"span"},{"text":"be an isotropic and sub-Gaussian matrix with sub-Gaussian parameter ","element":"span"},{"text":"K","element":"span"},{"text":", and let ","element":"span"},{"style":{"height":12.8},"width":142.92,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-18.png","element":"img","alt":" T ⊂ Rn ","inline":true,"padRight":true},{"text":"be a set with radius rad","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":129.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-19.png","element":"img","alt":"T) ≤ 2","inline":true},{"text":". Then there exist absolute constants ","element":"span"},{"text":"C ","element":"span"},{"text":"and ","element":"span"},{"text":"c ","element":"span"},{"text":"such that for any ","element":"span"},{"style":{"height":17.6},"width":183.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-20.png","element":"img","alt":" δ ∈ (0, 
1),","inline":true}],[{"style":{"width":"87%"},"width":1643,"height":167,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/23-21.png","element":"img"}],[{"style":{"height":18.05},"width":566.56,"height":45.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-0.png","element":"img","alt":"Proof. Denote L := K√log K","inline":true},{"text":". By Corollary ","element":"span"},{"href":"#id-57","text":"1.2 ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"width":"92%"},"width":1730,"height":438,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-1.png","element":"img"}],[{"text":"holds with probability at least 1 ","element":"span"},{"style":{"height":17.74},"width":317.8,"height":44.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-2.png","element":"img","alt":" − 3e−mδ20/(9C20L2)","inline":true},{"text":". On this event,","element":"span"}],[{"style":{"width":"45%"},"width":858,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-3.png","element":"img"}],[{"text":"where we use the estimate","element":"span"},{"style":{"height":32.53},"width":820.8,"height":81.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-4.png","element":"img","alt":"|1√m∥Ax∥2 + ∥x∥2| ≤ 2∥x∥2 + δ0 for x ∈ T.","inline":true}],[{"text":"Proof of Theorem ","element":"span"},{"href":"#id-56","text":"5.2. ","element":"a"},{"text":"We wish to control the ratio ","element":"span"},{"style":{"height":17.6},"width":114.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-5.png","element":"img","alt":" Z2/Z1","inline":true,"padRight":true},{"text":"in light of Lemma ","element":"span"},{"href":"#id-58","text":"5.3. 
","element":"a"},{"text":"By Lemma ","element":"span"},{"href":"#id-59","text":"5.4, ","element":"a"},{"text":"if ","element":"span"},{"style":{"height":19.14},"width":590.88,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-6.png","element":"img","alt":" m ≥ CK2 log Kw2(T)/δ2, then","inline":true}],[{"style":{"width":"41%"},"width":777,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-7.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":21.26},"width":853.76,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-8.png","element":"img","alt":" T := BT ∩ Sn−1 and Q := 1mAT A − I. Since","inline":true}],[{"style":{"width":"50%"},"width":939,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-9.png","element":"img"}],[{"text":"triangle inequality gives","element":"span"}],[{"style":{"width":"83%"},"width":1559,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-10.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":468.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-11.png","element":"img","alt":" u + T := {u + v : v ∈ T}","inline":true},{"text":". 
Applying Lemma ","element":"span"},{"href":"#id-59","text":"5.4 ","element":"a"},{"text":"to ","element":"span"},{"style":{"height":23.81},"width":428.36,"height":59.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-12.png","element":"img","alt":" Z(i)2 (i = 1, 2, 3) we get","inline":true}],[{"style":{"width":"93%"},"width":1746,"height":703,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-13.png","element":"img"}],[{"text":"Combining the bounds for ","element":"span"},{"style":{"height":15.09},"width":361.76,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-14.png","element":"img","alt":" Z1 and Z2 we have","inline":true}],[{"style":{"width":"20%"},"width":387,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-15.png","element":"img"}],[{"text":"with probability at least 1 ","element":"span"},{"style":{"height":17.55},"width":388.36,"height":43.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/24-16.png","element":"img","alt":" − 12e−cmδ2/(K2 log K)","inline":true},{"text":". 
This completes the proof.","element":"span"}],[{"text":"5.3 ","element":"span"},{"text":"Favorable Landscape for Blind Demodulation with Generative Priors","element":"span"}],[{"text":"In this section, we give a concrete example where the improved dependence on the sub-Gaussian parameter ","element":"span"},{"text":"K ","element":"span"},{"text":"can be important, namely blind demodulation with generative priors.","element":"span"}],[{"text":"Blind demodulation aims to recover two signals ","element":"span"},{"style":{"height":19.14},"width":198.64,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-0.png","element":"img","alt":" x0, y0 ∈ Rl ","inline":true,"padRight":true},{"text":"from the observation ","element":"span"},{"style":{"height":16.4},"width":352.64,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-1.png","element":"img","alt":" z0 = x0 ◦y0, where","inline":true,"padRight":true},{"text":"◦ ","element":"span"},{"text":"denotes componentwise multiplication. Due to the inherent ambiguity in recovering the solutions from ","element":"span"},{"style":{"height":10.69},"width":37.16,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-2.png","element":"img","alt":" z0","inline":true},{"text":", one usually assumes that the signals have some structure. A traditional way to model this structure is through a sparsity prior with respect to a basis, such as a wavelet basis or the Discrete Cosine Transform basis when the signals are images.","element":"span"}],[{"text":"On the other hand, with recent developments in deep learning, the generative adversarial network (GAN) has proved very effective at generating realistic synthetic images, which suggests that we may model certain types of image signals as outputs of a GAN. 
In particular, in inverse problems such as compressed sensing, phase retrieval, and blind demodulation, practitioners have observed an order-of-magnitude improvement in sample (observation) complexity over the sparsity prior ","element":"span"},{"href":"#id-60","referenceIndex":4,"text":"[5,","element":"a"},{"href":"#id-61","referenceIndex":11,"text":"12,","element":"a"},{"href":"#id-62","referenceIndex":21,"text":"21]","element":"a"},{"text":".","element":"span"}],[{"text":"This alternative model is called the generative prior, and it is becoming a promising new model for modern signal processing ","element":"span"},{"href":"#id-60","referenceIndex":4,"text":"[5, ","element":"a"},{"href":"#id-63","referenceIndex":9,"text":"10–","element":"a"},{"href":"#id-61","referenceIndex":11,"text":"12]","element":"a"},{"text":". ","element":"span"},{"text":"In Hand and Joshi ","element":"span"},{"href":"#id-63","referenceIndex":9,"text":"[10]","element":"a"},{"text":", the authors provide a global landscape guarantee for the blind demodulation problem with generative priors, and they apply our Bernstein’s inequality in their proof.","element":"span"}],[{"text":"With generative priors, the unknown signals ","element":"span"},{"style":{"height":12},"width":101.96,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-3.png","element":"img","alt":" x0, y0","inline":true,"padRight":true},{"text":"are assumed to be in the range of two generative neural networks ","element":"span"},{"style":{"height":18.34},"width":250.6,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-4.png","element":"img","alt":" G(1) and G(2) ","inline":true,"padRight":true},{"text":"respectively. 
More precisely, ","element":"span"},{"style":{"height":18.34},"width":423.8,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-5.png","element":"img","alt":" G(1) : Rn → Rl is a d","inline":true},{"text":"-layer network, ","element":"span"},{"style":{"height":18.34},"width":388.2,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-6.png","element":"img","alt":"G(2) : Rp → Rl is a s","inline":true},{"text":"-layer network, and they can be written as","element":"span"}],[{"style":{"width":"55%"},"width":1042,"height":174,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-7.png","element":"img"}],[{"text":"where relu is the Rectified Linear Unit activation function given by relu(","element":"span"},{"text":"x","element":"span"},{"text":") = max","element":"span"},{"text":"{","element":"span"},{"text":"x, ","element":"span"},{"text":"0","element":"span"},{"text":"} ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":26.4},"width":895.6,"height":66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-8.png","element":"img","alt":"W (1)i , W (2)j for i ∈ {1, . . . , d} and j ∈ {1, . . . , s}","inline":true,"padRight":true},{"text":"are weight matrices.","element":"span"}],[{"text":"The weight matrices are normally obtained by training the networks, but the empirical evidence in ","element":"span"},{"href":"#id-19","referenceIndex":3,"text":"[4] ","element":"a"},{"text":"suggests that they behave as “random-like” quantities. 
","element":"span"},{"text":"Based on this phenomenon, the authors of ","element":"span"},{"href":"#id-63","referenceIndex":9,"text":"[10] ","element":"a"},{"text":"made the following additional assumptions on the networks ","element":"span"},{"style":{"height":18.34},"width":71.08,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-9.png","element":"img","alt":" G(1)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":18.34},"width":71.08,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-10.png","element":"img","alt":" G(2) ","inline":true,"padRight":true},{"text":"to facilitate analysis further:","element":"span"}],[{"style":{"width":"71%"},"width":1335,"height":200,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-11.png","element":"img"}],[{"text":"The signals can then be recovered by finding their latent codes ","element":"span"},{"style":{"height":15.09},"width":629.96,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-12.png","element":"img","alt":" h0 ∈ Rn and m0 ∈ Rp such that","inline":true},{"style":{"height":20.34},"width":596.36,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-13.png","element":"img","alt":"x0 = G(1)(h0) and y0 = G(2)(m0","inline":true},{"text":"). 
This leads to the following empirical risk minimization program:","element":"span"}],[{"style":{"width":"66%"},"width":1242,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-14.png","element":"img"}],[{"text":"Note that there is a scaling ambiguity in this problem since it does not distinguish points on curve ","element":"span"},{"style":{"height":21.27},"width":357.04,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/25-15.png","element":"img","alt":" {(ch, 1cm) : c > 0}","inline":true,"padRight":true},{"text":"for any given (","element":"span"},{"text":"h, m","element":"span"},{"text":"), thus one can only hope to find the solution curve","element":"span"}],[{"style":{"height":21.26},"width":391.12,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-0.png","element":"img","alt":"{(ch0, 1cm0) : c > 0}","inline":true},{"text":". The authors in ","element":"span"},{"href":"#id-63","referenceIndex":9,"text":"[10] ","element":"a"},{"text":"showed that under assumptions A1-A3, two conditions ","element":"span"},{"text":"that are called the Weight Distributed Condition (WDC) and the joint-WDC are met. ","element":"span"},{"text":"These conditions guarantee a favorable landscape for the objective function ","element":"span"},{"text":"f","element":"span"},{"text":"(","element":"span"},{"text":"h, m","element":"span"},{"text":"), namely ","element":"span"},{"text":"f ","element":"span"},{"text":"has a descent direction at all points outside of a small neighborhood of four curves containing the solution. One of the important ingredients in their proof is concentration bounds for singular values of random matrices. 
When showing, via a concentration argument, that the joint-WDC is satisfied, they were able to improve the requirement in assumption A3 from, up to log factors, ","element":"span"},{"style":{"height":18.74},"width":497,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-1.png","element":"img","alt":"l ≳ n3 + p3 to l ≳ n2 + p2","inline":true},{"text":". Such an improvement is made possible by our new Bernstein’s inequality with refined sub-exponential parameter dependence. This ","element":"span"},{"style":{"height":18.34},"width":137.96,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-2.png","element":"img","alt":" n2 + p2 ","inline":true,"padRight":true},{"text":"sample complexity matches the one in the previous recovery guarantees with a sparsity prior (in which case ","element":"span"},{"text":"n ","element":"span"},{"text":"and ","element":"span"},{"text":"p ","element":"span"},{"text":"denote the sparsity levels), but is potentially better since the latent code dimension is often smaller than the sparsity level with respect to a particular basis. See Theorem 2, Theorem 5, Lemma 8, and Lemma 9 in ","element":"span"},{"href":"#id-63","referenceIndex":9,"text":"[10] ","element":"a"},{"text":"for more details.","element":"span"}]]},{"heading":"6 Conclusion","paragraphs":[[{"text":"In this article, we proved the optimal concentration bound for sub-Gaussian random matrices on sets. 
Namely, with high probability,","element":"span"}],[{"style":{"width":"58%"},"width":1086,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.94},"width":191.76,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-4.png","element":"img","alt":" B ∈ Rl×m ","inline":true,"padRight":true},{"text":"is an arbitrary matrix, ","element":"span"},{"style":{"height":13.94},"width":199.08,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-5.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"is an (mean zero) isotropic and sub-Gaussian random matrix, ","element":"span"},{"style":{"height":12.8},"width":137.64,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-6.png","element":"img","alt":" T ∈ Rn ","inline":true,"padRight":true},{"text":"is the set, ","element":"span"},{"text":"K ","element":"span"},{"text":"is the sub-Gaussian parameter of ","element":"span"},{"text":"A","element":"span"},{"text":", sr(","element":"span"},{"text":"B","element":"span"},{"text":") is the stable rank of ","element":"span"},{"text":"B","element":"span"},{"text":", ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":") is the gaussian width of ","element":"span"},{"style":{"height":19.82},"width":551.24,"height":49.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-7.png","element":"img","alt":" T and rad(T) := supy∈T ∥y∥2","inline":true},{"text":". 
Compared to the previous work in ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"[20]","element":"a"},{"text":", this result generalizes it by allowing an arbitrary matrix ","element":"span"},{"text":"B ","element":"span"},{"text":"while improving the dependence on the sub-Gaussian parameter from ","element":"span"},{"style":{"height":14.74},"width":57.32,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-8.png","element":"img","alt":" K2 ","inline":true,"padRight":true},{"text":"to the optimal ","element":"span"},{"style":{"height":18.05},"width":180.16,"height":45.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-9.png","element":"img","alt":" K√log K","inline":true},{"text":". Consequently, this can lead to a tighter concentration bound even in cases where the sub-Gaussian matrix ","element":"span"},{"text":"BA ","element":"span"},{"text":"has correlated rows. It is also worth noting that the dependence on ","element":"span"},{"text":"w","element":"span"},{"text":"(","element":"span"},{"text":"T","element":"span"},{"text":") + rad(","element":"span"},{"text":"T","element":"span"},{"text":") is optimal in general as well.","element":"span"}],[{"text":"We also proved, under extra moment conditions, a new Bernstein type inequality and a new Hanson-Wright inequality. ","element":"span"},{"text":"The extra conditions here are a bounded first absolute moment (e.g. ","element":"span"},{"style":{"height":17.6},"width":144.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-10.png","element":"img","alt":"E|Yi| ≤","inline":true,"padRight":true},{"text":"2) for Bernstein’s inequality and a bounded second moment (e.g. 
","element":"span"},{"style":{"height":19.73},"width":85.64,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/26-11.png","element":"img","alt":" EX2i ","inline":true,"padRight":true},{"text":"= 1) for the Hanson-","element":"span"},{"text":"Wright inequality. ","element":"span"},{"text":"In many cases, these conditions can be easily met; for example, they are implied by the isotropic condition of random variables or vectors. In general, both of our new inequalities give improved tail bounds in the sub-Gaussian regime, which is the regime of interest in many applications as demonstrated in Section ","element":"span"},{"text":"5.","element":"span"}]]},{"heading":"7 Acknowledgements","paragraphs":[[{"text":"Y. Plan is partially supported by an NSERC Discovery Grant (22R23068), a PIMS CRG 33: High-Dimensional Data Analysis, and a Tier II Canada Research Chair in Data Science. ","element":"span"},{"text":"Ö. Yılmaz is partially supported by an NSERC Discovery Grant (22R82411) and PIMS CRG 33: High-Dimensional Data Analysis. H. Jeong is funded in part by the University of British Columbia Data Science Institute (UBC DSI) and by the Pacific Institute of Mathematical Sciences (PIMS). The authors of this paper would also like to thank Babhru Joshi for the helpful discussions and for providing us with an important application in Section ","element":"span"},{"text":"5.","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-29","text":"[1] D. Achlioptas. ","element":"span"},{"text":"Database-friendly random projections: ","element":"span"},{"text":"Johnson-Lindenstrauss with binary coins. ","element":"span"},{"text":"Journal of Computer and System Sciences","element":"span"},{"text":", 66(4):671–687, 2003.","element":"span"}],[{"text":"[2] R. Adamczak. ","element":"span"},{"text":"Logarithmic Sobolev inequalities and concentration of measure for convex functions and polynomial chaoses. 
","element":"span"},{"text":"Bulletin of the Polish Academy of Sciences Mathematics","element":"span"},{"text":", 53(2):221–238, 2005.","element":"span"}],[{"id":"id-19","text":"[3] R. Adamczak. A note on the Hanson-Wright inequality for random vectors with dependencies. ","element":"span"},{"text":"Electronic Communications in Probability","element":"span"},{"text":", 20, 2015.","element":"span"}],[{"id":"id-60","text":"[4] S. Arora, Y. Liang, and T. Ma. Why are deep nets reversible: A simple theory, with ","element":"span"},{"text":"implications for training. ","element":"span"},{"text":"arXiv preprint arXiv:1511.05653","element":"span"},{"text":", 2015.","element":"span"}],[{"text":"[5] A. Bora, A. Jalal, E. Price, and A. G. Dimakis. Compressed sensing using generative models. In ","element":"span"},{"text":"Proceedings of the 34th International Conference on Machine Learning-Volume 70","element":"span"},{"text":", pages 537–546. JMLR.org, 2017.","element":"span"}],[{"id":"id-11","text":"[6] E. J. Candès. Mathematics of sparsity (and a few other things). In ","element":"span"},{"text":"Proceedings of the International Congress of Mathematicians, Seoul, South Korea","element":"span"},{"text":", volume 123, 2014.","element":"span"}],[{"id":"id-4","text":"[7] S. Dirksen. Tail bounds via generic chaining. ","element":"span"},{"text":"Electronic Journal of Probability","element":"span"},{"text":", 20, 2015.","element":"span"}],[{"id":"id-0","text":"[8] S. Dirksen. Dimensionality reduction with subgaussian matrices: a unified theory. ","element":"span"},{"text":"Foundations of Computational Mathematics","element":"span"},{"text":", 16(5):1367–1396, 2016.","element":"span"}],[{"id":"id-63","text":"[9] Y. Gordon. On Milman’s inequality and random subspaces which escape through a mesh in ","element":"span"},{"style":{"height":16.8},"width":936.76,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/27-0.png","element":"img","alt":"Rn. 
In Geometric Aspects of Functional Analysis","inline":true},{"text":", pages 84–106. Springer, 1988.","element":"span"}],[{"text":"[10] P. Hand and B. Joshi. Global guarantees for blind demodulation with generative priors. In ","element":"span"},{"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", pages 11531–11541, 2019.","element":"span"}],[{"id":"id-61","text":"[11] P. Hand, O. Leong, and V. Voroninski. Phase retrieval under a generative prior. In ","element":"span"},{"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", pages 9136–9146, 2018.","element":"span"}],[{"id":"id-30","text":"[12] P. Hand and V. Voroninski. Global guarantees for enforcing deep generative priors by empirical ","element":"span"},{"text":"risk. In ","element":"span"},{"text":"Proceedings of Machine Learning Research","element":"span"},{"text":", volume 75, pages 1–8, 2018.","element":"span"}],[{"text":"[13] P. Hitczenko, S. Kwapień, W. Li, G. Schechtman, T. Schlumprecht, and J. Zinn. Hypercontractivity and comparison of moments of iterated maxima and minima of independent random variables. ","element":"span"},{"text":"Electronic Journal of Probability","element":"span"},{"text":", 3, 1998.","element":"span"}],[{"id":"id-5","text":"[14] D. M. Kane and J. Nelson. Sparser Johnson-Lindenstrauss transforms. ","element":"span"},{"text":"Journal of the ACM (JACM)","element":"span"},{"text":", 61(1):4, 2014.","element":"span"}],[{"id":"id-13","text":"[15] B. Klartag and S. Mendelson. Empirical processes and random projections. ","element":"span"},{"text":"Journal of Functional Analysis","element":"span"},{"text":", 225(1):229, 2005.","element":"span"}],[{"id":"id-3","text":"[16] Y. Klochkov and N. Zhivotovskiy. Uniform Hanson-Wright type concentration inequalities for ","element":"span"},{"text":"unbounded entries via the entropy method. 
","element":"span"},{"text":"arXiv preprint arXiv:1812.03548","element":"span"},{"text":", 2018.","element":"span"}],[{"id":"id-8","text":"[17] F. Krahmer, D. Needell, and R. Ward. Compressive sensing with redundant dictionaries and ","element":"span"},{"text":"structured measurements. ","element":"span"},{"text":"SIAM Journal on Mathematical Analysis","element":"span"},{"text":", 47(6):4606–4629, 2015.","element":"span"}],[{"text":"[18] F. Krahmer and H. Rauhut. Structured random measurements in signal processing. ","element":"span"},{"text":"GAMMMitteilungen","element":"span"},{"text":", 37(2):217–238, 2014.","element":"span"}],[{"text":"[19] K. G. Larsen and J. Nelson. The johnson-lindenstrauss lemma is optimal for linear dimensionality reduction. In ","element":"span"},{"text":"43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy","element":"span"},{"text":", pages 82:1–82:11, 2016.","element":"span"}],[{"id":"id-10","text":"[20] C. Liaw, A. Mehrabian, Y. Plan, and R. Vershynin. A simple tool for bounding the deviation ","element":"span"},{"text":"of random matrices on geometric sets. In ","element":"span"},{"text":"Geometric Aspects of Functional Analysis","element":"span"},{"text":", pages 277–299. Springer, 2017.","element":"span"}],[{"id":"id-62","text":"[21] A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos. ","element":"span"},{"text":"Using deep neural networks for inverse problems in imaging: beyond analytical methods. ","element":"span"},{"text":"IEEE Signal Processing Magazine","element":"span"},{"text":", 35(1):20–36, 2018.","element":"span"}],[{"id":"id-52","text":"[22] J. Matouˇsek. On variants of the johnson–lindenstrauss lemma. ","element":"span"},{"text":"Random Structures & Algorithms","element":"span"},{"text":", 33(2):142–156, 2008.","element":"span"}],[{"text":"[23] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. 
Reconstruction and subgaussian operators in asymptotic geometric analysis. ","element":"span"},{"text":"Geometric and Functional Analysis","element":"span"},{"text":", 17(4):1248–1282, 2007.","element":"span"}],[{"id":"id-6","text":"[24] S. Oymak and J. A. Tropp. ","element":"span"},{"text":"Universality laws for randomized dimension reduction, with applications. ","element":"span"},{"text":"Information and Inference: A Journal of the IMA","element":"span"},{"text":", 7(3):337–446, 2017.","element":"span"}],[{"id":"id-48","text":"[25] M. Pilanci and M. J. Wainwright. Randomized sketches of convex programs with sharp guarantees. ","element":"span"},{"text":"IEEE Transactions on Information Theory","element":"span"},{"text":", 61(9):5096–5115, 2015.","element":"span"}],[{"id":"id-25","text":"[26] R. B. Ash. ","element":"span"},{"text":"Information Theory","element":"span"},{"text":". Dover Publications Inc., 1990.","element":"span"}],[{"text":"[27] M. Rudelson and R. Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. ","element":"span"},{"text":"Electronic Communications in Probability","element":"span"},{"text":", 18, 2013.","element":"span"}],[{"id":"id-9","text":"[28] R. Saab, R. Wang, and Ö. Yılmaz. ","element":"span"},{"text":"From compressed sensing to compressed bit-streams: practical encoders, tractable decoders. ","element":"span"},{"text":"IEEE Transactions on Information Theory","element":"span"},{"text":", 64(9):6098–6114, 2018.","element":"span"}],[{"id":"id-28","text":"[29] P.-M. Samson. Concentration of measure inequalities for Markov chains and Φ-mixing processes. ","element":"span"},{"text":"The Annals of Probability","element":"span"},{"text":", 28(1):416–461, 2000.","element":"span"}],[{"id":"id-1","text":"[30] G. Schechtman. 
Two observations regarding embedding subsets of Euclidean spaces in normed ","element":"span"},{"text":"spaces. ","element":"span"},{"text":"Advances in Mathematics","element":"span"},{"text":", 200(1):125–135, 2006.","element":"span"}],[{"id":"id-2","text":"[31] M. Talagrand. ","element":"span"},{"text":"Upper and lower bounds for stochastic processes: modern methods and classical problems","element":"span"},{"text":", volume 60. Springer Science & Business Media, 2014.","element":"span"}],[{"text":"[32] R. Vershynin. ","element":"span"},{"text":"High-Dimensional Probability: An Introduction with Applications in Data Science","element":"span"},{"text":". Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.","element":"span"}],[{"id":"id-7","text":"[33] V. Vu and K. Wang. Random weighted projections, random quadratic forms and random ","element":"span"},{"text":"eigenvectors. ","element":"span"},{"text":"Random Structures & Algorithms","element":"span"},{"text":", 47(4):792–821, 2015.","element":"span"}],[{"id":"id-53","text":"[34] D. P. Woodruff. Sketching as a tool for numerical linear algebra. ","element":"span"},{"style":{"height":16.8},"width":507.2,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/29-0.png","element":"img","alt":" Foundations and Trends®","inline":true,"padRight":true},{"text":"in Theoretical Computer Science","element":"span"},{"text":", 10(1–2):1–157, 2014.","element":"span"}],[{"text":"[35] Y. Yang, M. Pilanci, M. J. Wainwright, et al. Randomized sketches for kernels: Fast and optimal nonparametric regression. 
","element":"span"},{"text":"The Annals of Statistics","element":"span"},{"text":", 45(3):991–1023, 2017.","element":"span"}]]},{"heading":"A Properties of ψα-Norm","paragraphs":[[{"style":{"width":"90%"},"width":1700,"height":346,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-0.png","element":"img"}],[{"text":"(d) ","element":"span"},{"style":{"height":23.85},"width":1243.12,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-1.png","element":"img","alt":" ∥XY ∥ψα ≤ ∥X∥ψpα∥Y ∥ψqα for p, q ∈ (1, ∞) such that 1p + 1q = 1","inline":true},{"text":". In particular, ","element":"span"},{"style":{"height":18.48},"width":207.76,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-2.png","element":"img","alt":" ∥XY ∥ψ1 ≤","inline":true},{"style":{"height":18.48},"width":257.36,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-3.png","element":"img","alt":"∥X∥ψ2∥Y ∥ψ2;","inline":true}],[{"style":{"width":"78%"},"width":1461,"height":270,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-4.png","element":"img"}],[{"text":"In particular, properties (a) and (b) implies that a random variable is sub-Gaussian (or sub-exponential) if and only if its tail probability is bounded by a Gaussian (or exponential) random variable. Properties (c) and (d) tell us if ","element":"span"},{"text":"X ","element":"span"},{"text":"and ","element":"span"},{"text":"Y ","element":"span"},{"text":"are both sub-Gaussian, then ","element":"span"},{"style":{"height":17.93},"width":316.24,"height":44.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-5.png","element":"img","alt":" X2, Y 2 and XY","inline":true,"padRight":true},{"text":"are all sub-exponential. 
Property (e) tells us for ","element":"span"},{"style":{"height":16},"width":237.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-6.png","element":"img","alt":" p ≥ 1, all p","inline":true},{"text":"-th moments of ","element":"span"},{"text":"X ","element":"span"},{"text":"exist whenever ","element":"span"},{"style":{"height":18.48},"width":123.64,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-7.png","element":"img","alt":"∥X∥ψα","inline":true,"padRight":true},{"text":"is finite. Property (f) tells us we can always center random variables without changing their ","element":"span"},{"style":{"height":16.4},"width":50.32,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-8.png","element":"img","alt":"ψα","inline":true},{"text":"-norm up to a constant factor. Property (g) tells us all sub-Gaussian random variables are also sub-exponential random variables.","element":"span"}],[{"style":{"width":"90%"},"width":1691,"height":488,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-9.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":20.34},"width":1031.24,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-10.png","element":"img","alt":" P(|X|α ≥ u) = P(|X| ≥ u1/α) ≤ 2 exp(−u/K2), we get","inline":true}],[{"style":{"width":"56%"},"width":1049,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-11.png","element":"img"}],[{"text":"(c) This follows from definition.","element":"span"}],[{"style":{"width":"53%"},"width":1003,"height":270,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/30-12.png","element":"img"}],[{"style":{"width":"83%"},"width":1555,"height":251,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-0.png","element":"img"}],[{"text":"Applying Young’s inequality again we 
have","element":"span"}],[{"style":{"width":"81%"},"width":1521,"height":267,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-1.png","element":"img"}],[{"text":"This shows ","element":"span"},{"style":{"height":18.48},"width":255.36,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-2.png","element":"img","alt":" ∥XY ∥ψα ≤ 1.","inline":true}],[{"text":"(e) Without loss of generality, we can assume ","element":"span"},{"style":{"height":18.48},"width":123.64,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-3.png","element":"img","alt":" ∥X∥ψα","inline":true,"padRight":true},{"text":"= 1. Then by property (a),","element":"span"}],[{"style":{"width":"35%"},"width":657,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-4.png","element":"img"}],[{"text":"With a change of variable ","element":"span"},{"style":{"height":12.8},"width":287.84,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-5.png","element":"img","alt":" u = tα we have","inline":true}],[{"style":{"width":"34%"},"width":649,"height":534,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-6.png","element":"img"}],[{"text":"where Γ(","element":"span"},{"style":{"height":5.6},"width":12,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-7.png","element":"img","alt":"·","inline":true},{"text":") denotes the Gamma function. 
Note that for ","element":"span"},{"text":"s > ","element":"span"},{"text":"0,","element":"span"}],[{"style":{"width":"67%"},"width":1271,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-8.png","element":"img"}],[{"text":"where we used the fact that ","element":"span"},{"style":{"height":16.9},"width":111.08,"height":42.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-9.png","element":"img","alt":" xse− x2","inline":true,"padRight":true},{"text":"attains maximum at ","element":"span"},{"text":"x ","element":"span"},{"text":"= 2","element":"span"},{"text":"s ","element":"span"},{"text":"because","element":"span"}],[{"style":{"width":"33%"},"width":633,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-10.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"style":{"width":"76%"},"width":1433,"height":264,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-11.png","element":"img"}],[{"text":"Using property (d) and the fact that ","element":"span"},{"style":{"height":36.18},"width":519.68,"height":90.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-12.png","element":"img","alt":" ∥1∥ψα =� 1log 2�1/αwe have","inline":true}],[{"style":{"width":"42%"},"width":790,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/31-13.png","element":"img"}],[{"text":"This completes the proof with ","element":"span"},{"style":{"height":23.66},"width":390.24,"height":59.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-0.png","element":"img","alt":" C = 1 + 4log 4 ≈ 6.77.","inline":true}],[{"text":"(g) Without loss of generality, we can assume ","element":"span"},{"style":{"height":20.38},"width":121.64,"height":50.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-1.png","element":"img","alt":" ∥X∥ψβ","inline":true,"padRight":true},{"text":"= 1. 
Then by property (a),","element":"span"}],[{"style":{"width":"89%"},"width":1670,"height":419,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-2.png","element":"img"}],[{"text":"Therefore by property (b) we have","element":"span"}],[{"style":{"width":"69%"},"width":1301,"height":182,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-3.png","element":"img"}]]},{"heading":"B Dependence on Sub-Gaussian Parameter for JL Lemma","paragraphs":[[{"text":"Here we give an example to demonstrate that the ","element":"span"},{"style":{"height":18.74},"width":169.6,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-4.png","element":"img","alt":" K2 log K","inline":true,"padRight":true},{"text":"dependence for sample complexity in the JL Lemma (Lemma ","element":"span"},{"href":"#id-51","text":"5.1) ","element":"a"},{"text":"is optimal for small ","element":"span"},{"style":{"height":12.8},"width":144.32,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-5.png","element":"img","alt":" ε and δ","inline":true},{"text":". This example is virtually the same as the one in Proposition ","element":"span"},{"href":"#id-15","text":"4.5; ","element":"a"},{"text":"however, this result is not implied by Proposition ","element":"span"},{"href":"#id-15","text":"4.5 ","element":"a"},{"text":"as the latter does not guarantee that such dependence on ","element":"span"},{"text":"K ","element":"span"},{"text":"is optimal when ","element":"span"},{"style":{"height":12.8},"width":190.08,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-6.png","element":"img","alt":" ε is small.","inline":true}],[{"text":"Proposition B.1. 
","element":"span"},{"text":"Suppose random matrix ","element":"span"},{"style":{"height":13.94},"width":218.76,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-7.png","element":"img","alt":" A ∈ Rm×n ","inline":true,"padRight":true},{"text":"has symmetric i.i.d. ","element":"span"},{"text":"entries ","element":"span"},{"style":{"height":17.89},"width":169.44,"height":44.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-8.png","element":"img","alt":" Aij such","inline":true,"padRight":true},{"text":"that ","element":"span"},{"style":{"height":24.93},"width":888.88,"height":62.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-9.png","element":"img","alt":" A2ij ∼ L2m · Bernoulli�L−2�. Assume L ≥ 2","inline":true,"padRight":true},{"text":"and denote ","element":"span"},{"text":"K ","element":"span"},{"text":"the positive number such that ","element":"span"},{"style":{"height":18.73},"width":276.64,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-10.png","element":"img","alt":"L2 = K2 log K","inline":true},{"text":". Also denote ","element":"span"},{"style":{"height":19.53},"width":336.48,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-11.png","element":"img","alt":" e1 = (1, 0, . . . , 0)T ","inline":true,"padRight":true},{"text":". 
If for any fixed ","element":"span"},{"style":{"height":21.26},"width":216.2,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-12.png","element":"img","alt":" ε, δ ∈ (0, 15)","inline":true},{"text":", the probability bound","element":"span"}],[{"style":{"width":"61%"},"width":1157,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-13.png","element":"img"}],[{"text":"(a) ","element":"span"},{"style":{"height":17.6},"width":107.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-14.png","element":"img","alt":"√mA","inline":true,"padRight":true},{"text":"is an isotropic and sub-Gaussian matrix with sub-Gaussian parameter being no more than ","element":"span"},{"text":"K","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"67%"},"width":1265,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-15.png","element":"img"}],[{"text":"Proof. ","element":"span"},{"text":"Recall that ","element":"span"},{"style":{"height":18.48},"width":256.96,"height":46.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-16.png","element":"img","alt":" ∥Aij∥ψ2 ≤ K","inline":true},{"text":", so part (a) is straightforward to verify. Now we proceed with the proof of part (b). 
Notice that","element":"span"}],[{"style":{"width":"99%"},"width":1870,"height":489,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/32-17.png","element":"img"}],[{"text":"Since","element":"span"},{"style":{"height":24.42},"width":310.08,"height":61.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-0.png","element":"img","alt":"√8k ≤ 4√mL and","inline":true}],[{"style":{"width":"63%"},"width":1183,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-1.png","element":"img"}],[{"text":"we get","element":"span"}],[{"style":{"width":"25%"},"width":469,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-2.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"style":{"width":"99%"},"width":1869,"height":164,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-3.png","element":"img"}]]},{"heading":"C A Few Inequalities","paragraphs":[[{"text":"Here we list and prove the non-standard inequalities used in our proofs.","element":"span"}],[{"style":{"width":"76%"},"width":1439,"height":248,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-4.png","element":"img"}],[{"style":{"height":19.13},"width":751.52,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-5.png","element":"img","alt":"Proof. (a) From e−2x ≥ 1 − 2x we have","inline":true}],[{"style":{"width":"83%"},"width":1562,"height":649,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-6.png","element":"img"}],[{"text":"Taking exponential we get ","element":"span"},{"style":{"height":25.02},"width":430.08,"height":62.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.10631/images/33-7.png","element":"img","alt":" αe−x < 2 exp(− x log 2log α ).","inline":true}]]}],"_version":"3.3.2"},"paperNode":"$28:props:children:props:children:0:props:product"}]]