36:[["$","audio",null,{"id":"tts"}],["$","$L3b",null,{"paperID":"2001.07883","publisher":"arxiv","paperJSON":{"title":"Learning functions varying along a central subspace","paperID":"2001.07883","avgLineHeight":13.55,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"Many functions of interest are in a high-dimensional space but exhibit low-dimensional structures. This paper studies regression of an ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":"-Hölder function ","element":"span"},{"style":{"height":16.59},"width":139.84,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-0.png","element":"img","alt":" f in RD ","inline":true,"padRight":true},{"text":"which varies along a central subspace of dimension ","element":"span"},{"style":{"height":12},"width":263.18,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-1.png","element":"img","alt":" d while d ≪ D","inline":true},{"text":". A direct approximation of ","element":"span"},{"style":{"height":16.58},"width":239.94,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-2.png","element":"img","alt":" f in RD with","inline":true,"padRight":true},{"text":"an ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-3.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"accuracy requires a number of samples ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"on the order of ","element":"span"},{"style":{"height":14.18},"width":182.02,"height":35.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-4.png","element":"img","alt":" ε−(2s+D)/s","inline":true},{"text":". 
In this paper, we analyze the Generalized Contour Regression (GCR) algorithm for the estimation of the central subspace and use piecewise polynomials for function approximation. GCR is among the best estimators for the central subspace, but its sample complexity is an open question. We prove that GCR leads to a mean squared estimation error of ","element":"span"},{"style":{"height":17.39},"width":111.83,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-5.png","element":"img","alt":" O(n−1","inline":true},{"text":") for the central subspace, if a variance quantity is exactly known. The estimation error of this variance quantity is also given in this paper. The mean squared regression error of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is proved to be in the order of (","element":"span"},{"style":{"height":16},"width":173.02,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-6.png","element":"img","alt":"n/ log n)−","inline":true},{"style":{"height":5.6},"width":26.56,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-7.png","element":"img","alt":"2s","inline":true},{"style":{"height":7.2},"width":62.39,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-8.png","element":"img","alt":"2s+d","inline":true,"padRight":true},{"text":"where the exponent depends on the dimension of the central subspace ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"instead of the ambient space ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":". This result demonstrates that GCR is effective in learning the low-dimensional central subspace. We also propose a modified GCR with improved efficiency. 
The convergence rate is validated through several numerical experiments.","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"The vast majority of statistical inference and machine learning problems can be modeled as regression, where the goal is to estimate an unknown function from a finite number of training samples. ","element":"span"},{"text":"Nowadays, new challenges are introduced to regression and prediction due to the rise of high-dimensional data in many fields of contemporary science. The well-known curse of dimensionality implies that, in order to achieve a fixed accuracy in prediction, the number of training samples must grow exponentially with respect to the data dimension, which quickly becomes impractical.","element":"span"}],[{"text":"Fortunately, functions of interest in applications often exhibit low-dimensional structures. In many situations, the response may depend on only a few variables or on a low-dimensional subspace. For example, in bioinformatics, the cDNA microarray data have thousands of dimensions but an effective classification of tumor types only depends on a subspace of dimension one or two ","element":"span"},{"href":"#id-0","referenceIndex":4,"text":"[4]","element":"a"},{"text":". In engineering, the coefficients of certain elliptic partial differential equations are parameterized by many variables but have a small effective dimension ","element":"span"},{"href":"#id-1","referenceIndex":8,"text":"[8]","element":"a"},{"text":". In the photovoltaic industry, there are five input parameters in the single-diode solar cell model while the maximum power output only depends on a linear combination of these five parameters ","element":"span"},{"href":"#id-2","referenceIndex":9,"text":"[9]","element":"a"},{"text":". 
","element":"span"},{"text":"Similar low-dimensional models appear in optimization ","element":"span"},{"href":"#id-3","referenceIndex":22,"text":"[22]","element":"a"},{"text":", optimal control ","element":"span"},{"href":"#id-4","referenceIndex":55,"text":"[55]","element":"a"},{"text":", uncertainty quantification ","element":"span"},{"href":"#id-5","referenceIndex":2,"text":"[2]","element":"a"},{"text":", text classification ","element":"span"},{"href":"#id-6","referenceIndex":30,"text":"[30] ","element":"a"},{"text":"and biomedicine ","element":"span"},{"href":"#id-7","referenceIndex":43,"text":"[43, ","element":"a"},{"href":"#id-8","referenceIndex":44,"text":"44]","element":"a"},{"text":".","element":"span"}],[{"id":"id-40","text":"These applications motivate us to consider the regression model","element":"span"}],[{"style":{"width":"99%"},"width":1801,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/0-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.13},"width":809.84,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-0.png","element":"img","alt":" x ∈ RD, Φ ∈ RD×d, g : Rd → R and ξi ∈ R","inline":true},{"text":". The columns of Φ consist of an orthonormal basis of a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional subspace, i.e., Φ","element":"span"},{"style":{"height":14.84},"width":155.22,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-1.png","element":"img","alt":"⊤Φ = Id","inline":true},{"text":". 
The random variable ","element":"span"},{"style":{"height":16.4},"width":31.09,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-2.png","element":"img","alt":" ξi","inline":true,"padRight":true},{"text":"models noise, which is independent of the ","element":"span"},{"style":{"height":10.62},"width":38.48,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-3.png","element":"img","alt":" xi","inline":true},{"text":"’s. In this model, the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is defined on ","element":"span"},{"style":{"height":15.13},"width":59.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-4.png","element":"img","alt":" RD ","inline":true,"padRight":true},{"text":"but only depends on the projection of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"to the subspace spanned by Φ, denoted by ","element":"span"},{"style":{"height":15.1},"width":217.77,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-5.png","element":"img","alt":" SΦ. Let SΦ","inline":true,"padRight":true},{"text":"be such a space of minimum dimension: ","element":"span"},{"style":{"height":17.6},"width":116.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-6.png","element":"img","alt":" g(v⊤z","inline":true},{"text":") is not a constant function for any ","element":"span"},{"style":{"height":15.1},"width":514.33,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-7.png","element":"img","alt":" v ∈ SΦ. 
Then SΦ is called","inline":true,"padRight":true},{"text":"the central subspace ","element":"span"},{"href":"#id-9","referenceIndex":10,"text":"[10]","element":"a"},{"text":", or the Effective Dimension-Reduction subspace ","element":"span"},{"href":"#id-10","referenceIndex":37,"text":"[37, ","element":"a"},{"href":"#id-11","referenceIndex":14,"text":"14, ","element":"a"},{"href":"#id-12","referenceIndex":54,"text":"54]","element":"a"},{"text":". In other words, the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"only varies along the central subspace, and remains the same along any orthogonal direction of the central subspace. This model is also called the single-index model for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"= 1 and the multi-index model for ","element":"span"},{"style":{"height":14.8},"width":114.72,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-8.png","element":"img","alt":" d ≥ 2.","inline":true}],[{"style":{"width":"95%"},"width":1728,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-9.png","element":"img"}],[{"text":"where the ","element":"span"},{"style":{"height":10.62},"width":38.49,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-10.png","element":"img","alt":" xi","inline":true},{"text":"’s are independently drawn from a probability measure ","element":"span"},{"style":{"height":19.13},"width":433.23,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-11.png","element":"img","alt":" ρ in RD. 
Given data","inline":true,"padRight":true},{"text":"and the a priori knowledge that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"varies along a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional central subspace, we aim to construct an empirical estimator ","element":"span"},{"style":{"height":19.13},"width":224.57,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-12.png","element":"img","alt":"�f : RD → R","inline":true,"padRight":true},{"text":"and to study the Mean Squared Error (MSE) ","element":"span"},{"style":{"height":20.99},"width":459.4,"height":52.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-13.png","element":"img","alt":"E{(xi,yi)}ni=1∥ �f − f∥L2(ρ).","inline":true}],[{"text":"The central task in this problem is to estimate the central subspace Φ. ","element":"span"},{"text":"A class of methods is based on the gradient ","element":"span"},{"style":{"height":17.6},"width":739.57,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-14.png","element":"img","alt":" ∇f(x) = Φ∇g(Φ⊤x). When d = 1, ∇f","inline":true,"padRight":true},{"text":"is proportional to the Φ direction, which allows one to estimate Φ from the average of the empirical gradients ","element":"span"},{"href":"#id-13","referenceIndex":46,"text":"[46, ","element":"a"},{"href":"#id-14","referenceIndex":24,"text":"24, ","element":"a"},{"href":"#id-15","referenceIndex":25,"text":"25]","element":"a"},{"text":". 
When ","element":"span"},{"style":{"height":14.8},"width":68.84,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-15.png","element":"img","alt":" d ≥","inline":true,"padRight":true},{"text":"2, the covariance matrix of ","element":"span"},{"style":{"height":16.4},"width":61.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-16.png","element":"img","alt":" ∇f","inline":true,"padRight":true},{"text":"is given by Φ","element":"span"},{"style":{"height":19.6},"width":748.29,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-17.png","element":"img","alt":"�∇g(Φ⊤x)[∇g(Φ⊤x)]⊤ρ(dx)Φ⊤, which","inline":true,"padRight":true},{"text":"gives an estimate of Φ as the eigenspace associated with the top ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of this covariance matrix ","element":"span"},{"href":"#id-16","referenceIndex":28,"text":"[28, ","element":"a"},{"href":"#id-1","referenceIndex":8,"text":"8]","element":"a"},{"text":". If the gradient can be accurately estimated, these gradient-based methods lead to a ","element":"span"},{"style":{"height":17.6},"width":62.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-18.png","element":"img","alt":"√n","inline":true},{"text":"-consistent estimation of Φ ","element":"span"},{"href":"#id-15","referenceIndex":25,"text":"[25, ","element":"a"},{"href":"#id-1","referenceIndex":8,"text":"8]","element":"a"},{"text":". 
In general, the gradient estimation in ","element":"span"},{"style":{"height":18.73},"width":227.58,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-19.png","element":"img","alt":" RD requires","inline":true,"padRight":true},{"text":"an exponentially large number of samples in dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":".","element":"span"}],[{"text":"The single-index model with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"= 1 has been extensively studied in the literature. In this case, ","element":"span"},{"style":{"height":16},"width":206.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-20.png","element":"img","alt":"g : R → R","inline":true,"padRight":true},{"text":"is called the link function. The minimax mean squared regression error when the link function belongs to the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":"-Hölder class was proved to be ","element":"span"},{"style":{"height":20.33},"width":255.58,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-21.png","element":"img","alt":" O(n−2s/(2s+1)","inline":true},{"text":") ","element":"span"},{"href":"#id-17","referenceIndex":20,"text":"[20, ","element":"a"},{"href":"#id-18","referenceIndex":33,"text":"33, ","element":"a"},{"href":"#id-19","referenceIndex":6,"text":"6]","element":"a"},{"text":". These results demonstrate that the optimal algorithm can automatically adapt to the central subspace of dimension 1. 
For estimation, many methods based on non-convex optimization have been proposed ","element":"span"},{"href":"#id-18","referenceIndex":33,"text":"[33, ","element":"a"},{"href":"#id-20","referenceIndex":23,"text":"23, ","element":"a"},{"href":"#id-21","referenceIndex":29,"text":"29, ","element":"a"},{"href":"#id-22","referenceIndex":53,"text":"53] ","element":"a"},{"text":"although reaching the global minimum is not guaranteed. It was proved in ","element":"span"},{"href":"#id-20","referenceIndex":23,"text":"[23, ","element":"a"},{"text":"Chapter 22] that minimizing the empirical risk over the central subspace of dimension 1 and approximating ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"with piecewise polynomials simultaneously give rise to the MSE of ","element":"span"},{"style":{"height":20.33},"width":408.44,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-22.png","element":"img","alt":"O((n/ log n)−2s/(2s+1)","inline":true},{"text":"). The main challenge of this approach is to obtain the global minimum due to non-convexity in the optimization. ","element":"span"},{"text":"In the context of regression with point queries, an adaptive query algorithm was proposed for the single-index model in ","element":"span"},{"href":"#id-23","referenceIndex":7,"text":"[7] ","element":"a"},{"text":"with performance guarantees. This algorithm was generalized to the multi-index model in ","element":"span"},{"href":"#id-24","referenceIndex":18,"text":"[18]","element":"a"},{"text":". In the standard regression setting, adaptive queries are not allowed, and the samples are usually given before learning starts.","element":"span"}],[{"text":"Estimating the central subspace is related to sufficient dimension reduction in statistics. 
A class of methods related to inverse regression has been developed to estimate the central subspace Φ (see ","element":"span"},{"href":"#id-25","referenceIndex":41,"text":"[41, ","element":"a"},{"href":"#id-26","referenceIndex":34,"text":"34] ","element":"a"},{"text":"for a comprehensive review). Sliced Inverse Regression (SIR) ","element":"span"},{"href":"#id-27","referenceIndex":38,"text":"[38] ","element":"a"},{"text":"is the first and best-known method. The term “inverse regression” refers to the conditional expectation ","element":"span"},{"text":"E","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":"). ","element":"span"},{"text":"In SIR, the central subspace is estimated as the eigenspace associated with the top ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of Cov(","element":"span"},{"style":{"height":17.6},"width":256.45,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/1-23.png","element":"img","alt":"E(x|y) − E(x","inline":true},{"text":")). Similar techniques include kernel inverse regression ","element":"span"},{"href":"#id-28","referenceIndex":56,"text":"[56]","element":"a"},{"text":", parametric inverse regression ","element":"span"},{"href":"#id-29","referenceIndex":3,"text":"[3] ","element":"a"},{"text":"and the canonical correlation estimator ","element":"span"},{"href":"#id-30","referenceIndex":19,"text":"[19]","element":"a"},{"text":". These methods are referred to as first-order methods since the first-order statistical information is utilized, such as the conditional mean. Their performance crucially depends on the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"and the distribution of data. 
If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is symmetric about ","element":"span"},{"style":{"fontWeight":"bold"},"text":"0 ","element":"span"},{"text":"along some directions, then those directions cannot be recovered by the first-order methods. This issue about symmetry is better addressed by second-order methods which utilize the variance and covariance information of data. Popular second-order methods include sliced inverse variance estimation ","element":"span"},{"href":"#id-31","referenceIndex":11,"text":"[11]","element":"a"},{"text":", sliced average variance estimation (SAVE) ","element":"span"},{"href":"#id-32","referenceIndex":12,"text":"[12, ","element":"a"},{"href":"#id-33","referenceIndex":16,"text":"16]","element":"a"},{"text":", minimum average variance estimation (MAVE) ","element":"span"},{"href":"#id-22","referenceIndex":53,"text":"[53]","element":"a"},{"text":", contour regression ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36]","element":"a"},{"text":", directional regression ","element":"span"},{"href":"#id-35","referenceIndex":35,"text":"[35]","element":"a"},{"text":", principal Hessian directions ","element":"span"},{"href":"#id-36","referenceIndex":39,"text":"[39]","element":"a"},{"text":", a hybrid SIR and SAVE ","element":"span"},{"href":"#id-37","referenceIndex":57,"text":"[57]","element":"a"},{"text":", etc. New methods are also developed in a multiscale framework ","element":"span"},{"href":"#id-38","referenceIndex":32,"text":"[32, ","element":"a"},{"href":"#id-39","referenceIndex":31,"text":"31]","element":"a"},{"text":". For the single-index model, Smallest Vector Regression (SVR) is proposed in ","element":"span"},{"href":"#id-38","referenceIndex":32,"text":"[32] ","element":"a"},{"text":"in a multiscale framework, which combines the ideas of SIR and SAVE. 
A performance analysis is provided for the index estimation and regression, although the assumptions favor monotonic ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":".","element":"span"}],[{"text":"This paper focuses on a second-order method called Generalized Contour Regression (GCR) introduced by Li, Zha and Chiaromonte ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36]","element":"a"},{"text":". In comparison with SIR and SAVE mentioned above, GCR has advantages in the case where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is not monotonic. Empirical experiments have demonstrated the success of GCR for the estimation of the central subspace, but its sample complexity is not yet well understood. In this paper, we analyze the error behavior of the regression scheme which consists of GCR for the central subspace estimation and piecewise polynomial approximation of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". We also propose a modified GCR with improved efficiency. Our contributions are:","element":"span"}],[{"text":"(i) We prove that GCR estimates the central subspace with a mean squared estimation error of ","element":"span"},{"style":{"height":19.13},"width":121,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-0.png","element":"img","alt":" O(n−1","inline":true},{"text":"), if a variance quantity is exactly known. 
The estimation error of this variance quantity is also given in this paper.","element":"span"}],[{"text":"(ii) Our regression scheme yields an MSE for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"on the order of (","element":"span"},{"style":{"height":17.6},"width":188.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-1.png","element":"img","alt":"n/ log n)−","inline":true},{"style":{"height":6.4},"width":28.61,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-2.png","element":"img","alt":"2s","inline":true},{"style":{"height":12.8},"width":198.43,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-3.png","element":"img","alt":"2s+d . This","inline":true,"padRight":true},{"text":"demonstrates that the MSE decays at a polynomial rate (up to a logarithmic factor) whose exponent depends on the dimension of the central subspace ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":", instead of the ambient dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":".","element":"span"}],[{"text":"(iii) Our modified GCR improves over the original GCR in efficiency. Numerical experiments demonstrate that the modified GCR has the same convergence rate as that of the original GCR in our statistical theory.","element":"span"}],[{"text":"This paper is organized as follows: We first introduce SCR and GCR in Section ","element":"span"},{"text":"2. ","element":"span"},{"text":"Our regression scheme and main results are stated in Section ","element":"span"},{"text":"3. ","element":"span"},{"text":"Numerical experiments are provided in Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"to validate our theory. 
We present proofs in Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"and conclude in Section ","element":"span"},{"text":"6.","element":"span"}],[{"text":"We use lowercase bold letters and capital letters to denote vectors and matrices respectively. ","element":"span"},{"style":{"height":17.6},"width":70.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-4.png","element":"img","alt":"∥x∥","inline":true,"padRight":true},{"text":"is the Euclidean norm of the vector ","element":"span"},{"style":{"height":17.6},"width":204.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-5.png","element":"img","alt":" x and ∥A∥","inline":true,"padRight":true},{"text":"is the spectral norm of the matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":". For two square matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"of the same size, ","element":"span"},{"style":{"height":15.2},"width":406.52,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-6.png","element":"img","alt":" A ⪯ B means B − A","inline":true,"padRight":true},{"text":"is positive semi-definite. We use ","element":"span"},{"style":{"height":14.62},"width":48.51,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-7.png","element":"img","alt":" N0","inline":true,"padRight":true},{"text":"to denote nonnegative integers. For a function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", supp(","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":") denotes the support of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". 
For ","element":"span"},{"style":{"height":17.6},"width":207.17,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-8.png","element":"img","alt":" x ∈ R, ⌊x⌋","inline":true,"padRight":true},{"text":"denotes the largest integer that is less than or equal to ","element":"span"},{"style":{"height":17.6},"width":191.45,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-9.png","element":"img","alt":" x and ⌈x⌉","inline":true,"padRight":true},{"text":"denotes the smallest integer that is greater than or equal to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":". Throughout the paper, we use ","element":"span"},{"style":{"height":20.31},"width":210.46,"height":50.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-10.png","element":"img","alt":" SΦ and S⊥Φ","inline":true,"padRight":true},{"text":"to denote the central subspace and its orthogonal complement, respectively.","element":"span"}]]},{"heading":"2 Central subspace and contour regression","paragraphs":[[{"text":"The model in ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"has an ambiguity since the solutions are not unique. ","element":"span"},{"text":"The columns of Φ form an orthonormal basis of the central subspace, while the choice of orthonormal basis is not unique. This gives rise to an ambiguity between the basis Φ and the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", which can be characterized by the following lemma:","element":"span"}],[{"id":"id-41","style":{"fontWeight":"bold"},"text":"Lemma 1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider the model in ","element":"span"},{"href":"#id-40","style":{"fontStyle":"italic"},"text":"(1) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"where the columns of ","element":"span"},{"text":"Φ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"form an orthonormal basis of the central subspace. For any orthogonal matrix ","element":"span"},{"style":{"height":19.53},"width":871.55,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-11.png","element":"img","alt":" Q ∈ Rd×d, let �Φ = ΦQ and �g(z) = g(Qz) for","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"any ","element":"span"},{"style":{"height":15.93},"width":128.18,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-12.png","element":"img","alt":" z ∈ Rd","inline":true},{"style":{"fontStyle":"italic"},"text":". Then the columns of ","element":"span"},{"style":{"height":12},"width":32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-13.png","element":"img","alt":"�Φ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"form another orthonormal basis of the central subspace and ","element":"span"},{"style":{"height":17.6},"width":356.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-14.png","element":"img","alt":"g(Φ⊤x) = �g(�Φ⊤x).","inline":true}],[{"text":"Lemma ","element":"span"},{"href":"#id-41","text":"1 ","element":"a"},{"text":"shows that the representation of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is not unique. If another set of orthonormal basis is picked, the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"changes accordingly. 
In this paper, we aim to recover one orthonormal basis ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-15.png","element":"img","alt":"�","inline":true},{"text":"Φ and the corresponding function ","element":"span"},{"style":{"height":12},"width":22.28,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-16.png","element":"img","alt":" �g","inline":true},{"text":". The subspace estimation error and regression error are measured by ","element":"span"},{"style":{"height":19.62},"width":403.24,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-17.png","element":"img","alt":" ∥Proj�Φ − ProjΦ∥ and","inline":true}],[{"style":{"width":"72%"},"width":1315,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/2-18.png","element":"img"}],[{"text":"respectively, which are invariant to the choice of orthonormal basis. After a change of basis, we can assume that the columns of ","element":"span"},{"text":"Φ are chosen such that","element":"span"}],[{"id":"id-117","style":{"width":"66%"},"width":1200,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-0.png","element":"img"}],[{"text":"according to ","element":"span"},{"href":"#id-42","referenceIndex":5,"text":"[5, ","element":"a"},{"text":"Theorem 2.2.1].","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Simple contour regression","element":"span"}],[{"text":"We start with Simple Contour Regression (SCR) ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36] ","element":"a"},{"text":"which utilizes the fact that the contour directions of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"are orthogonal to the central subspace. 
Let ","element":"span"},{"style":{"height":10.4},"width":82.48,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-1.png","element":"img","alt":" α >","inline":true,"padRight":true},{"text":"0 and define the following conditional covariance matrix:","element":"span"}],[{"style":{"width":"43%"},"width":784,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-2.png","element":"img"}],[{"text":"where (˜","element":"span"},{"style":{"height":17.6},"width":270.07,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-3.png","element":"img","alt":"x, ˜y) and (x, y","inline":true},{"text":") are two independent samples.","element":"span"}],[{"text":"When ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"= 1 and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is a monotonic function, we expect many of the (˜","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") pairs satisfying ","element":"span"},{"style":{"height":17.6},"width":675.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-4.png","element":"img","alt":"|˜y − y| ≤ α will have |Φ⊤˜x − Φ⊤x|","inline":true,"padRight":true},{"text":"small while ","element":"span"},{"style":{"height":17.6},"width":263.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-5.png","element":"img","alt":" |w⊤˜x − w⊤x|","inline":true,"padRight":true},{"text":"can be arbitrarily large for any ","element":"span"},{"style":{"height":20.3},"width":145.96,"height":50.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-6.png","element":"img","alt":"w ∈ S⊥Φ ","inline":true,"padRight":true},{"text":". 
In this case, most of the ˜","element":"span"},{"style":{"height":8},"width":106.82,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-7.png","element":"img","alt":"x − x","inline":true,"padRight":true},{"text":"directions satisfying ","element":"span"},{"style":{"height":17.6},"width":209.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-8.png","element":"img","alt":" |˜y − y| ≤ α","inline":true,"padRight":true},{"text":"are aligned with ","element":"span"},{"style":{"height":20.3},"width":70.04,"height":50.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-9.png","element":"img","alt":" S⊥Φ .","inline":true}],[{"text":"For example, in Figure ","element":"span"},{"href":"#id-43","text":"1(","element":"a"},{"text":"a), ","element":"span"},{"style":{"height":17.6},"width":852.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-10.png","element":"img","alt":" f(x) = ex1 with x = (x1, x2) ∈ [−2, 2] × [−2,","inline":true,"padRight":true},{"text":"2]. This function can be expressed as Model ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with Φ = [1","element":"span"},{"style":{"height":17.6},"width":596.61,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-11.png","element":"img","alt":", 0]⊤, w = [0, 1]⊤, and g(z) = ez","inline":true},{"text":". Consider two inputs ","element":"span"},{"style":{"height":17.6},"width":539.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-12.png","element":"img","alt":"p = (p1, p2) and �p = (�p1, �p2","inline":true},{"text":") satisfying ","element":"span"},{"style":{"height":19.13},"width":1034.46,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-13.png","element":"img","alt":" |f(p) − f(�p)| ≤ α. 
Since e−2|p1 − �p1| ≤ |ep1 − e�p1| =","inline":true},{"style":{"height":17.6},"width":258.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-14.png","element":"img","alt":"|f(p) − f(�p)|","inline":true},{"text":", the condition ","element":"span"},{"style":{"height":19.13},"width":1239.48,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-15.png","element":"img","alt":" |f(p) − f(�p)| ≤ α implies |Φ⊤(�p − p)| = |p1 − �p1| < e2α. On","inline":true,"padRight":true},{"text":"the other hand, ","element":"span"},{"style":{"height":17.6},"width":577.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-16.png","element":"img","alt":" |w⊤(�p − p)| = |p2 − �p2| ∈ [0,","inline":true,"padRight":true},{"text":"4], which is independent of ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-17.png","element":"img","alt":" α","inline":true},{"text":". In Figure ","element":"span"},{"href":"#id-43","text":"1(","element":"a"},{"text":"b), 200 samples are uniformly drawn from [","element":"span"},{"style":{"height":17.6},"width":636.95,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-18.png","element":"img","alt":"−0.5, 0.5] × [−0.5, 0.5]. The (x, �x","inline":true},{"text":") pair is connected if ","element":"span"},{"style":{"height":17.6},"width":332.21,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-19.png","element":"img","alt":"|f(x)−f(�x)| ≤ 0.","inline":true},{"text":"01. 
We observe that almost all connections by SCR in Figure ","element":"span"},{"href":"#id-43","text":"1(","element":"a"},{"text":"b) are aligned with the ","element":"span"},{"style":{"fontWeight":"bold"},"text":"w ","element":"span"},{"text":"direction.","element":"span"}],[{"text":"SCR estimates the central subspace from the eigenvectors associated with the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":85.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-20.png","element":"img","alt":" K(α","inline":true},{"text":"). The success of SCR is guaranteed under the following condition:","element":"span"}],[{"id":"id-45","style":{"fontWeight":"bold"},"text":"Condition 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists ","element":"span"},{"style":{"height":14.22},"width":139.84,"height":35.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-21.png","element":"img","alt":" αc > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for any ","element":"span"},{"style":{"height":17.6},"width":216.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-22.png","element":"img","alt":" α ∈ (0, αc)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and unit vectors ","element":"span"},{"style":{"height":15.6},"width":161.47,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-23.png","element":"img","alt":" v ∈ SΦ,","inline":true},{"style":{"height":20.31},"width":145.97,"height":50.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-24.png","element":"img","alt":"w ∈ S⊥Φ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", the following 
holds","element":"span"}],[{"style":{"width":"62%"},"width":1135,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-25.png","element":"img"}],[{"text":"We define an elliptical distribution ","element":"span"},{"href":"#id-26","referenceIndex":34,"text":"[34] ","element":"a"},{"text":"as","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 1 ","element":"span"},{"text":"(Elliptical distribution)","element":"span"},{"style":{"height":16},"width":131.37,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-26.png","element":"img","alt":". Let ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a probability measure in ","element":"span"},{"style":{"height":15.14},"width":59.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-27.png","element":"img","alt":" RD ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with density function ","element":"span"},{"style":{"height":16.4},"width":229.4,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-28.png","element":"img","alt":" h. 
We say ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has an elliptical distribution if there exists a positive-definite matrix ","element":"span"},{"style":{"height":15.93},"width":200.65,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-29.png","element":"img","alt":" A ∈ RD×D","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that","element":"span"}],[{"style":{"width":"17%"},"width":311,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-30.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for some function ","element":"span"},{"style":{"height":16.8},"width":659.1,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-31.png","element":"img","alt":" κ : R → R. When A = I, we say ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has a spherical distribution.","element":"span"}],[{"text":"It is known that if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"has a spherical distribution, then ","element":"span"},{"text":"E","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") = ","element":"span"},{"style":{"fontWeight":"bold"},"text":"0 ","element":"span"},{"text":"and Cov(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") = ","element":"span"},{"style":{"fontStyle":"italic"},"text":"aI ","element":"span"},{"text":"for some ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a > ","element":"span"},{"text":"0; see ","element":"span"},{"href":"#id-44","referenceIndex":17,"text":"[17, ","element":"a"},{"text":"Theorems 2.5 and 2.7]. 
Condition ","element":"span"},{"href":"#id-45","text":"1 ","element":"a"},{"text":"together with a spherical distribution of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"guarantees that the eigenvectors associated with the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of ","element":"span"},{"href":"#id-34","referenceIndex":36,"style":{"height":17.6},"width":348.53,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-32.png","element":"img","alt":" K(α) span SΦ [36,","inline":true,"padRight":true},{"text":"Theorem 2.1].","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Generalized contour regression","element":"span"}],[{"text":"Condition ","element":"span"},{"href":"#id-45","text":"1 ","element":"a"},{"text":"does not necessarily hold when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"= 1 and the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is not monotonic, or when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d > ","element":"span"},{"text":"1. 
For example in Figure ","element":"span"},{"href":"#id-46","text":"2 ","element":"a"},{"text":"(a), ","element":"span"},{"style":{"height":19.41},"width":916.69,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-33.png","element":"img","alt":" f(x) = x21 with x = (x1, x2) ∈ [−2, 2] × [−2, 2].","inline":true,"padRight":true},{"text":"This function can be expressed in Model ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with Φ = [1","element":"span"},{"style":{"height":19.13},"width":707.02,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-34.png","element":"img","alt":", 0]⊤, w = [0, 1]⊤ and g(z) = z2. Let","inline":true},{"style":{"height":17.6},"width":544.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-35.png","element":"img","alt":"p = (p1, p2) and �p = (�p1, �p2","inline":true},{"text":") be two inputs satisfying ","element":"span"},{"style":{"height":17.6},"width":745.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/3-36.png","element":"img","alt":" p1 > 0, �p1 < 0, and |f(p) − f(�p)| ≤ α","inline":true}],[{"id":"id-43","style":{"width":"96%"},"width":1749,"height":586,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-0.png","element":"img"}],[{"text":"Figure 1: (a) Function ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":279.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-1.png","element":"img","alt":" f(x1, x2) = ex1","inline":true},{"text":". 
When 200 samples are uniformly drawn in [","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":218.91,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-2.png","element":"img","alt":"−0.5, 0.5] ×","inline":true,"padRight":true},{"text":"[","element":"figcaption","subtype":"caption"},{"style":{"height":15.2},"width":142.91,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-3.png","element":"img","alt":"−0.5, 0.","inline":true},{"text":"5], (b) and (c) display the connected (","element":"figcaption","subtype":"caption"},{"style":{"height":11.2},"width":72.88,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-4.png","element":"img","alt":"x, �x","inline":true},{"text":") pairs by SCR (b) and GCR (c), respectively. An (","element":"figcaption","subtype":"caption"},{"style":{"height":11.2},"width":72.88,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-5.png","element":"img","alt":"x, �x","inline":true},{"text":") pair is connected by SCR in (b) if ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":375.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-6.png","element":"img","alt":" |f(x) − f(�x)| ≤ 0.","inline":true},{"text":"01 and by GCR in (c) if ","element":"figcaption","subtype":"caption"},{"style":{"height":18.22},"width":953.28,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-7.png","element":"img","alt":"�Vy(x, �x; r) ≤ 0.001 with r = 0.01. 
Here �Vy(x, �x; r","inline":true},{"text":") is an empirical estimator of ","element":"figcaption","subtype":"caption"},{"style":{"height":18.44},"width":270.76,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-8.png","element":"img","alt":" Vf(x, �x) to be","inline":true,"padRight":true},{"text":"defined in ","element":"figcaption","subtype":"caption"},{"href":"#id-47","text":"(24)","element":"a","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"id":"id-46","style":{"width":"97%"},"width":1751,"height":586,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-9.png","element":"img"}],[{"text":"Figure 2: (a) Function ","element":"figcaption","subtype":"caption"},{"style":{"height":19.41},"width":267.75,"height":48.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-10.png","element":"img","alt":" f(x1, x2) = x21","inline":true},{"text":". When 200 samples are uniformly drawn in [","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":220.77,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-11.png","element":"img","alt":"−0.5, 0.5] ×","inline":true,"padRight":true},{"text":"[","element":"figcaption","subtype":"caption"},{"style":{"height":15.2},"width":142.91,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-12.png","element":"img","alt":"−0.5, 0.","inline":true},{"text":"5], (b) and (c) display the connected (","element":"figcaption","subtype":"caption"},{"style":{"height":11.2},"width":72.88,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-13.png","element":"img","alt":"x, �x","inline":true},{"text":") pairs by SCR (b) and GCR (c), respectively. 
An (","element":"figcaption","subtype":"caption"},{"style":{"height":11.2},"width":72.88,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-14.png","element":"img","alt":"x, �x","inline":true},{"text":") pair is connected by SCR in (b) if ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":375.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-15.png","element":"img","alt":" |f(x) − f(�x)| ≤ 0.","inline":true},{"text":"01 and by GCR in (c) if ","element":"figcaption","subtype":"caption"},{"style":{"height":18.22},"width":632.1,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-16.png","element":"img","alt":"�Vy(x, �x; r) ≤ 0.001 with r = 0.01.","inline":true}],[{"text":"for some small ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-17.png","element":"img","alt":" α","inline":true},{"text":". Due to symmetry, ","element":"span"},{"style":{"height":17.6},"width":465.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-18.png","element":"img","alt":" |Φ⊤(�p − p)| = |�p1| + |p1|","inline":true,"padRight":true},{"text":"can be very large, which violates Condition ","element":"span"},{"href":"#id-45","text":"1. 
","element":"a"},{"text":"When 200 samples are uniformly drawn in [","element":"span"},{"style":{"height":17.6},"width":729.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-19.png","element":"img","alt":"−0.5, 0.5]×[−0.5, 0.5], an (x, ˜x) pair is","inline":true,"padRight":true},{"text":"connected by SCR in Figure ","element":"span"},{"href":"#id-46","text":"2 ","element":"a"},{"text":"(b) if ","element":"span"},{"style":{"height":17.6},"width":330.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-20.png","element":"img","alt":" |f(x)−f(˜x)| ≤ 0.","inline":true},{"text":"01. The ideal connections are along the ","element":"span"},{"style":{"height":10.62},"width":41.94,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-21.png","element":"img","alt":" x2","inline":true,"padRight":true},{"text":"direction, but SCR gives many misleading connections since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is not monotonic. When ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d > ","element":"span"},{"text":"1, Condition ","element":"span"},{"href":"#id-45","text":"1 ","element":"a"},{"text":"can be easily violated as well. 
For any fixed ","element":"span"},{"style":{"height":15.93},"width":139.33,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-22.png","element":"img","alt":" x ∈ RD","inline":true},{"text":", the contour ","element":"span"},{"style":{"height":17.6},"width":313.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-23.png","element":"img","alt":" {�x|f(�x) = f(x)}","inline":true,"padRight":true},{"text":"is a curve or a surface, so ","element":"span"},{"style":{"height":17.6},"width":243.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-24.png","element":"img","alt":" ∥Φ⊤(�x − x)∥","inline":true,"padRight":true},{"text":"is not necessarily small.","element":"span"}],[{"text":"Fortunately, most misleading connections can be identified from a large variance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"along the segment between the two inputs. In Figure ","element":"span"},{"href":"#id-46","text":"2, ","element":"a"},{"style":{"height":17.6},"width":257.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-25.png","element":"img","alt":" |f(p) − f(�p)|","inline":true,"padRight":true},{"text":"is small but the variance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"along the segment between ","element":"span"},{"style":{"height":16},"width":157.1,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-26.png","element":"img","alt":" p and �p","inline":true,"padRight":true},{"text":"is large. 
This criterion helps to rule out the misleading connection between ","element":"span"},{"style":{"height":16},"width":167.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-27.png","element":"img","alt":" p and �p.","inline":true}],[{"text":"Replacing the condition of ","element":"span"},{"style":{"height":17.6},"width":229.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-28.png","element":"img","alt":" |y − ˜y| ≤ α","inline":true,"padRight":true},{"text":"in SCR by a variance condition gives rise to the Generalized Contour Regression (GCR) ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36]","element":"a"},{"text":". Let ","element":"span"},{"style":{"height":18.22},"width":855.78,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-29.png","element":"img","alt":" ℓ(xi, xj) = {x = (1 − t)xi + txj, t ∈ [0, 1]} be","inline":true,"padRight":true},{"text":"the segment between ","element":"span"},{"style":{"height":17.42},"width":180.89,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-30.png","element":"img","alt":" xi and xj","inline":true},{"text":". 
The variance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"along this line is:","element":"span"}],[{"style":{"width":"38%"},"width":692,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/4-31.png","element":"img"}],[{"text":"In GCR, the covariance matrix is taken to be","element":"span"}],[{"id":"id-48","style":{"width":"71%"},"width":1293,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-0.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10.4},"width":80.34,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-1.png","element":"img","alt":" α >","inline":true,"padRight":true},{"text":"0 is a parameter. In ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36]","element":"a"},{"text":", the authors assume that for any unit vectors ","element":"span"},{"style":{"height":15.6},"width":157.62,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-2.png","element":"img","alt":" v ∈ SΦ,","inline":true},{"style":{"height":20.31},"width":145.97,"height":50.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-3.png","element":"img","alt":"w ∈ S⊥Φ ","inline":true,"padRight":true},{"text":", there is some constant ","element":"span"},{"style":{"height":12.8},"width":221.12,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-4.png","element":"img","alt":" α such that","inline":true}],[{"style":{"width":"82%"},"width":1498,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-5.png","element":"img"}],[{"text":"It was proved that, when ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-6.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"has an elliptical distribution, 
","element":"span"},{"href":"#id-48","text":"(5) ","element":"a"},{"text":"always holds for sufficiently small ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-7.png","element":"img","alt":"α","inline":true},{"text":". In this work, we refine the assumption ","element":"span"},{"href":"#id-48","text":"(5) ","element":"a"},{"text":"to the following one which requires a gap between the two sides of ","element":"span"},{"href":"#id-48","text":"(5)","element":"a"},{"text":":","element":"span"}],[{"id":"id-49","style":{"fontWeight":"bold"},"text":"Assumption 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exist ","element":"span"},{"style":{"height":14.8},"width":362.34,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-8.png","element":"img","alt":" αthresh > 0, c0 > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for any ","element":"span"},{"text":"0 ","element":"span"},{"style":{"height":15.24},"width":456.53,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-9.png","element":"img","alt":" < α < αthresh and unit","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"vectors ","element":"span"},{"style":{"height":20.31},"width":308.36,"height":50.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-10.png","element":"img","alt":" v ∈ SΦ, w ∈ S⊥Φ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", the following hold","element":"span"}],[{"style":{"width":"70%"},"width":1278,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-11.png","element":"img"}],[{"text":"This assumption with a gap is used to prove an eigengap of the matrix 
","element":"span"},{"style":{"height":17.6},"width":79.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-12.png","element":"img","alt":" G(α","inline":true},{"text":"), which is necessary to control the central subspace estimation error.","element":"span"}],[{"text":"An (","element":"span"},{"style":{"height":15.04},"width":72.88,"height":37.61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-13.png","element":"img","alt":"x, ˜x","inline":true},{"text":") pair is said to be connected if ","element":"span"},{"style":{"height":18.44},"width":239.72,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-14.png","element":"img","alt":" Vf(x, ˜x) ≤ α","inline":true},{"text":". In Figure ","element":"span"},{"href":"#id-46","text":"2, ","element":"a"},{"text":"even though ","element":"span"},{"style":{"height":17.6},"width":242.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-15.png","element":"img","alt":" |f(p)−f(�p)|","inline":true,"padRight":true},{"text":"is small, the variance ","element":"span"},{"style":{"height":18.44},"width":139.48,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-16.png","element":"img","alt":" Vf(p, �p","inline":true},{"text":") is large. 
When ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-17.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"is small, the condition of ","element":"span"},{"style":{"height":18.44},"width":242.5,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-18.png","element":"img","alt":" Vf(p, �p) ≤ α","inline":true,"padRight":true},{"text":"is violated so the (","element":"span"},{"style":{"height":11.6},"width":75.27,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-19.png","element":"img","alt":"p, �p","inline":true},{"text":") pair is not connected. We expect all connected pairs by GCR to be aligned with the ","element":"span"},{"style":{"height":10.62},"width":41.94,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-20.png","element":"img","alt":" x2","inline":true,"padRight":true},{"text":"direction, such as (","element":"span"},{"style":{"height":11.6},"width":75.27,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-21.png","element":"img","alt":"p, �p","inline":true},{"text":"). 
For such connected pairs, it is very likely that ","element":"span"},{"style":{"height":16},"width":298.13,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-22.png","element":"img","alt":" p1�p1 > 0, which","inline":true,"padRight":true},{"text":"implies ","element":"span"},{"style":{"height":17.6},"width":1651.23,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-23.png","element":"img","alt":" |Φ⊤(�p − p)| = |�p1 − p1| ≤ α/ min(p1, �p1) while |w⊤(�p − p)| = |�p2 − p2| ∈ [0, 4] is","inline":true,"padRight":true},{"text":"independent of ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-24.png","element":"img","alt":" α","inline":true},{"text":". When 200 samples are uniformly drawn in [","element":"span"},{"style":{"height":17.6},"width":612.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-25.png","element":"img","alt":"−0.5, 0.5] × [−0.5, 0.5], an (x, ˜x)","inline":true,"padRight":true},{"text":"pair is connected by GCR in Figure ","element":"span"},{"href":"#id-46","text":"2(","element":"a"},{"text":"c) if an empirical estimator of ","element":"span"},{"style":{"height":18.44},"width":137.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-26.png","element":"img","alt":" Vf(x, ˜x","inline":true},{"text":") (see ","element":"span"},{"href":"#id-47","text":"(24)","element":"a"},{"text":") is no more than 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"001. 
We observe that most connections by GCR are along the ","element":"span"},{"style":{"height":10.62},"width":41.94,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-27.png","element":"img","alt":" x2","inline":true,"padRight":true},{"text":"direction, and many misleading connections by SCR are ruled out.","element":"span"}],[{"text":"Assumption ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"together with a spherical distribution of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"implies that, when ","element":"span"},{"style":{"height":12.44},"width":225.75,"height":31.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-28.png","element":"img","alt":" α < αthresh","inline":true,"padRight":true},{"text":"the eigenspace of ","element":"span"},{"style":{"height":17.6},"width":79.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-29.png","element":"img","alt":" G(α","inline":true},{"text":") associated with the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues is ","element":"span"},{"style":{"height":15.1},"width":51.42,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-30.png","element":"img","alt":" SΦ","inline":true,"padRight":true},{"text":"and the rest of the eigenvectors span ","element":"span"},{"style":{"height":20.31},"width":70.04,"height":50.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-31.png","element":"img","alt":" S⊥Φ .","inline":true}],[{"id":"id-50","style":{"height":16.8},"width":534.58,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-32.png","element":"img","alt":"Proposition 1. Assume ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has a spherical distribution. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.1},"width":580.56,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-33.png","element":"img","alt":" λ1 ≥ λ2 ≥ . . . ≥ λD be the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":96.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-34.png","element":"img","alt":" G(α)","inline":true},{"style":{"fontStyle":"italic"},"text":". Under Assumption ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"the following hold for any ","element":"span"},{"style":{"height":17.6},"width":292.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-35.png","element":"img","alt":" α ∈ (0, αthresh):","inline":true}],[{"text":"• ","element":"span"},{"style":{"fontStyle":"italic"},"text":"For any integer ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":17.42},"width":1150.06,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-36.png","element":"img","alt":" ≤ j ≤ D − d and D − d + 1 ≤ k ≤ D, we have λj − λk ≥ c0.","inline":true}],[{"text":"• ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The eigenvectors of ","element":"span"},{"style":{"height":17.6},"width":96.35,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-37.png","element":"img","alt":" G(α)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"associated with ","element":"span"},{"style":{"height":23.75},"width":325.91,"height":59.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-38.png","element":"img","alt":" {λj}D−dj=1 span S⊥Φ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and the 
eigenvectors associated ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"style":{"height":21.65},"width":434.04,"height":54.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-39.png","element":"img","alt":" {λk}Dk=D−d+1 span SΦ.","inline":true}],[{"text":"Proposition ","element":"span"},{"href":"#id-50","text":"1 ","element":"a"},{"text":"is proved in Supplementary materials ","element":"span"},{"text":"A. ","element":"span"},{"text":"The second part of Proposition ","element":"span"},{"href":"#id-50","text":"1 ","element":"a"},{"text":"was first proved in ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36] ","element":"a"},{"text":"based on the assumption in ","element":"span"},{"href":"#id-48","text":"(5)","element":"a"},{"text":". We further show that, under Assumption ","element":"span"},{"href":"#id-49","text":"1, ","element":"a"},{"text":"there is an eigengap between the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues and the rest of the eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":108.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-40.png","element":"img","alt":" G(α).","inline":true}]]},{"heading":"3 Main Results","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"3.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Assumptions","element":"span"}],[{"text":"In order to guarantee the success of GCR, we introduce the following assumptions on ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-41.png","element":"img","alt":" ρ","inline":true},{"text":", requiring that i) ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-42.png","element":"img","alt":" 
ρ","inline":true,"padRight":true},{"text":"is supported on a bounded domain; ii) ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-43.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"has a spherical distribution.","element":"span"}],[{"id":"id-51","style":{"fontWeight":"bold"},"text":"Assumption 2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose the probability measure ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/5-44.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfies the following assumptions:","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"i) ","element":"span"},{"style":{"height":17.6},"width":479.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-0.png","element":"img","alt":" [Boundedness] supp(ρ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is bounded by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B > ","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":": for every ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"style":{"fontStyle":"italic"},"text":"sampled from ","element":"span"},{"style":{"height":17.6},"width":231.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-1.png","element":"img","alt":" ρ, ∥x∥ ≤ B","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"almost surely.","element":"span"}],[{"style":{"width":"63%"},"width":1147,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-2.png","element":"img"}],[{"text":"Assumption ","element":"span"},{"href":"#id-51","text":"2(","element":"a"},{"text":"ii) is common in inverse regression, 
but GCR is still more robust against nonsphericity (or nonellipticity) than SIR and SCR (see Experiment 1 in Section ","element":"span"},{"text":"4)","element":"span"},{"text":". We next define Lipschitz and Hölder functions and make a regularity assumption on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 2 ","element":"span"},{"text":"(Lipschitz functions)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is Lipschitz with Lipschitz constant ","element":"span"},{"style":{"height":17.42},"width":99.96,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-3.png","element":"img","alt":" Lg if","inline":true}],[{"style":{"width":"44%"},"width":810,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-4.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Definition 3 ","element":"span"},{"text":"(Hölder functions)","element":"span"},{"style":{"height":17.82},"width":1177.14,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-5.png","element":"img","alt":". 
Let s = k + β for some k ∈ N0 and 0 < β ≤ 1, and Cg > 0.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"A function ","element":"span"},{"style":{"height":20.15},"width":533.6,"height":50.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-6.png","element":"img","alt":" g : Rd → R is called (s, Cg)","inline":true},{"style":{"fontStyle":"italic"},"text":"-smooth if for every ","element":"span"},{"style":{"height":17.6},"width":649.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-7.png","element":"img","alt":" α = (α1, ..., αd), αi ∈ N0, |α| ≤ k,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the partial derivative ","element":"span"},{"style":{"height":32.28},"width":340.58,"height":80.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-8.png","element":"img","alt":" Dαg := ∂kg∂xα11 ···∂xαdd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"exists and satisfies","element":"span"}],[{"style":{"width":"62%"},"width":1126,"height":155,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":24.4},"width":288.2,"height":61.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-10.png","element":"img","alt":" |α| = Σdj=1 αj.","inline":true}],[{"text":"In this paper, the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is assumed to be (","element":"span"},{"style":{"height":18.22},"width":273.43,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-11.png","element":"img","alt":"s, Cg) smooth.","inline":true}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"Assumption 3. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The function ","element":"span"},{"style":{"height":20.15},"width":395.78,"height":50.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-12.png","element":"img","alt":" g : Rd → R is (s, Cg)","inline":true},{"style":{"fontStyle":"italic"},"text":"-smooth with ","element":"span"},{"style":{"height":14},"width":113.46,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-13.png","element":"img","alt":" s ≥ 1.","inline":true}],[{"text":"Assumptions ","element":"span"},{"href":"#id-51","text":"2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"imply the following:","element":"span"}],[{"text":"i) ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is Lipschitz as ","element":"span"},{"style":{"height":17.42},"width":259.16,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-14.png","element":"img","alt":" s ≥ 1, and Cg","inline":true,"padRight":true},{"text":"is an upper bound of the Lipschitz constant of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":".","element":"span"}],[{"text":"ii) ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is bounded by Assumption ","element":"span"},{"href":"#id-51","text":"2 ","element":"a"},{"text":"(i) and the Lipschitz property above: ","element":"span"},{"style":{"height":17.6},"width":348.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-15.png","element":"img","alt":" |g| ≤ M where M","inline":true,"padRight":true},{"text":"denotes an upper bound of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"which satisfies 
","element":"span"},{"style":{"height":18.22},"width":361.99,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-16.png","element":"img","alt":" M ≤ |g(0)| + CgB.","inline":true}],[{"id":"id-69","style":{"height":16.4},"width":569.58,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-17.png","element":"img","alt":"Assumption 4. The noise ξ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has zero mean and is bounded: ","element":"span"},{"style":{"height":17.6},"width":600.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-18.png","element":"img","alt":" Eξ = 0 and ξ ∈ [−σ, σ] almost","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"surely.","element":"span"}],[{"id":"id-63","style":{"fontWeight":"bold"},"text":"3.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Central subspace error","element":"span"}],[{"text":"Our central interest is to estimate the central subspace. In this paper, we prove the central subspace estimation error by GCR based on the exact variance quantity ","element":"span"},{"style":{"height":18.44},"width":137.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-19.png","element":"img","alt":" Vf(�x, x","inline":true},{"text":"). The empirical estimation of ","element":"span"},{"style":{"height":18.44},"width":137.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-20.png","element":"img","alt":" Vf(�x, x","inline":true},{"text":") is discussed in Section ","element":"span"},{"href":"#id-53","text":"3.4. 
","element":"a"},{"text":"If ","element":"span"},{"style":{"height":18.44},"width":137.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-21.png","element":"img","alt":" Vf(�x, x","inline":true},{"text":") is exactly known, we define","element":"span"}],[{"style":{"width":"74%"},"width":1341,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-22.png","element":"img"}],[{"text":"such that","element":"span"}],[{"id":"id-56","style":{"width":"65%"},"width":1182,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-23.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":18.44},"width":310.28,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-24.png","element":"img","alt":" 1{Vf(˜x, x) ≤ α}","inline":true,"padRight":true},{"text":"is the indicator function which is equal to 1 if ","element":"span"},{"style":{"height":18.44},"width":239.72,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-25.png","element":"img","alt":" Vf(˜x, x) ≤ α","inline":true,"padRight":true},{"text":"and 0 otherwise. The matrices ","element":"span"},{"style":{"height":17.6},"width":276.03,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-26.png","element":"img","alt":" H(α) and G(α","inline":true},{"text":") have the same set of eigenvectors. 
According to Proposition ","element":"span"},{"href":"#id-50","text":"1, ","element":"a"},{"text":"for any 0 ","element":"span"},{"style":{"height":12.44},"width":254.91,"height":31.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-27.png","element":"img","alt":" < α < αthresh","inline":true},{"text":", the eigenvectors associated with the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":101.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-28.png","element":"img","alt":" H(α)","inline":true,"padRight":true},{"text":"form an orthonormal basis of the central subspace.","element":"span"}],[{"text":"The empirical counterpart of ","element":"span"},{"style":{"height":17.6},"width":84.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-29.png","element":"img","alt":" H(α","inline":true},{"text":") based on the i.i.d. 
samples ","element":"span"},{"style":{"height":18.09},"width":272.38,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-30.png","element":"img","alt":" {(xi, yi)}ni=1 is","inline":true}],[{"id":"id-76","style":{"width":"80%"},"width":1456,"height":119,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/6-31.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":17.6},"width":302.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-0.png","element":"img","alt":" E �H(α) = H(α).","inline":true,"padRight":true},{"text":"The central subspace estimator based on ","element":"span"},{"style":{"height":12},"width":39,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-1.png","element":"img","alt":"�H","inline":true,"padRight":true},{"text":"is denoted by ","element":"span"},{"style":{"height":15.93},"width":317.62,"height":39.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-2.png","element":"img","alt":"�Φ ∈ RD×d whose","inline":true,"padRight":true},{"text":"columns are the eigenvectors associated with the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of ","element":"span"},{"style":{"height":12},"width":39,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-3.png","element":"img","alt":"�H","inline":true},{"text":". This estimation error is quantified by ","element":"span"},{"style":{"height":19.62},"width":319.04,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-4.png","element":"img","alt":" ∥Proj�Φ − ProjΦ∥","inline":true},{"text":". 
To show that Proj","element":"span"},{"style":{"height":8.8},"width":25,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-5.png","element":"img","alt":"�Φ","inline":true,"padRight":true},{"text":"is close to Proj","element":"span"},{"style":{"height":8.8},"width":25,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-6.png","element":"img","alt":"Φ","inline":true},{"text":", we first derive a concentration inequality for ","element":"span"},{"style":{"height":17.6},"width":113.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-7.png","element":"img","alt":"�H(α).","inline":true}],[{"id":"id-54","style":{"height":18.09},"width":485.72,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-8.png","element":"img","alt":"Theorem 1. Let {xi}ni=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be i.i.d. samples from a probability measure ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-9.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2. ","element":"a"},{"style":{"fontStyle":"italic"},"text":"Assume ","element":"span"},{"style":{"height":18.44},"width":153.56,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-10.png","element":"img","alt":" Vf(�x, x)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is known for any ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":89.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-11.png","element":"img","alt":"�x, x)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"pair. 
For any ","element":"span"},{"style":{"height":17.6},"width":571.65,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-12.png","element":"img","alt":" α ∈ (0, αthresh) and any t > 0,","inline":true}],[{"style":{"width":"78%"},"width":1417,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is given in Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2.","element":"a"}],[{"text":"Theorem ","element":"span"},{"href":"#id-54","text":"1 ","element":"a"},{"text":"is proved in Section ","element":"span"},{"href":"#id-55","text":"5.1 ","element":"a"},{"text":"using a new concentration inequality for matrix-valued U-statistics. To further bound the distance between Proj","element":"span"},{"style":{"height":18.82},"width":236.27,"height":47.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-14.png","element":"img","alt":"�Φ and ProjΦ","inline":true},{"text":", we will use the relation between ","element":"span"},{"style":{"height":17.6},"width":280.51,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-15.png","element":"img","alt":" H(α) and G(α","inline":true},{"text":") in ","element":"span"},{"href":"#id-56","text":"(7)","element":"a"},{"text":", and the eigengap implied by Assumption ","element":"span"},{"href":"#id-49","text":"1. 
","element":"a"},{"text":"Denote","element":"span"}],[{"id":"id-107","style":{"width":"22%"},"width":414,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-16.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is Lipschitz and supp(","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-17.png","element":"img","alt":"ρ","inline":true},{"text":") is compact, we expect that, for a fixed ","element":"span"},{"style":{"height":15.2},"width":183.87,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-18.png","element":"img","alt":" α > 0, pα","inline":true,"padRight":true},{"text":"is bounded away from 0. The following example shows that when ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-19.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"has a spherical distribution and under mild conditions, ","element":"span"},{"style":{"height":11.6},"width":43.95,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-20.png","element":"img","alt":" pα","inline":true,"padRight":true},{"text":"is bounded away from 0.","element":"span"}],[{"id":"id-57","style":{"fontWeight":"bold"},"text":"Example 1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold, and the density function of ","element":"span"},{"style":{"height":17.6},"width":302.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-21.png","element":"img","alt":" ρ on supp(ρ) is","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"bounded below by ","element":"span"},{"style":{"height":21.45},"width":607.61,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-22.png","element":"img","alt":" hmin > 0. If α < B2C2g/D, then","inline":true}],[{"style":{"width":"83%"},"width":1503,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-23.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":27.42},"width":138.67,"height":68.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-24.png","element":"img","alt":" C = 1Cdg","inline":true}],[{"text":"Example ","element":"span"},{"href":"#id-57","text":"1 ","element":"a"},{"text":"is proved in Supplementary materials ","element":"span"},{"text":"B. 
","element":"span"},{"text":"The following theorem shows that, under the same conditions as in Theorem ","element":"span"},{"href":"#id-54","text":"1, ","element":"a"},{"text":"Proj","element":"span"},{"style":{"height":8.8},"width":25,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-25.png","element":"img","alt":"�Φ","inline":true,"padRight":true},{"text":"is close to Proj","element":"span"},{"style":{"height":8.8},"width":25,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-26.png","element":"img","alt":"Φ","inline":true,"padRight":true},{"text":"with high probability (see a proof in Section ","element":"span"},{"href":"#id-58","text":"5.2)","element":"a"},{"text":".","element":"span"}],[{"id":"id-60","style":{"height":18.09},"width":487.47,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-27.png","element":"img","alt":"Theorem 2. Let {xi}ni=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be i.i.d. samples from a probability measure ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-28.png","element":"img","alt":" ρ","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Suppose Assumption ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold and ","element":"span"},{"style":{"height":18.44},"width":153.56,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-29.png","element":"img","alt":" Vf(�x, x)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is exactly known for any ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":89.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-30.png","element":"img","alt":"�x, x)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"pair. For any ","element":"span"},{"text":"0 ","element":"span"},{"style":{"height":15.24},"width":354.23,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-31.png","element":"img","alt":" < α < αthresh and","inline":true}],[{"id":"id-59","style":{"width":"99%"},"width":1799,"height":173,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-32.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":15.2},"width":91.23,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-33.png","element":"img","alt":" c0, B","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are given in Assumption ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2.","element":"a"}],[{"text":"Integrating the probability in ","element":"span"},{"href":"#id-59","text":"(11) ","element":"a"},{"text":"gives rise to the following mean 
squared estimation error for the central subspace:","element":"span"}],[{"id":"id-61","style":{"fontWeight":"bold"},"text":"Corollary 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under the conditions in Theorem ","element":"span"},{"href":"#id-60","style":{"fontStyle":"italic"},"text":"2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"we have","element":"span"}],[{"style":{"width":"63%"},"width":1147,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-34.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where","element":"span"}],[{"id":"id-67","style":{"width":"94%"},"width":1702,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-35.png","element":"img"}],[{"text":"Corollary ","element":"span"},{"href":"#id-61","text":"1 ","element":"a"},{"text":"is proved in Section ","element":"span"},{"href":"#id-62","text":"5.3. ","element":"a"},{"text":"It implies that, if ","element":"span"},{"style":{"height":18.44},"width":137.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-36.png","element":"img","alt":" Vf(˜x, x","inline":true},{"text":") is exactly known, the mean squared estimation error of GCR for the central subspace converges at the rate of ","element":"span"},{"style":{"height":19.13},"width":151.91,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/7-37.png","element":"img","alt":" O(n−1).","inline":true}],[{"id":"id-64","style":{"width":"100%"},"width":1806,"height":475,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-0.png","element":"img"}],[{"id":"id-80","style":{"fontWeight":"bold"},"text":"3.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Regression error","element":"span"}],[{"text":"Given 2","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"samples denoted by 
","element":"span"},{"style":{"height":19.62},"width":336.08,"height":49.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-1.png","element":"img","alt":" S = {(xi, yi)}2ni=1","inline":true},{"text":", we evenly split the data into two subsets ","element":"span"},{"style":{"height":20.82},"width":1001.38,"height":52.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-2.png","element":"img","alt":"S1 = {(xi, yi)}ni=1 and S2 = {(xi, yi)}2ni=n+1 while S1","inline":true,"padRight":true},{"text":"is used to compute ","element":"span"},{"style":{"height":17.6},"width":84.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-3.png","element":"img","alt":" �H(α","inline":true},{"text":") as described in ","element":"span"},{"text":"Section ","element":"span"},{"href":"#id-63","text":"3.2, ","element":"a"},{"text":"and ","element":"span"},{"style":{"height":15.02},"width":43.42,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-4.png","element":"img","alt":" S2","inline":true,"padRight":true},{"text":"is used to estimate the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". 
After the central subspace is estimated as ","element":"span"},{"style":{"height":15.93},"width":194.02,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-5.png","element":"img","alt":"�Φ ∈ RD×d","inline":true},{"text":", the next step is to estimate the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"from the data ","element":"span"},{"style":{"height":20.82},"width":412.08,"height":52.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-6.png","element":"img","alt":" {(zi, yi)}2ni=n+1, where","inline":true},{"style":{"height":17.75},"width":358,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-7.png","element":"img","alt":"zi = �Φ⊤xi ∈ Rd.","inline":true,"padRight":true},{"text":"Our regression scheme is summarized in Algorithm ","element":"span"},{"href":"#id-64","text":"1. ","element":"a"},{"text":"A rich class of nonparametric regression techniques ","element":"span"},{"href":"#id-20","referenceIndex":23,"text":"[23, ","element":"a"},{"href":"#id-65","referenceIndex":50,"text":"50, ","element":"a"},{"href":"#id-66","referenceIndex":51,"text":"51] ","element":"a"},{"text":"can be used to estimate ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", such as kNN, kernel regression, polynomial partitioning estimates, etc. 
In this paper, we will present the results of polynomial partitioning estimates.","element":"span"}],[{"id":"id-110","style":{"width":"95%"},"width":1732,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-8.png","element":"img"}],[{"text":"with noise ","element":"span"},{"style":{"height":16.4},"width":207.18,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-9.png","element":"img","alt":"�ξi given by","inline":true}],[{"style":{"width":"73%"},"width":1326,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-10.png","element":"img"}],[{"text":"Due to the mismatch between ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-11.png","element":"img","alt":"�","inline":true},{"text":"Φ and Φ, the noise ","element":"span"},{"style":{"height":16.4},"width":31.09,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-12.png","element":"img","alt":"�ξi","inline":true,"padRight":true},{"text":"has a bias and its conditional mean at","element":"span"}],[{"id":"id-138","style":{"width":"99%"},"width":1800,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-13.png","element":"img"}],[{"text":"where the expectation is taken over ","element":"span"},{"style":{"height":16.4},"width":238.16,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-14.png","element":"img","alt":" x ∼ ρ and ξ","inline":true},{"text":", conditioning on ","element":"span"},{"style":{"height":12},"width":173.74,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-15.png","element":"img","alt":"�Φ⊤x = z","inline":true},{"text":". 
This function is bounded such that ","element":"span"},{"style":{"height":18.22},"width":574.75,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-16.png","element":"img","alt":" ∥η∥∞ ≤ CgB∥�Φ − Φ∥. We use","inline":true}],[{"id":"id-112","style":{"width":"61%"},"width":1115,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-17.png","element":"img"}],[{"text":"to denote an upper bound of the squared bias in noise.","element":"span"}],[{"text":"The support of the measure ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-18.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"is bounded by Assumption ","element":"span"},{"href":"#id-51","text":"2(","element":"a"},{"text":"i), which implies ","element":"span"},{"style":{"height":19.53},"width":268.73,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-19.png","element":"img","alt":" zi ∈ [−B, B]d.","inline":true,"padRight":true},{"text":"For a fixed positive integer ","element":"span"},{"style":{"height":15.6},"width":180.44,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-20.png","element":"img","alt":" K, let Fk","inline":true,"padRight":true},{"text":"be the space of piecewise polynomials of degree no more than ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"on the partition of [","element":"span"},{"style":{"height":19.53},"width":310.48,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-21.png","element":"img","alt":"−B, B]d into Kd ","inline":true,"padRight":true},{"text":"cubes with side length 2","element":"span"},{"style":{"height":18.22},"width":530.24,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-22.png","element":"img","alt":"B/K. 
If g is (s, Cg) smooth,","inline":true,"padRight":true},{"text":"the polynomial degree ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"should be chosen as ","element":"span"},{"style":{"height":17.6},"width":192.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-23.png","element":"img","alt":" k = ⌈s⌉ −","inline":true,"padRight":true},{"text":"1. Consider the piecewise polynomial estimator of order ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":":","element":"span"}],[{"style":{"width":"68%"},"width":1236,"height":133,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-24.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is bounded by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":", we truncate the final estimator to ","element":"span"},{"style":{"height":16.4},"width":215.38,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-25.png","element":"img","alt":" �g such that","inline":true}],[{"id":"id-71","style":{"width":"76%"},"width":1384,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-26.png","element":"img"}],[{"text":"The parameter ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"determines the size of the partition, which we set as","element":"span"}],[{"id":"id-70","style":{"width":"77%"},"width":1392,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/8-27.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":15.02},"width":48.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-0.png","element":"img","alt":" C4","inline":true,"padRight":true},{"text":"defined in 
","element":"span"},{"href":"#id-67","text":"(13)","element":"a"},{"text":". The goal of this paper is to give an error analysis of ","element":"span"},{"style":{"height":16.4},"width":30.18,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-1.png","element":"img","alt":"�f","inline":true},{"text":", which has the Mean Squared Error (MSE)","element":"span"}],[{"style":{"width":"59%"},"width":1067,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-2.png","element":"img"}],[{"text":"where the expectation is taken over the joint distribution of ","element":"span"},{"style":{"height":20.02},"width":575.01,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-3.png","element":"img","alt":" {(xi, yi)}2ni=1. For any x ∈ RD,","inline":true}],[{"id":"id-111","style":{"width":"83%"},"width":1516,"height":193,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-4.png","element":"img"}],[{"text":"The first term captures the estimation error of the central subspace by GCR, and the second term captures the regression error of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":". The following theorem provides an upper bound for the MSE of ","element":"span"},{"style":{"height":16.4},"width":30.18,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-5.png","element":"img","alt":"�f","inline":true,"padRight":true},{"text":"(see its proof in Section ","element":"span"},{"href":"#id-68","text":"5.4)","element":"a"},{"text":".","element":"span"}],[{"id":"id-72","style":{"height":19.62},"width":491.08,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-6.png","element":"img","alt":"Theorem 3. Let {xi}2ni=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be i.i.d. 
samples of a probability measure ","element":"span"},{"style":{"height":19.62},"width":486.88,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-7.png","element":"img","alt":" ρ and {yi}2ni=1 be sampled","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"according to the model in ","element":"span"},{"href":"#id-40","text":"(1)","element":"a"},{"style":{"fontStyle":"italic"},"text":". Suppose Assumptions ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1-","element":"a"},{"href":"#id-69","style":{"fontStyle":"italic"},"text":"4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold and ","element":"span"},{"style":{"height":18.44},"width":153.57,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-8.png","element":"img","alt":" Vf(�x, x)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is exactly known for any ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":706.53,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-9.png","element":"img","alt":"�x, x) pair. Set α ∈ (0, αthresh) and K","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"according to ","element":"span"},{"href":"#id-70","style":{"fontStyle":"italic"},"text":"(20)","element":"a"},{"style":{"fontStyle":"italic"},"text":". 
The estimator ","element":"span"},{"style":{"height":16.4},"width":30.18,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-10.png","element":"img","alt":"�f","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in ","element":"span"},{"href":"#id-71","style":{"fontStyle":"italic"},"text":"(19) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"satisfies","element":"span"}],[{"id":"id-121","style":{"width":"89%"},"width":1613,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":21.45},"width":339.48,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-12.png","element":"img","alt":" C5 = C4C2gB2, C4 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is defined in ","element":"span"},{"href":"#id-67","style":{"fontStyle":"italic"},"text":"(13) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is a constant depending on ","element":"span"},{"style":{"height":17.42},"width":310.05,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-13.png","element":"img","alt":" d, k, s, Cg, B, M.","inline":true}],[{"text":"Theorem ","element":"span"},{"href":"#id-72","text":"3 ","element":"a"},{"text":"demonstrates that if GCR is used to estimate the central subspace, the mean squared regression error decays polynomially in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"with an exponent depending on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":", instead of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":". 
GCR effectively exploits the low-dimensional structure of the function and gives a faster rate of convergence than a direct regression in ","element":"span"},{"style":{"height":15.14},"width":74.24,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-14.png","element":"img","alt":" RD.","inline":true}],[{"id":"id-53","style":{"fontWeight":"bold"},"text":"3.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"A practical algorithm to estimate the central subspace","element":"span"}],[{"text":"Our estimation theory in Theorems ","element":"span"},{"href":"#id-60","text":"2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-72","text":"3 ","element":"a"},{"text":"utilizes U-statistics with the exact knowledge of ","element":"span"},{"style":{"height":18.44},"width":137.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-15.png","element":"img","alt":"Vf(�x, x","inline":true},{"text":") for any (","element":"span"},{"style":{"height":11.2},"width":72.88,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-16.png","element":"img","alt":"�x, x","inline":true},{"text":") pair. In practice, ","element":"span"},{"style":{"height":18.44},"width":137.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-17.png","element":"img","alt":" Vf(�x, x","inline":true},{"text":") is not given and we need to estimate it from the samples. 
In this paper, we use the empirical estimation of ","element":"span"},{"style":{"height":17.24},"width":44.46,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-18.png","element":"img","alt":" Vf","inline":true,"padRight":true},{"text":"proposed in ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36] ","element":"a"},{"text":"and prove an estimation error for ","element":"span"},{"style":{"height":17.24},"width":44.46,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-19.png","element":"img","alt":" Vf","inline":true},{"text":". We also propose an efficient algorithm to compute the empirical covariance matrix for the estimation of the central subspace.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"3.4.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Empirical estimation of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"}],[{"text":"The variance quantity ","element":"span"},{"style":{"height":18.44},"width":165.09,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-20.png","element":"img","alt":" Vf(xi, xj","inline":true},{"text":") is the variance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"along the segment ","element":"span"},{"style":{"height":18.22},"width":453.92,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-21.png","element":"img","alt":" ℓ(xi, xj). 
Since ℓ(xi, xj)","inline":true,"padRight":true},{"text":"has 0 measure, it is unlikely to have data exactly lying on ","element":"span"},{"style":{"height":18.22},"width":167.54,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-22.png","element":"img","alt":" ℓ(xi, xj).","inline":true,"padRight":true},{"text":"Following the idea in ","element":"span"},{"href":"#id-34","referenceIndex":36,"text":"[36]","element":"a"},{"text":", we approximate ","element":"span"},{"style":{"height":18.44},"width":165.09,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-23.png","element":"img","alt":" Vf(xi, xj","inline":true},{"text":") by the variance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"in a narrow tube enclosing ","element":"span"},{"style":{"height":18.22},"width":256.3,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-24.png","element":"img","alt":" ℓ(xi, xj) with","inline":true,"padRight":true},{"text":"radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"(see Figure ","element":"span"},{"href":"#id-73","text":"3)","element":"a"},{"text":". Let ","element":"span"},{"style":{"height":18.22},"width":221.6,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-25.png","element":"img","alt":" d(x, ℓ(xi, xj","inline":true},{"text":")) be the distance from ","element":"span"},{"style":{"height":18.22},"width":585.41,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-26.png","element":"img","alt":" x to ℓ(xi, xj): d(x, ℓ(xi, xj)) =","inline":true,"padRight":true},{"text":"inf","element":"span"},{"style":{"height":20.69},"width":633.44,"height":51.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-27.png","element":"img","alt":"z∈ℓ(xi,xj) ∥x − z∥. 
We define Tij(r","inline":true},{"text":") as the tube enclosing ","element":"span"},{"style":{"height":18.22},"width":136.04,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-28.png","element":"img","alt":" ℓ(xi, xj","inline":true},{"text":") with radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"as follows","element":"span"}],[{"style":{"width":"84%"},"width":1528,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-29.png","element":"img"}],[{"text":"The variance of ","element":"span"},{"style":{"height":18.22},"width":179.94,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-30.png","element":"img","alt":" y in Tij(r","inline":true},{"text":") is denoted as","element":"span"}],[{"style":{"width":"66%"},"width":1209,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-31.png","element":"img"}],[{"text":"The empirical counterpart of ","element":"span"},{"style":{"height":18.22},"width":205.14,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-32.png","element":"img","alt":" Vy(xi, xj, r","inline":true},{"text":") based on the data ","element":"span"},{"style":{"height":18.09},"width":272.38,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-33.png","element":"img","alt":" {(xi, yi)}ni=1 is","inline":true}],[{"id":"id-47","style":{"width":"89%"},"width":1616,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/9-34.png","element":"img"}],[{"id":"id-127","style":{"width":"95%"},"width":1725,"height":574,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-0.png","element":"img"}],[{"id":"id-73","text":"Figure 3: 
","element":"figcaption","subtype":"caption"},{"style":{"height":18.22},"width":91.53,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-1.png","element":"img","alt":"Tij(r","inline":true},{"text":") : the tube enclosing ","element":"figcaption","subtype":"caption"},{"style":{"height":18.22},"width":136.04,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-2.png","element":"img","alt":"ℓ(xi, xj","inline":true},{"text":") with radius ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"r","element":"figcaption","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"style":{"width":"99%"},"width":1800,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-3.png","element":"img"}],[{"text":"We further assume that the difference between ","element":"span"},{"style":{"height":18.44},"width":489.37,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-4.png","element":"img","alt":" Vf(xi, xj, r) and Vf(xi, xj","inline":true},{"text":") is linear in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r","element":"span"},{"text":":","element":"span"}],[{"id":"id-74","style":{"fontWeight":"bold"},"text":"Assumption 6. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists ","element":"span"},{"style":{"height":14.22},"width":118,"height":35.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-5.png","element":"img","alt":" c2 > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that, for any ","element":"span"},{"style":{"height":18.22},"width":316.19,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-6.png","element":"img","alt":" xi, xj ∈ supp(ρ),","inline":true}],[{"style":{"width":"66%"},"width":1201,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-7.png","element":"img"}],[{"text":"Assumption ","element":"span"},{"href":"#id-74","text":"6 ","element":"a"},{"text":"implies that when ","element":"span"},{"style":{"height":18.44},"width":405.66,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-8.png","element":"img","alt":" r is small, Vf(xi, xj, r","inline":true},{"text":") is close to ","element":"span"},{"style":{"height":18.44},"width":485.38,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-9.png","element":"img","alt":" Vf(xi, xj) for any xi, xj ∈","inline":true,"padRight":true},{"text":"supp(","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-10.png","element":"img","alt":"ρ","inline":true},{"text":"). 
The following example shows that when ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-11.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"is the volume measure on its support and ","element":"span"},{"style":{"height":18.22},"width":297.49,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-12.png","element":"img","alt":"Tij(r) ⊂ supp(ρ","inline":true},{"text":"), then Assumption ","element":"span"},{"href":"#id-74","text":"6 ","element":"a"},{"text":"holds.","element":"span"}],[{"id":"id-75","style":{"fontWeight":"bold"},"text":"Example 2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be defined as ","element":"span"},{"href":"#id-40","text":"(1)","element":"a"},{"style":{"fontStyle":"italic"},"text":". Suppose Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2(","element":"a"},{"style":{"fontStyle":"italic"},"text":"i) and ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold, and ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-13.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the volume measure on a compact set in ","element":"span"},{"style":{"height":15.13},"width":59.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-14.png","element":"img","alt":" RD","inline":true},{"style":{"fontStyle":"italic"},"text":". 
For every ","element":"span"},{"style":{"height":18.22},"width":939.43,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-15.png","element":"img","alt":" xi, xj ∈ supp(ρ) such that Tij(r) ⊂ supp(ρ), and","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"any ","element":"span"},{"style":{"height":19.64},"width":395.54,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-16.png","element":"img","alt":" r ≤√5B/2, we have","inline":true}],[{"style":{"width":"69%"},"width":1248,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-17.png","element":"img"}],[{"text":"Example ","element":"span"},{"href":"#id-75","text":"2 ","element":"a"},{"text":"is proved in Supplementary materials ","element":"span"},{"text":"C.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"3.5 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"An efficient algorithm to form the empirical covariance matrix","element":"span"}],[{"style":{"width":"0%"},"width":15,"height":4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-18.png","element":"img"}],[{"text":"Forming the empirical covariance matrix as in ","element":"span"},{"href":"#id-76","text":"(8) ","element":"a"},{"text":"requires ","element":"span"},{"style":{"height":18.22},"width":205.14,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-19.png","element":"img","alt":"�Vy(xi, xj, r","inline":true},{"text":") for any (","element":"span"},{"style":{"height":18.22},"width":226.08,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-20.png","element":"img","alt":"xi, xj) pair.","inline":true,"padRight":true},{"text":"With ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"samples, the computational complexity 
is","element":"span"},{"style":{"height":20.96},"width":60.56,"height":52.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-21.png","element":"img","alt":"�n2�","inline":true},{"text":", which is expensive. When the parameter ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-22.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"is small, we expect to have a small number of (","element":"span"},{"style":{"height":13.02},"width":100.89,"height":32.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-23.png","element":"img","alt":"xi, xj","inline":true},{"text":") pairs connected such that ","element":"span"},{"style":{"height":18.22},"width":321.24,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-24.png","element":"img","alt":"�Vy(xi, xj, r) ≤ α.","inline":true,"padRight":true},{"text":"We next propose a new algorithm to efficiently identify the connected pairs.","element":"span"}],[{"text":"In our new algorithm, we form the empirical version of ","element":"span"},{"style":{"height":17.6},"width":334.31,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-25.png","element":"img","alt":" G(α) based on n","inline":true,"padRight":true},{"text":"i.i.d. samples ","element":"span"},{"style":{"height":18.09},"width":636.07,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-26.png","element":"img","alt":"{(xi, yi)}ni=1. 
Let Aα ∈ {0, 1}n×n","inline":true,"padRight":true},{"text":"be a matrix with entries in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"} ","element":"span"},{"text":"such that all (","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j","element":"span"},{"text":") indices ","element":"span"},{"text":"with ","element":"span"},{"style":{"height":17.6},"width":129.03,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-27.png","element":"img","alt":" Aα(i, j","inline":true},{"text":") = 1 satisfy ","element":"span"},{"style":{"height":18.22},"width":309.17,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-28.png","element":"img","alt":"�Vy(xi, xj, r) ≤ α","inline":true},{"text":". Our estimation of ","element":"span"},{"style":{"height":17.6},"width":139.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-29.png","element":"img","alt":" G(α) is","inline":true}],[{"style":{"width":"75%"},"width":1361,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-30.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":18.22},"width":615.69,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-31.png","element":"img","alt":" �nα = #{(i, j) : Aα(xi, xj) = 1}","inline":true},{"text":". In other words, the connected pairs are indicated by the nonzero entries in the matrix ","element":"span"},{"style":{"height":15.42},"width":56.84,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-32.png","element":"img","alt":" Aα","inline":true},{"text":". 
In this paper, we form the matrix ","element":"span"},{"style":{"height":15.42},"width":56.84,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-33.png","element":"img","alt":" Aα","inline":true,"padRight":true},{"text":"via Algorithm ","element":"span"},{"href":"#id-77","text":"2. ","element":"a"},{"text":"In Algorithm ","element":"span"},{"href":"#id-77","text":"2, ","element":"a"},{"text":"each data point is used at most once: when a point is connected in one pair, we remove it from the rest of the computation. Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"is more efficient than the original GCR in the sense that it outputs at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n/","element":"span"},{"text":"2 connected pairs while the original GCR uses","element":"span"},{"style":{"height":20.96},"width":60.56,"height":52.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/10-34.png","element":"img","alt":"�n2�","inline":true,"padRight":true},{"text":"pairs (most of which are not connected) to estimate ","element":"span"},{"style":{"height":17.6},"width":79.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-0.png","element":"img","alt":" G(α","inline":true},{"text":"). Therefore the computational cost is greatly reduced. 
Moreover, our numerical experiments show that the mean squared error of the central subspace estimation by Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"converges in the order of ","element":"span"},{"style":{"height":15.13},"width":69.54,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-1.png","element":"img","alt":" n−1","inline":true},{"text":", which is the same as that in Corollary ","element":"span"},{"href":"#id-61","text":"1.","element":"a"}],[{"text":"In ","element":"span"},{"style":{"height":17.6},"width":319.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-2.png","element":"img","alt":"�G(α, r), α and r","inline":true,"padRight":true},{"text":"are two important parameters, which need to be properly chosen. In this paper, we choose these parameters as follows: Let ","element":"span"},{"style":{"height":10.4},"width":70.44,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-3.png","element":"img","alt":" ν >","inline":true,"padRight":true},{"text":"0 be a fixed constant. 
We set","element":"span"}],[{"id":"id-78","style":{"width":"79%"},"width":1441,"height":318,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-4.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":34.36},"width":765.75,"height":85.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-5.png","element":"img","alt":" C6 =�max(2c2, 8C2g)�−1, C7 =� 56νC2g3c1CD−16","inline":true}],[{"text":"The parameters ","element":"span"},{"style":{"height":17.42},"width":328.42,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-6.png","element":"img","alt":" Cg, B, c1, c2, σ, M","inline":true,"padRight":true},{"text":"are defined in Assumptions ","element":"span"},{"href":"#id-51","text":"2-","element":"a"},{"href":"#id-74","text":"6.","element":"a"}],[{"text":"In the following theorem, we prove that, if the parameters ","element":"span"},{"style":{"height":11.2},"width":67.48,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-7.png","element":"img","alt":" α, r","inline":true,"padRight":true},{"text":"are chosen according to ","element":"span"},{"href":"#id-78","text":"(30) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-78","text":"(31)","element":"a"},{"text":", Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"guarantees at least ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n/","element":"span"},{"text":"4 connected pairs of (","element":"span"},{"style":{"height":13.02},"width":100.89,"height":32.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-8.png","element":"img","alt":"xi, xj","inline":true},{"text":") such that ","element":"span"},{"style":{"height":18.22},"width":269.1,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-9.png","element":"img","alt":"�Vy(xi, xj, r) 
≤","inline":true},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-10.png","element":"img","alt":"α","inline":true},{"text":", and all the connected pairs satisfy ","element":"span"},{"style":{"height":19.98},"width":489.67,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-11.png","element":"img","alt":" Vf(xi, xj) ≤ α + α0 + 3σ2 ","inline":true,"padRight":true},{"text":"with high probability.","element":"span"}],[{"id":"id-79","style":{"height":18.09},"width":674.74,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-12.png","element":"img","alt":"Theorem 4. Let {xi}ni=1 be i.i.d.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"samples from the probability measure ","element":"span"},{"style":{"height":18.09},"width":345.53,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-13.png","element":"img","alt":" ρ, and {yi}ni=1 be","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"sampled according to the model in ","element":"span"},{"href":"#id-40","text":"(1)","element":"a"},{"style":{"fontStyle":"italic"},"text":", under Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2(","element":"a"},{"style":{"fontStyle":"italic"},"text":"i), ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"3-","element":"a"},{"href":"#id-74","style":{"fontStyle":"italic"},"text":"6. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.6},"width":410.4,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-14.png","element":"img","alt":" ν > 2 and set α0, r, α","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"according to ","element":"span"},{"href":"#id-78","style":{"fontStyle":"italic"},"text":"(30) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-78","style":{"fontStyle":"italic"},"text":"(31)","element":"a"},{"style":{"fontStyle":"italic"},"text":". Index the output pairs ","element":"span"},{"style":{"height":18.22},"width":412.18,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-15.png","element":"img","alt":" {(xi, xj)|A(i, j) = 1}","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"by Algorithm ","element":"span"},{"href":"#id-77","style":{"fontStyle":"italic"},"text":"2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"as ","element":"span"},{"style":{"height":18.84},"width":394.38,"height":47.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-16.png","element":"img","alt":"{(xik, xjk)}�nαk=1. 
If n","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is sufficiently large such that ","element":"span"},{"text":"2(","element":"span"},{"style":{"height":25.61},"width":782.73,"height":64.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-17.png","element":"img","alt":"n/ log n)d2D ≤ n and α0 < 2C2g, running","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Algorithm ","element":"span"},{"href":"#id-77","style":{"fontStyle":"italic"},"text":"2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"gives rise to","element":"span"}],[{"id":"id-134","style":{"width":"78%"},"width":1421,"height":206,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-18.png","element":"img"}],[{"text":"Theorem ","element":"span"},{"href":"#id-79","text":"4 ","element":"a"},{"text":"is proved in Supplementary materials ","element":"span"},{"text":"D. ","element":"span"},{"text":"Theorem ","element":"span"},{"href":"#id-79","text":"4 ","element":"a"},{"text":"shows that if ","element":"span"},{"style":{"height":12.8},"width":227.06,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-19.png","element":"img","alt":" α and r are","inline":true,"padRight":true},{"text":"properly chosen, with high probability, Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"gives rise to ","element":"span"},{"style":{"height":17.6},"width":118.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-20.png","element":"img","alt":"�G(α, r","inline":true},{"text":") with at least ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n/","element":"span"},{"text":"4 pairs of (","element":"span"},{"style":{"height":13.02},"width":100.89,"height":32.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-21.png","element":"img","alt":"xi, xj","inline":true},{"text":"). 
Moreover, all such pairs satisfy ","element":"span"},{"style":{"height":19.98},"width":477.34,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-22.png","element":"img","alt":" Vf(xi, xj) ≤ α + α0 + 3σ2 ","inline":true,"padRight":true},{"text":"with high probability. If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is large enough and ","element":"span"},{"style":{"height":8},"width":25,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-23.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is small enough such that ","element":"span"},{"style":{"height":17.58},"width":412.4,"height":43.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-24.png","element":"img","alt":" α+α0 +3σ2 < αthresh","inline":true},{"text":", we expect the estimated subspace ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-25.png","element":"img","alt":"�","inline":true},{"text":"Φ by Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"to be a good approximati","element":"span"},{"href":"#id-77","text":"on ","element":"a"},{"text":"of the central subspace Φ.","element":"span"}],[{"text":"described in Section ","element":"span"},{"href":"#id-80","text":"3.3 ","element":"a"},{"text":"using the samples ","element":"span"},{"style":{"height":20.82},"width":514.38,"height":52.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-26.png","element":"img","alt":" {(�Φ⊤xi, yi)}2ni=n+1. 
Then f","inline":true,"padRight":true},{"text":"is estimated as ","element":"span"},{"style":{"height":17.6},"width":136.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-27.png","element":"img","alt":" �f(x) =","inline":true},{"style":{"height":17.6},"width":154.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-28.png","element":"img","alt":"�g(�Φ⊤x).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"3.6 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Data normalization","element":"span"}],[{"text":"Our theoretical analysis assumes that ","element":"span"},{"style":{"height":15.6},"width":487.63,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-29.png","element":"img","alt":" Ex = 0, Exx⊤ = I, and x","inline":true,"padRight":true},{"text":"follows a spherical distribution. In practice these conditions may not be satisfied for the given data, and we always preprocess the data by normalization ","element":"span"},{"href":"#id-35","referenceIndex":35,"text":"[35]","element":"a"},{"text":". 
Given the data set ","element":"span"},{"style":{"height":18.09},"width":192.82,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-30.png","element":"img","alt":" {x_i, y_i}_{i=1}^n","inline":true},{"text":", we first compute the empirical ","element":"span"},{"text":"mean ¯","element":"span"},{"style":{"height":21.29},"width":270.68,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-31.png","element":"img","alt":"x = (1/n) Σ_{i=1}^n x_i","inline":true,"padRight":true},{"text":"and the empirical covariance matrix ","element":"span"},{"style":{"height":21.29},"width":702.37,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-32.png","element":"img","alt":" Σ̂ = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^⊤, and","inline":true,"padRight":true},{"text":"then we normalize the data as","element":"span"}],[{"style":{"width":"19%"},"width":344,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-33.png","element":"img"}],[{"text":"This normalization does not alter the low-dimensional property of the function since","element":"span"}],[{"style":{"width":"64%"},"width":1160,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-34.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":22.89},"width":669.68,"height":57.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/11-35.png","element":"img","alt":"Φ̃ = Σ̂^{1/2} Φ and g̃(v) = g(v + Φ^⊤ x̄).","inline":true}],[{"id":"id-77","style":{"width":"100%"},"width":1806,"height":1000,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-0.png","element":"img"}]]},{"heading":"4 Numerical experiments","paragraphs":[[{"text":"In this section, we provide numerical experiments to demonstrate the performance of the modified GCR in Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"and the regression scheme. 
The data ","element":"span"},{"style":{"height":18.09},"width":226.76,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-1.png","element":"img","alt":" {(x_i, y_i)}_{i=1}^n ","inline":true,"padRight":true},{"text":"are sampled ","element":"span"},{"text":"according to the model in ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":19.13},"width":232.3,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-2.png","element":"img","alt":" ξ_i ∼ N(0, σ^2","inline":true},{"text":"). The noise is ","element":"span"},{"style":{"height":20.8},"width":735.33,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-3.png","element":"img","alt":" p% if σ = p% · √((1/n) Σ_{i=1}^n f^2(x_i)). In all","inline":true,"padRight":true},{"text":"experiments, 90% of the given data is used for training and 10% is used for testing. Training data are used for central subspace estimation by GCR, SCR or SIR, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is then estimated by Gaussian kernel regression using the MATLAB built-in function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"fitrkernel","element":"span"},{"text":". 
The test data are used to compute the central subspace estimation error ","element":"span"},{"style":{"height":19.62},"width":318.75,"height":49.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-4.png","element":"img","alt":" ∥Proj�Φ − ProjΦ∥","inline":true,"padRight":true},{"text":"and the regression error","element":"span"}],[{"style":{"width":"55%"},"width":1009,"height":159,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-5.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10.62},"width":80.76,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-6.png","element":"img","alt":" ntest","inline":true,"padRight":true},{"text":"is the number of the test data. In GCR, the parameter ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"is chosen in the order of ","element":"span"},{"style":{"height":16.34},"width":114.4,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-7.png","element":"img","alt":"n−1/D ","inline":true,"padRight":true},{"text":"according to ","element":"span"},{"href":"#id-78","text":"(30)","element":"a"},{"text":". 
We set ","element":"span"},{"style":{"height":16.34},"width":215.3,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-8.png","element":"img","alt":" r = 2n^{−1/D} ","inline":true,"padRight":true},{"text":"unless otherwise specified.","element":"span"}],[{"text":"We expect ","element":"span"},{"href":"#id-78","style":{"height":22.36},"width":1001.95,"height":55.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-9.png","element":"img","alt":" E∥Proj_Φ̂ − Proj_Φ∥ ∼ n^{−1/2}, so log_10 ∥Proj_Φ̂ − Proj_Φ∥","inline":true,"padRight":true},{"text":"scales linearly with respect to log","element":"span"},{"style":{"height":12.19},"width":69.14,"height":30.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-10.png","element":"img","alt":"10 n","inline":true},{"text":", with a slope of ","element":"span"},{"style":{"height":12},"width":67.76,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-11.png","element":"img","alt":" −0.","inline":true},{"text":"5, independently of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Experiment 1: Robustness to non-elliptical distributions","element":"span"}],[{"text":"In the first experiment, we investigate the sensitivity of GCR, SCR and SIR to the condition of elliptical distributions in Assumption ","element":"span"},{"href":"#id-51","text":"2(","element":"a"},{"text":"ii). Let ","element":"span"},{"style":{"height":19.81},"width":814.9,"height":49.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-12.png","element":"img","alt":" f(x) = x_1^2 where x = [x_1, x_2]^T ∈ R^2. 
This","inline":true,"padRight":true},{"text":"function can be expressed as the model in ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with Φ = [1","element":"span"},{"style":{"height":19.53},"width":684.19,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-13.png","element":"img","alt":", 0]^T and g(z) = z^2 where z = Φ^T x.","inline":true,"padRight":true},{"text":"We sample ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"uniformly in the domain [","element":"span"},{"style":{"height":17.6},"width":375.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-14.png","element":"img","alt":"−0.5, 0.5]×[−0.5, 0.","inline":true},{"text":"5] excluding the fourth quadrant, which violates the condition of elliptical distributions. We set ","element":"span"},{"style":{"height":14.8},"width":302.45,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-15.png","element":"img","alt":" r = 0.01, α = 0.","inline":true},{"text":"001 in GCR, ","element":"span"},{"style":{"height":12},"width":167.18,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-16.png","element":"img","alt":" α = 0.01","inline":true,"padRight":true},{"text":"in SCR and 10 slices in SIR. Figure ","element":"span"},{"href":"#id-81","text":"4 ","element":"a"},{"text":"shows 1500 samples of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"(black dots), the direction of Φ (red arrow) and the direction of ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/12-17.png","element":"img","alt":"�","inline":true},{"text":"Φ (blue arrow). 
We observe that GCR is more robust than SCR and SIR when ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"is not elliptically distributed.","element":"span"}],[{"id":"id-81","style":{"width":"93%"},"width":1694,"height":536,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-0.png","element":"img"}],[{"text":"Figure 4: (Experiment 1) central subspace estimation by GCR, SCR and SIR when ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"x ","element":"figcaption","subtype":"caption"},{"text":"is not elliptically distributed. Samples are displayed in black dots and the direction of Φ is represented by a red arrow. The direction of ","element":"figcaption","subtype":"caption"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-1.png","element":"img","alt":"�","inline":true},{"text":"Φ is shown in a blue arrow in (a) by GCR, (b) by SCR and (c) by SIR. We observe that GCR is more robust than SCR and SIR when ","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"x ","element":"figcaption","subtype":"caption"},{"text":"is not elliptically distributed.","element":"figcaption","subtype":"caption"}],[{"id":"id-82","style":{"width":"99%"},"width":1801,"height":765,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-2.png","element":"img"}],[{"text":"Figure 5: (Experiment 2 – comparison of GCR, SCR and SIR with 5% noise.) Log-log plot of the central subspace estimation error versus ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":354.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-3.png","element":"img","alt":" n for f1 (a) and f2","inline":true,"padRight":true},{"text":"(b). 
SIR has the best performance when the function ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"g ","element":"figcaption","subtype":"caption"},{"text":"is monotonic.","element":"figcaption","subtype":"caption"}],[{"style":{"fontWeight":"bold"},"text":"4.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Experiment 2: Monotonic functions","element":"span"}],[{"text":"In the second experiment, we test and compare the performance of GCR, SCR and SIR on two monotonic single index models:","element":"span"}],[{"style":{"width":"53%"},"width":961,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-4.png","element":"img"}],[{"text":"Both functions can be expressed in model ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"= 10 and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"= 1. They are ","element":"span"},{"style":{"height":17.6},"width":154.93,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-5.png","element":"img","alt":" f1(x) =","inline":true}],[{"style":{"width":"99%"},"width":1804,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-6.png","element":"img"}],[{"style":{"height":17.75},"width":172.92,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-7.png","element":"img","alt":"ei ∈ RD ","inline":true,"padRight":true},{"text":"has 1 in the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"th entry and 0 everywhere else. 
","element":"span"},{"text":"In this and the following experiments, the ","element":"span"},{"style":{"height":10.62},"width":38.48,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-8.png","element":"img","alt":" xi","inline":true},{"text":"’s are uniformly sampled from [","element":"span"},{"style":{"height":19.53},"width":137.09,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-9.png","element":"img","alt":"−1, 1]D","inline":true},{"text":", and the sample size varies such that ","element":"span"},{"style":{"height":17.93},"width":700.47,"height":44.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-10.png","element":"img","alt":"n = 103, 103.3, 103.6, 103.9, 104.2, 104.5.","inline":true}],[{"text":"We compare GCR,SCR and SIR with 5% noise. ","element":"span"},{"text":"Figure ","element":"span"},{"href":"#id-82","text":"5 ","element":"a"},{"text":"shows the log-log plot of the central subspace estimation error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"for each function. In GCR, we use ","element":"span"},{"style":{"height":19.53},"width":322.8,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/13-11.png","element":"img","alt":" α = n−D/200 for","inline":true,"padRight":true},{"style":{"height":19.53},"width":517.75,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-0.png","element":"img","alt":"f1 and α = n−D/400 for f2","inline":true},{"text":". For both functions, ","element":"span"},{"style":{"height":17.6},"width":135.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-1.png","element":"img","alt":" α = n/","inline":true},{"text":"2 is used in SCR and each slice in SIR is set to contain about 200 samples. 
The subspace error in SCR and GCR converges almost in the order of ","element":"span"},{"style":{"height":16.33},"width":103.41,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-2.png","element":"img","alt":" n^{−1/2} ","inline":true,"padRight":true},{"text":"as expected. When ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is monotonic, SIR yields the best performance. We will show in the following experiments that SIR can easily fail when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is not monotonic, while GCR can handle many more cases.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Experiment 3: A non-monotonic function","element":"span"}],[{"text":"The third experiment is","element":"span"}],[{"style":{"width":"73%"},"width":1325,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-3.png","element":"img"}],[{"text":"This function can be expressed as the model in ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":20.96},"width":818.05,"height":52.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-4.png","element":"img","alt":" D = 10, d = 2, g(z_1, z_2) = sin(−π/2 + (π/2) z_1) +","inline":true},{"style":{"height":22.46},"width":1066.18,"height":56.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-5.png","element":"img","alt":"z_2, z = Φ^T x, Φ = [v_1, v_2] and v_1 = (1/3) Σ_{i=1}^9 e_i, v_2 = e_{10}.","inline":true}],[{"id":"id-83","style":{"width":"99%"},"width":1799,"height":765,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-6.png","element":"img"}],[{"text":"Figure 6: (Experiment 3 – performance of GCR without noise) Log-log plot of the central subspace estimation error (a) and the regression error (b) versus 
","element":"figcaption","subtype":"caption"},{"style":{"height":19.13},"width":575.1,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-7.png","element":"img","alt":" n, where α = Cn−1/D in GCR","inline":true,"padRight":true},{"text":"with ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"C ","element":"figcaption","subtype":"caption"},{"text":"= 1","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":", ","element":"figcaption","subtype":"caption"},{"text":"1","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"/","element":"figcaption","subtype":"caption"},{"text":"10","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":", ","element":"figcaption","subtype":"caption"},{"text":"1","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"/","element":"figcaption","subtype":"caption"},{"text":"120","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":", ","element":"figcaption","subtype":"caption"},{"text":"1","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"/","element":"figcaption","subtype":"caption"},{"text":"1200","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":", ","element":"figcaption","subtype":"caption"},{"text":"1","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"/","element":"figcaption","subtype":"caption"},{"text":"12000 respectively.","element":"figcaption","subtype":"caption"}],[{"text":"We first present the performance of GCR on noiseless data, i.e. 
","element":"span"},{"style":{"height":8},"width":25,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-8.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"= 0, with different choices of the parameter ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-9.png","element":"img","alt":" α","inline":true},{"text":". In GCR, the parameter ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-10.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"should be chosen as ","element":"span"},{"style":{"height":19.93},"width":452.58,"height":49.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-11.png","element":"img","alt":" α = Cn−1/D according","inline":true,"padRight":true},{"text":"to ","element":"span"},{"href":"#id-78","text":"(31)","element":"a"},{"text":". 
We set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"10","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"120","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"1200","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"12000 and show the central subspace estimation error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"in Figure ","element":"span"},{"href":"#id-83","text":"6(","element":"a"},{"text":"a) and the regression error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"in Figure ","element":"span"},{"href":"#id-83","text":"6(","element":"a"},{"text":"b). Each error is averaged over 10 experiments. In log-log scale, the errors decay linearly as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"increases with the same rate as long as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is not too large, as we expect. 
Our theory predicts the slopes in Figure ","element":"span"},{"href":"#id-83","text":"6(","element":"a"},{"text":"a) as ","element":"span"},{"style":{"height":12},"width":67.76,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-12.png","element":"img","alt":" −0.","inline":true},{"text":"5, which are almost matched by the slopes in Figure ","element":"span"},{"href":"#id-83","text":"6(","element":"a"},{"text":"a). The success of GCR requires the condition ","element":"span"},{"style":{"height":12.44},"width":215.85,"height":31.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-13.png","element":"img","alt":" α < αthresh","inline":true},{"text":", so GCR fails when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is too large, i.e. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"= 1. The regression error is observed to converge in the order of ","element":"span"},{"style":{"height":18.73},"width":348.06,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-14.png","element":"img","alt":" n−0.5 as long as C","inline":true,"padRight":true},{"text":"is not too large.","element":"span"}],[{"text":"We next compare the performance of GCR, SCR and SIR with noiseless data. 
We set ","element":"span"},{"style":{"height":8.4},"width":74.88,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-15.png","element":"img","alt":" α =","inline":true},{"style":{"height":20.33},"width":139.14,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-16.png","element":"img","alt":"n−1/D/","inline":true},{"text":"120 in GCR, and ","element":"span"},{"style":{"height":17.6},"width":155.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-17.png","element":"img","alt":" α = 2/n","inline":true,"padRight":true},{"text":"in SCR since it provides the best results among many choices. In SIR, each slice contains about 200 samples. We display the central subspace estimation error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"in Figure ","element":"span"},{"href":"#id-84","text":"7(","element":"a"},{"text":"a) and the regression error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"in Figure ","element":"span"},{"href":"#id-84","text":"7(","element":"a"},{"text":"b). Each error is the averaged error of 10 experiments. GCR and SCR perform better than SIR, and GCR is the best. 
Estimation of the central subspace greatly improves the rate of convergence in comparison with a direct regression in ","element":"span"},{"style":{"height":15.13},"width":74.24,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/14-18.png","element":"img","alt":" RD.","inline":true}],[{"id":"id-84","style":{"width":"99%"},"width":1801,"height":766,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-0.png","element":"img"}],[{"text":"Figure 7: (Experiment 3 – Comparison of GCR, SCR and SIR without noise) Log-log plot of the central subspace estimation error (a) and the regression error (b) versus ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"by GCR, SCR and SIR. The black curve in (b) represents the regression error of ","element":"figcaption","subtype":"caption"},{"style":{"height":19.13},"width":149.04,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-1.png","element":"img","alt":" f in RD ","inline":true,"padRight":true},{"text":"without an estimation of the central subspace. GCR and SCR perform better than SIR, and GCR is the best. 
Estimating the central subspace greatly improves the rate of convergence in comparison with a direct regression in ","element":"figcaption","subtype":"caption"},{"style":{"height":15.14},"width":74.24,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-2.png","element":"img","alt":" R^D.","inline":true}],[{"id":"id-85","style":{"width":"99%"},"width":1800,"height":765,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-3.png","element":"img"}],[{"text":"Figure 8: (Experiment 3 – Comparison of GCR, SCR and SIR with 5% noise) Log-log plot of the central subspace estimation error (a) and the regression error (b) versus ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"by GCR, SCR and SIR. The black curve in (b) represents the regression error of ","element":"figcaption","subtype":"caption"},{"style":{"height":19.13},"width":390.5,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-4.png","element":"img","alt":" f in R^D without an","inline":true,"padRight":true},{"text":"estimation of the central subspace. GCR and SCR perform better than SIR, and GCR is the best. Estimating the central subspace greatly improves the rate of convergence in comparison with a direct regression in ","element":"figcaption","subtype":"caption"},{"style":{"height":15.13},"width":74.24,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-5.png","element":"img","alt":" R^D.","inline":true}],[{"text":"Results with 5% noise are shown in Figure ","element":"span"},{"href":"#id-85","text":"8. ","element":"a"},{"text":"We set ","element":"span"},{"style":{"height":20.33},"width":699.4,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-6.png","element":"img","alt":" α = n^{−1/D}/50 and α = 1/n in GCR","inline":true,"padRight":true},{"text":"and SCR, respectively. 
In SIR, each slice contains about 200 samples. We observe that GCR performs better than SCR and SIR.","element":"span"}],[{"text":"We then compare GCR, SCR and SIR with heavy noise – 50%","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"100% and 120% noise. Visualization of data along the ","element":"span"},{"style":{"height":15.02},"width":195.39,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/15-7.png","element":"img","alt":" v_1 and v_2","inline":true,"padRight":true},{"text":"directions is shown in the first and second rows of Figure ","element":"span"},{"href":"#id-86","text":"9. ","element":"a"},{"text":"The third row shows the log-log plot of the central subspace estimation error by GCR,","element":"span"}],[{"id":"id-86","style":{"width":"94%"},"width":1710,"height":1790,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/16-0.png","element":"img"}],[{"text":"Figure 9: (Experiment 3 – Comparison of GCR, SCR and SIR with heavy noise: 50% noise in the left column, 100% in the middle column and 120% in the right column) The first row shows the visualization of the noiseless ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"x","element":"figcaption","subtype":"caption"},{"text":") in black and the noisy ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"y ","element":"figcaption","subtype":"caption"},{"text":"in gray along the ","element":"figcaption","subtype":"caption"},{"style":{"height":15.02},"width":229.08,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/16-1.png","element":"img","alt":" v_1 direction","inline":true,"padRight":true},{"text":"while 
","element":"figcaption","subtype":"caption"},{"style":{"height":14.22},"width":305.98,"height":35.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/16-2.png","element":"img","alt":" −0.1 < x10 < 0.","inline":true},{"text":"1. The second row shows the visualization of the noiseless ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"x","element":"figcaption","subtype":"caption"},{"text":") in black and the noisy ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"y ","element":"figcaption","subtype":"caption"},{"text":"in gray along the ","element":"figcaption","subtype":"caption"},{"style":{"height":10.62},"width":43.48,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/16-3.png","element":"img","alt":" v2","inline":true,"padRight":true},{"text":"direction while ","element":"figcaption","subtype":"caption"},{"style":{"height":21.6},"width":351.1,"height":54.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/16-4.png","element":"img","alt":" −0.9 < �91 xi < 0.","inline":true},{"text":"9. The third row shows ","element":"figcaption","subtype":"caption"},{"text":"the log-log plot of the central subspace estimation error by GCR, SCR and SIR versus ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n","element":"figcaption","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"text":"SCR and SIR versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Each error is averaged over 10 experiments. 
We observe that GCR is very robust against heavy noise – the central subspace estimation error converges in the order of ","element":"span"},{"style":{"height":15.13},"width":129.74,"height":37.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/16-5.png","element":"img","alt":" n−0.569 ","inline":true,"padRight":true},{"text":"with 50% noise, and the rate slightly degrades in the presence of 100% and 120% noise. In comparison, SCR and SIR tend to fail when noise is heavy.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Experiment 4: A non-monotonic function","element":"span"}],[{"text":"The fourth experiment is","element":"span"}],[{"style":{"width":"66%"},"width":1205,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/16-6.png","element":"img"}],[{"text":"This function can be expressed as the model in ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":19.41},"width":696.67,"height":48.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-0.png","element":"img","alt":" d = 2, D = 10, g(z1, z2) = z21 + 2z22,","inline":true},{"style":{"height":19.64},"width":1085.11,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-1.png","element":"img","alt":"z = ΦT x, Φ = [v1, v2] and v1 = e1, v2 = (e2 + e3)/√2.","inline":true,"padRight":true},{"text":"This experiment is more challenging since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is not monotonic along both ","element":"span"},{"style":{"height":15.02},"width":188.3,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-2.png","element":"img","alt":" v1 and 
v2","inline":true,"padRight":true},{"text":"directions.","element":"span"}],[{"id":"id-87","style":{"width":"99%"},"width":1800,"height":765,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-3.png","element":"img"}],[{"text":"Figure 10: (Experiment 4 – performance of GCR without noise) Log-log plot of the central subspace estimation error (a) and the regression error (b) versus ","element":"figcaption","subtype":"caption"},{"style":{"height":19.14},"width":575.1,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-4.png","element":"img","alt":" n, where α = Cn^{−1/D} in GCR","inline":true,"padRight":true},{"text":"with ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"C ","element":"figcaption","subtype":"caption"},{"text":"= 1/0.05, 1/0.5, 1/5, 1/50, 1/500 respectively.","element":"figcaption","subtype":"caption"}],[{"text":"Performance of GCR with ","element":"span"},{"style":{"height":20.33},"width":970.06,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-5.png","element":"img","alt":" α = Cn^{−1/D} where C = 1/0.05, 1/0.5, 1/5, 1/50, 1/","inline":true},{"text":"500 on noiseless data is presented in Figure ","element":"span"},{"href":"#id-87","text":"10. ","element":"a"},{"text":"The central subspace estimation error and the regression error are shown in Figure ","element":"span"},{"href":"#id-87","text":"10 ","element":"a"},{"text":"(a) and (b), respectively. Each error is averaged over 10 experiments. As in Experiment 2, we observe that on a log-log scale both errors decay linearly as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"increases, with the same slope, as long as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is not too large.","element":"span"}],[{"text":"In Figure ","element":"span"},{"href":"#id-88","text":"11, ","element":"a"},{"text":"we compare GCR, SCR and SIR with 5% noise. 
We set ","element":"span"},{"style":{"height":20.33},"width":335.52,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-6.png","element":"img","alt":" α = n^{−1/D}/50 in","inline":true,"padRight":true},{"text":"GCR and ","element":"span"},{"style":{"height":17.6},"width":155.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-7.png","element":"img","alt":" α = 1/n","inline":true,"padRight":true},{"text":"in SCR. In SIR, each slice contains about 200 samples. We show the central subspace estimation error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"in Figure ","element":"span"},{"href":"#id-88","text":"11 ","element":"a"},{"text":"(a) and the regression error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"in Figure ","element":"span"},{"href":"#id-88","text":"11 ","element":"a"},{"text":"(b). Each error is averaged over 10 experiments. Among the three methods, GCR yields the smallest error and the fastest rate of convergence.","element":"span"}],[{"text":"Results with 50% noise are shown in Figure ","element":"span"},{"href":"#id-89","text":"12. ","element":"a"},{"text":"We set ","element":"span"},{"style":{"height":20.33},"width":654.65,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-8.png","element":"img","alt":" α = n^{−1/D} in GCR and α = 24/n","inline":true,"padRight":true},{"text":"in SCR. In SIR, each slice contains about 200 samples. 
The noisy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"(gray) and the noiseless ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") (black) are shown in Figure ","element":"span"},{"href":"#id-89","text":"12 ","element":"a"},{"text":"(a) along the ","element":"span"},{"style":{"height":10.62},"width":43.48,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-9.png","element":"img","alt":" v1","inline":true,"padRight":true},{"text":"direction with ","element":"span"},{"style":{"height":14.8},"width":447.68,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-10.png","element":"img","alt":" −0.2 < x2 + x3 < 0.2,","inline":true,"padRight":true},{"text":"and in Figure ","element":"span"},{"href":"#id-89","text":"12 ","element":"a"},{"text":"(b) along the ","element":"span"},{"style":{"height":10.62},"width":43.48,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-11.png","element":"img","alt":" v2","inline":true,"padRight":true},{"text":"direction with ","element":"span"},{"style":{"height":14.22},"width":357.36,"height":35.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-12.png","element":"img","alt":" −0.1 < x1 < 0.1.","inline":true,"padRight":true},{"text":"The central subspace estimation errors of the three methods are displayed in Figure ","element":"span"},{"href":"#id-89","text":"12 ","element":"a"},{"text":"(c). Each error is averaged over 10 experiments. SCR and SIR perform poorly in this test, while GCR is very robust to heavy noise. 
The central subspace estimation error decays as ","element":"span"},{"style":{"height":19.13},"width":164.28,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-13.png","element":"img","alt":" O(n^{−0.43}","inline":true},{"text":"), which is close to our prediction of ","element":"span"},{"style":{"height":19.13},"width":178.26,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-14.png","element":"img","alt":" O(n^{−0.5}).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"4.5 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Experiment 5: A non-monotonic function","element":"span"}],[{"text":"In Experiments 3 and 4, the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"can be written as a sum of two single index models. We next compare GCR, SCR and SIR on functions without such structures. Consider","element":"span"}],[{"style":{"width":"49%"},"width":888,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-15.png","element":"img"}],[{"text":"which can be expressed in the model ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":20.96},"width":899.56,"height":52.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-16.png","element":"img","alt":" D = 10, d = 2, g(z_1, z_2) = 10 sin((π/5)(z_1 + z_2^2)),","inline":true},{"style":{"height":19.64},"width":1062.9,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/17-17.png","element":"img","alt":"z = Φ^T x, Φ = [v1, v2] and v1 = e1, v2 = (e2 + e3)/√2.","inline":true,"padRight":true},{"text":"We test GCR, SCR and SIR with 5%","element":"span"}],[{"id":"id-88","style":{"width":"99%"},"width":1800,"height":768,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-0.png","element":"img"}],[{"text":"Figure 11: (Experiment 4 – Comparison of 
GCR, SCR and SIR with 5% noise) Log-log plot of the central subspace estimation error (a) and the regression error (b) versus ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"by GCR and SCR. The black curve in (b) represents the regression error of ","element":"figcaption","subtype":"caption"},{"style":{"height":19.13},"width":153.16,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-1.png","element":"img","alt":" f in RD ","inline":true,"padRight":true},{"text":"without an estimation of the central subspace. GCR performs better than SCR and SIR. Estimation of the central subspace greatly improves the rate of convergence in comparison with direct regression in ","element":"figcaption","subtype":"caption"},{"style":{"height":15.14},"width":74.24,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-2.png","element":"img","alt":"RD.","inline":true}],[{"id":"id-89","style":{"width":"94%"},"width":1699,"height":585,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-3.png","element":"img"}],[{"text":"Figure 12: (Experiment 4 – Comparison of GCR, SCR and SIR with 50% noise) Visualization of the noiseless ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"style":{"fontWeight":"bold"},"text":"x","element":"figcaption","subtype":"caption"},{"text":") in black and the noisy ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"y ","element":"figcaption","subtype":"caption"},{"text":"in gray along the ","element":"figcaption","subtype":"caption"},{"style":{"height":10.62},"width":43.48,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-4.png","element":"img","alt":" 
v1","inline":true,"padRight":true},{"text":"direction while ","element":"figcaption","subtype":"caption"},{"style":{"height":14.22},"width":393.64,"height":35.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-5.png","element":"img","alt":" −0.2 < x2 +x3 < 0.2","inline":true,"padRight":true},{"text":"(a) and along the ","element":"figcaption","subtype":"caption"},{"style":{"height":10.62},"width":43.48,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-6.png","element":"img","alt":" v2","inline":true,"padRight":true},{"text":"direction while ","element":"figcaption","subtype":"caption"},{"style":{"height":14.22},"width":289.37,"height":35.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-7.png","element":"img","alt":" −0.1 < x1 < 0.","inline":true},{"text":"1 (b). (c) is the log-log plot of the central subspace estimation error versus ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"by GCR, SCR and SIR. GCR performs better than SCR and SIR. GCR is very robust against heavy noise.","element":"figcaption","subtype":"caption"}],[{"text":"and 50% noise. The log-log plot of the central subspace estimation error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is displayed in Figure ","element":"span"},{"href":"#id-90","text":"13. ","element":"a"},{"text":"Each error is averaged over 10 experiments. 
GCR has a convergence rate of ","element":"span"},{"style":{"height":19.13},"width":181.21,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-8.png","element":"img","alt":"O(n^{−0.565}","inline":true},{"text":") with 5% noise and ","element":"span"},{"style":{"height":19.13},"width":181.2,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-9.png","element":"img","alt":" O(n^{−0.447}","inline":true},{"text":") with 50% noise, so it remains robust to heavy noise and performs better than SCR and SIR.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.6 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Experiment 6: A non-monotonic function","element":"span"}],[{"text":"Our last experiment is","element":"span"}],[{"style":{"width":"32%"},"width":581,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/18-10.png","element":"img"}],[{"id":"id-90","style":{"width":"93%"},"width":1696,"height":708,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/19-0.png","element":"img"}],[{"text":"Figure 13: (Experiment 5 – Comparison of GCR, SCR and SIR with 5% and 50% noise) Log-log plot of the central subspace estimation error versus ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"by GCR, SCR and SIR with 5% noise (a) and 50% noise (b).","element":"figcaption","subtype":"caption"}],[{"id":"id-91","style":{"width":"93%"},"width":1696,"height":711,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/19-1.png","element":"img"}],[{"text":"Figure 14: (Experiment 6 – Comparison of GCR, SCR and SIR with 5% and 50% noise.) 
Log-log plot of the central subspace estimation error versus ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"n ","element":"figcaption","subtype":"caption"},{"text":"by GCR, SCR and SIR with 5% noise (a) and 50% noise (b). GCR performs better than SCR and SIR.","element":"figcaption","subtype":"caption"}],[{"text":"which can be expressed in the model ","element":"span"},{"href":"#id-40","text":"(1) ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":19.81},"width":910.54,"height":49.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/19-2.png","element":"img","alt":" D = 10, d = 2, g(z1, z2) = 4z21z22, z = ΦT x, Φ =","inline":true,"padRight":true},{"text":"[","element":"span"},{"style":{"height":19.64},"width":753.39,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/19-3.png","element":"img","alt":"v1, v2] and v1 = e1, v2 = (e2 + e3)/√2.","inline":true}],[{"text":"We test GCR, SCR and SIR with 5% and 50% noise. The log-log plot of the central subspace estimation error versus ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is displayed in Figure ","element":"span"},{"href":"#id-91","text":"14. ","element":"a"},{"text":"Each error is averaged over 10 experiments. 
GCR has a convergence rate of ","element":"span"},{"style":{"height":19.13},"width":181.21,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/19-4.png","element":"img","alt":" O(n^{−0.575}","inline":true},{"text":") with 5% noise and ","element":"span"},{"style":{"height":19.13},"width":181.2,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/19-5.png","element":"img","alt":" O(n^{−0.523}","inline":true},{"text":") with 50% noise, so it remains robust to heavy noise and performs significantly better than SCR and SIR.","element":"span"}]]},{"heading":"5 Proof of main results","paragraphs":[[{"text":"This section contains the proofs of our main results in Section ","element":"span"},{"text":"3. ","element":"span"},{"text":"Lemmas are proved in the Supplementary materials.","element":"span"}],[{"id":"id-55","style":{"fontWeight":"bold"},"text":"5.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Concentration inequality for matrix-valued U-statistics and proof of Theorem ","element":"span"},{"href":"#id-54","style":{"fontWeight":"bold"},"text":"1","element":"a"}],[{"text":"U-statistics were introduced by Hoeffding ","element":"span"},{"href":"#id-92","referenceIndex":26,"text":"[26]","element":"a"},{"text":", and concentration inequalities for scalar-valued U-statistics have been well studied ","element":"span"},{"href":"#id-92","referenceIndex":26,"text":"[26]","element":"a"},{"text":", ","element":"span"},{"href":"#id-93","referenceIndex":47,"text":"[47, ","element":"a"},{"text":"Chapter 5]. To our knowledge, there are limited results on matrix-valued U-statistics. A concentration inequality for a modified matrix-valued U-statistic can be found in ","element":"span"},{"href":"#id-94","referenceIndex":42,"text":"[42]","element":"a"},{"text":". 
In this paper, we prove a Bernstein inequality for matrix-valued U-statistics using the tools in ","element":"span"},{"href":"#id-95","referenceIndex":49,"text":"[49, ","element":"a"},{"href":"#id-96","referenceIndex":1,"text":"1]","element":"a"},{"text":". We denote ","element":"span"},{"style":{"height":15.13},"width":117.02,"height":37.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-0.png","element":"img","alt":" HD×D ","inline":true,"padRight":true},{"text":"as the set of ","element":"span"},{"style":{"height":12},"width":111.68,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-1.png","element":"img","alt":" D×D","inline":true,"padRight":true},{"text":"real-valued Hermitian matrices.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 4 ","element":"span"},{"text":"(Matrix U-statistics)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a random variable with probability distribution ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-2.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in ","element":"span"},{"style":{"height":18.33},"width":726.1,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-3.png","element":"img","alt":" RD, and W : RD × · · · × RD → HD×D ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a matrix-valued kernel with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"style":{"fontStyle":"italic"},"text":"inputs. 
Suppose we can access ","element":"span"},{"style":{"height":13.6},"width":123.55,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-4.png","element":"img","alt":" n ≥ m","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"i.i.d. copies of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":", denoted by ","element":"span"},{"style":{"height":18.09},"width":138.51,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-5.png","element":"img","alt":" {x_i}_{i=1}^n","inline":true},{"style":{"fontStyle":"italic"},"text":". The U-statistic of order ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and kernel ","element":"span"},{"style":{"height":18.09},"width":377.59,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-6.png","element":"img","alt":"W based on {x_i}_{i=1}^n ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is defined as","element":"span"}],[{"style":{"width":"99%"},"width":1798,"height":263,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-7.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"is the set of all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"style":{"fontStyle":"italic"},"text":"-tuples of integers between 1 and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"We say ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"is permutation symmetric if for any (","element":"span"},{"style":{"height":11.2},"width":177.05,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-8.png","element":"img","alt":"x1, ..., xm","inline":true},{"text":") 
and any permutation ","element":"span"},{"style":{"height":17.6},"width":355.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-9.png","element":"img","alt":" π, W(x1, ..., xm) =","inline":true},{"style":{"height":17.6},"width":502.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-10.png","element":"img","alt":"W(xπ1, ..., xπm). When W","inline":true,"padRight":true},{"text":"is permutation symmetric, ","element":"span"},{"style":{"height":14.62},"width":50.79,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-11.png","element":"img","alt":" Un","inline":true,"padRight":true},{"text":"can be written as","element":"span"}],[{"id":"id-97","style":{"width":"70%"},"width":1265,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-12.png","element":"img"}],[{"text":"U-statistics are unbiased estimators of ","element":"span"},{"style":{"height":17.6},"width":270.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-13.png","element":"img","alt":" EW(x1, ..., xm","inline":true},{"text":") where the expectation is taken over the joint distribution of ","element":"span"},{"style":{"height":11.2},"width":177.05,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-14.png","element":"img","alt":" x1, ..., xm","inline":true},{"text":". We use tools from ","element":"span"},{"href":"#id-95","referenceIndex":49,"text":"[49, ","element":"a"},{"href":"#id-96","referenceIndex":1,"text":"1] ","element":"a"},{"text":"to prove the following lemma about the concentration of the matrix-valued U-statistics:","element":"span"}],[{"id":"id-98","style":{"fontWeight":"bold"},"text":"Lemma 2 ","element":"span"},{"text":"(Bernstein inequality for matrix-valued U-statistics)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a random variable with probability distribution ","element":"span"},{"style":{"height":19.13},"width":895.4,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-15.png","element":"img","alt":" ρ in RD, and W : RD × · · · × RD → HD×D ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a permutation symmetric matrix-valued kernel with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"style":{"fontStyle":"italic"},"text":"inputs. Assume","element":"span"}],[{"id":"id-99","style":{"width":"95%"},"width":1724,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-16.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Suppose we can access ","element":"span"},{"style":{"height":13.6},"width":124.46,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-17.png","element":"img","alt":" n ≥ m","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"i.i.d. copies of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":", denoted by ","element":"span"},{"style":{"height":18.09},"width":301.22,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-18.png","element":"img","alt":" {xi}ni=1. Let Un","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be the U-statistics ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"defined in ","element":"span"},{"href":"#id-97","style":{"fontStyle":"italic"},"text":"(37)","element":"a"},{"style":{"fontStyle":"italic"},"text":". 
Then for any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t > ","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"}],[{"style":{"width":"76%"},"width":1374,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-19.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-98","style":{"fontStyle":"italic"},"text":"2. ","element":"a"},{"text":"Let ","element":"span"},{"style":{"height":20.96},"width":315.75,"height":52.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-20.png","element":"img","alt":" k =� nm�. Define","inline":true}],[{"style":{"width":"79%"},"width":1439,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-21.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"(","element":"span"},{"style":{"height":35.78},"width":272.67,"height":89.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-22.png","element":"img","alt":"x1, ..., xn) = 1k","inline":true}],[{"id":"id-100","style":{"width":"80%"},"width":1445,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-23.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.95},"width":1024.43,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-24.png","element":"img","alt":" Vi = W(x(i−1)m+1, x(i−1)m+2, ..., xim). Note that V","inline":true,"padRight":true},{"text":"is the average of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"independent empirical realizations of ","element":"span"},{"style":{"height":12.4},"width":191.07,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-25.png","element":"img","alt":" W. 
Let π","inline":true,"padRight":true},{"text":"be a permutation of (1","element":"span"},{"style":{"height":17.6},"width":676.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-26.png","element":"img","alt":", 2, ..., n) and V π = V (xπ1, ..., xπn).","inline":true,"padRight":true},{"text":"The U-statistics in ","element":"span"},{"href":"#id-97","text":"(36) ","element":"a"},{"text":"can be written as","element":"span"}],[{"style":{"width":"58%"},"width":1058,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/20-27.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12},"width":43.6,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-0.png","element":"img","alt":" In ","inline":true,"padRight":true},{"text":"denotes the set of all permutations of (1","element":"span"},{"style":{"height":17.6},"width":860.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-1.png","element":"img","alt":", 2, ..., n). We have EUn = EV π = EV for any","inline":true,"padRight":true},{"text":"permutation ","element":"span"},{"style":{"height":12.8},"width":157.08,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-2.png","element":"img","alt":" π. 
Then","inline":true}],[{"style":{"width":"66%"},"width":1197,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-3.png","element":"img"}],[{"text":"For any ","element":"span"},{"style":{"height":15.2},"width":107.57,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-4.png","element":"img","alt":" µ, t >","inline":true,"padRight":true},{"text":"0, we have","element":"span"}],[{"id":"id-101","style":{"width":"97%"},"width":1751,"height":487,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-5.png","element":"img"}],[{"text":"Next we bound ","element":"span"},{"style":{"height":16.33},"width":205.83,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-6.png","element":"img","alt":" Eeµ∥V −EV ∥","inline":true},{"text":". By the assumption in ","element":"span"},{"href":"#id-99","text":"(38)","element":"a"},{"text":", for each ","element":"span"},{"style":{"height":14.62},"width":37.46,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-7.png","element":"img","alt":" Vi","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-100","text":"(40)","element":"a"},{"text":", we have","element":"span"}],[{"style":{"width":"78%"},"width":1412,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-8.png","element":"img"}],[{"text":"The Bernstein inequality for U-statistics is derived through several important results in ","element":"span"},{"href":"#id-95","referenceIndex":49,"text":"[49, ","element":"a"},{"href":"#id-96","referenceIndex":1,"text":"1]","element":"a"},{"text":". 
For 0 ","element":"span"},{"style":{"height":17.6},"width":265.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-9.png","element":"img","alt":" < µ/k < 3/R,","inline":true}],[{"style":{"width":"89%"},"width":1622,"height":883,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-10.png","element":"img"}],[{"text":"where tr(","element":"span"},{"style":{"height":5.6},"width":12,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-11.png","element":"img","alt":"·","inline":true},{"text":") denotes the trace of its argument. Combining this bound with ","element":"span"},{"href":"#id-101","text":"(43) ","element":"a"},{"text":"gives rise to","element":"span"}],[{"style":{"width":"77%"},"width":1404,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-12.png","element":"img"}],[{"text":"Setting ","element":"span"},{"style":{"height":27.35},"width":367.42,"height":68.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-13.png","element":"img","alt":" µ = tkσ2W +Rt/3 yields","inline":true}],[{"style":{"width":"76%"},"width":1384,"height":406,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/21-14.png","element":"img"}],[{"text":"Theorem 1 is a direct result of Lemma 2 by taking ","element":"span"},{"href":"#id-54","style":{"height":41.59},"width":1804.88,"height":103.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-0.png","element":"img","alt":" W = �H(α) with m = 2, R = 4B2 andσ2W = 16B4.","inline":true}],[{"id":"id-58","style":{"fontWeight":"bold"},"text":"5.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-60","style":{"fontWeight":"bold"},"text":"2","element":"a"}],[{"text":"To prove Theorem ","element":"span"},{"href":"#id-60","text":"2, ","element":"a"},{"text":"we need the following two lemmas on matrix perturbation 
theory.","element":"span"}],[{"id":"id-105","style":{"fontWeight":"bold"},"text":"Lemma 3 ","element":"span"},{"text":"(Davis-Kahan ","element":"span"},{"href":"#id-102","referenceIndex":15,"text":"[15, ","element":"a"},{"href":"#id-103","referenceIndex":48,"text":"48]","element":"a"},{"text":")","element":"span"},{"style":{"height":18.33},"width":390.43,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-1.png","element":"img","alt":". Let W, �W ∈ CD×D ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be two Hermitian matrices with eigenvalues ","element":"span"},{"style":{"height":17.42},"width":362.29,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-2.png","element":"img","alt":" λj, �λj, j = 1, . . . , D","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in non-increasing order. Let the columns of ","element":"span"},{"text":"Φ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":12},"width":32,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-3.png","element":"img","alt":"�Φ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"consist of the eigenvectors associated with ","element":"span"},{"style":{"height":22.85},"width":827.7,"height":57.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-4.png","element":"img","alt":" {λj}Dj=D−d+1 of W and {�λj}Dj=D−d+1 of �W","inline":true},{"style":{"fontStyle":"italic"},"text":", respectively. 
Suppose","element":"span"}],[{"style":{"width":"99%"},"width":1799,"height":205,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-5.png","element":"img"}],[{"id":"id-106","style":{"fontWeight":"bold"},"text":"Lemma 4 ","element":"span"},{"text":"(Weyl ","element":"span"},{"href":"#id-104","referenceIndex":52,"text":"[52]","element":"a"},{"text":")","element":"span"},{"style":{"height":18.33},"width":375.33,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-6.png","element":"img","alt":". Let W, �W ∈ CD×D ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be two Hermitian matrices with eigenvalues ","element":"span"},{"style":{"height":17.42},"width":191.34,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-7.png","element":"img","alt":" λj, �λj, j =","inline":true,"padRight":true},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , D ","element":"span"},{"style":{"fontStyle":"italic"},"text":"in non-increasing order. Then","element":"span"}],[{"style":{"width":"68%"},"width":1235,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-60","style":{"fontStyle":"italic"},"text":"2. 
","element":"a"},{"text":"By Proposition ","element":"span"},{"href":"#id-50","text":"1, ","element":"a"},{"text":"when ","element":"span"},{"style":{"height":12.44},"width":209.34,"height":31.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-9.png","element":"img","alt":" α < αthresh","inline":true},{"text":", the eigenvectors associated with the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":84.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-10.png","element":"img","alt":" H(α","inline":true},{"text":") span the central subspace. Recall that the column space of ","element":"span"},{"style":{"height":12.4},"width":74.64,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-11.png","element":"img","alt":"�Φ is","inline":true,"padRight":true},{"text":"the eigenspace associated with the smallest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":84.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-12.png","element":"img","alt":"�H(α","inline":true},{"text":"). 
To derive the relation between Proj","element":"span"},{"style":{"height":18.82},"width":243.4,"height":47.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-13.png","element":"img","alt":"�Φ and ProjΦ","inline":true,"padRight":true},{"text":"using Lemma ","element":"span"},{"href":"#id-105","text":"3 ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-106","text":"4, ","element":"a"},{"text":"we need an eigengap of ","element":"span"},{"style":{"height":17.6},"width":84.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-14.png","element":"img","alt":" H(α","inline":true},{"text":"). Denote the eigenvalues of ","element":"span"},{"style":{"height":26.41},"width":829.06,"height":66.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-15.png","element":"img","alt":" H(α) and G(α) by {λ(H)j }Dj=1 and {λ(G)j }Dj=1 ","inline":true,"padRight":true},{"text":"in non-increasing order, respectively. ","element":"span"},{"text":"We have ","element":"span"},{"style":{"height":26.41},"width":626.44,"height":66.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-16.png","element":"img","alt":" λ(H)j = pαλ(G)j . 
When α < αthresh","inline":true},{"text":", with Assumption ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-51","text":"2(","element":"a"},{"text":"iii), Proposition ","element":"span"},{"href":"#id-50","text":"1 ","element":"a"},{"text":"implies","element":"span"}],[{"style":{"width":"71%"},"width":1284,"height":160,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-17.png","element":"img"}],[{"text":"Combining Lemma ","element":"span"},{"href":"#id-105","text":"3 ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-106","text":"4 ","element":"a"},{"text":"and setting ","element":"span"},{"style":{"height":17.6},"width":416.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-18.png","element":"img","alt":" W = H(α), �W = �H(α","inline":true},{"text":") give rise to","element":"span"}],[{"style":{"width":"57%"},"width":1032,"height":269,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-19.png","element":"img"}],[{"text":"Notice that ","element":"span"},{"style":{"height":19.62},"width":813.72,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-20.png","element":"img","alt":" ∥Proj�Φ − ProjΦ∥ ≤ 2. 
For any 0 < t <","inline":true,"padRight":true},{"text":"2, a sufficient condition to guarantee","element":"span"}],[{"style":{"width":"84%"},"width":1526,"height":171,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-21.png","element":"img"}],[{"text":"Since 0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"< t < ","element":"span"},{"text":"2, we can express this sufficient condition as ","element":"span"},{"style":{"height":17.6},"width":500.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-22.png","element":"img","alt":" ∥ �H(α)−H(α)∥ ≤ pαc0t/3.","inline":true,"padRight":true},{"text":"Therefore, for any ","element":"span"},{"style":{"height":17.6},"width":178.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-23.png","element":"img","alt":" t ∈ (0, 2),","inline":true}],[{"style":{"width":"81%"},"width":1473,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-24.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-107","text":"(9) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-108","text":"(48) ","element":"a"},{"text":"gives rise to ","element":"span"},{"href":"#id-59","text":"(11)","element":"a"},{"text":".","element":"span"}],[{"id":"id-108","style":{"width":"1%"},"width":30,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/22-25.png","element":"img"}],[{"id":"id-62","style":{"fontWeight":"bold"},"text":"5.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Corollary ","element":"span"},{"href":"#id-61","style":{"fontWeight":"bold"},"text":"1","element":"a"}],[{"id":"id-109","style":{"height":16.4},"width":402.46,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-0.png","element":"img","alt":"Lemma 5. 
If X ≥ 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a random variable such that","element":"span"}],[{"style":{"width":"29%"},"width":537,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for some ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a, b, c > ","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":", then","element":"span"}],[{"style":{"width":"39%"},"width":712,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-2.png","element":"img"}],[{"text":"Lemma ","element":"span"},{"href":"#id-109","text":"5 ","element":"a"},{"text":"is proved in Supplementary materials ","element":"span"},{"text":"E. ","element":"span"},{"text":"Corollary ","element":"span"},{"href":"#id-61","text":"1 ","element":"a"},{"text":"is proved based on Lemma ","element":"span"},{"href":"#id-109","text":"5.","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Proof of Corollary ","element":"span"},{"href":"#id-61","style":{"fontWeight":"bold"},"text":"1. 
","element":"a"},{"text":"Combining Theorem ","element":"span"},{"href":"#id-60","text":"2 ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-109","text":"5 ","element":"a"},{"text":"gives rise to","element":"span"}],[{"style":{"width":"78%"},"width":1422,"height":420,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.6},"width":336.72,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-4.png","element":"img","alt":" C1, C2, C3 and C4","inline":true,"padRight":true},{"text":"are given in ","element":"span"},{"href":"#id-67","text":"(13)","element":"a"},{"text":".","element":"span"}],[{"style":{"width":"1%"},"width":30,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-5.png","element":"img"}],[{"id":"id-68","style":{"fontWeight":"bold"},"text":"5.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-72","style":{"fontWeight":"bold"},"text":"3","element":"a"}],[{"style":{"width":"1%"},"width":24,"height":3,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-6.png","element":"img"}],[{"text":"In the regression model ","element":"span"},{"href":"#id-110","text":"(14)","element":"a"},{"text":", if we condition on the recovered central subspace ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-7.png","element":"img","alt":"�","inline":true},{"text":"Φ, the samples ","element":"span"},{"style":{"height":20.82},"width":181.23,"height":52.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-8.png","element":"img","alt":" {zi}2ni=n+1 ","inline":true,"padRight":true},{"text":"are independently sampled from the marginal distribution of 
","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-9.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"on the subspace ","element":"span"},{"text":"spanned by ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-10.png","element":"img","alt":"�","inline":true},{"text":"Φ. Denote this marginal distribution by ","element":"span"},{"style":{"height":12},"width":26,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-11.png","element":"img","alt":" µ","inline":true},{"text":". For any set Ω ","element":"span"},{"style":{"height":19.53},"width":421.71,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-12.png","element":"img","alt":" ⊂ Rd, µ(Ω) = ρ({x ∈","inline":true},{"style":{"height":19.53},"width":270.58,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-13.png","element":"img","alt":"RD|�Φ⊤x ∈ Ω}","inline":true},{"text":"). According to ","element":"span"},{"href":"#id-111","text":"(21)","element":"a"},{"text":", the regression error of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"crucially depends on the estimation error of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", which is established in the following lemma.","element":"span"}],[{"id":"id-113","style":{"height":20.82},"width":496.94,"height":52.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-14.png","element":"img","alt":"Lemma 6. Let {zi}2ni=n+1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be i.i.d. 
samples from a probability measure ","element":"span"},{"style":{"height":17.6},"width":432.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-15.png","element":"img","alt":" µ such that supp(µ) ⊆","inline":true,"padRight":true},{"text":"[","element":"span"},{"style":{"height":21.22},"width":429.28,"height":53.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-16.png","element":"img","alt":"−B, B]d and {yi}2ni=n+1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"follows the model in ","element":"span"},{"href":"#id-110","style":{"fontStyle":"italic"},"text":"(14)","element":"a"},{"style":{"fontStyle":"italic"},"text":". Let ","element":"span"},{"style":{"height":12},"width":22.28,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-17.png","element":"img","alt":" �g","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be the estimator in ","element":"span"},{"href":"#id-71","style":{"fontStyle":"italic"},"text":"(19) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"with polynomial ","element":"span"},{"style":{"fontStyle":"italic"},"text":"order ","element":"span"},{"style":{"height":17.6},"width":216.85,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-18.png","element":"img","alt":" k = ⌈s⌉ − 1","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Under Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2(","element":"a"},{"style":{"fontStyle":"italic"},"text":"i), ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"3, ","element":"a"},{"href":"#id-69","style":{"fontStyle":"italic"},"text":"4, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"we have","element":"span"}],[{"style":{"width":"88%"},"width":1605,"height":228,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-19.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is a universal constant and ","element":"span"},{"style":{"height":20.05},"width":76.86,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-20.png","element":"img","alt":" b2bias ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is defined in ","element":"span"},{"href":"#id-112","style":{"fontStyle":"italic"},"text":"(17)","element":"a"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"Lemma ","element":"span"},{"href":"#id-113","text":"6 ","element":"a"},{"text":"is proved in Supplementary materials ","element":"span"},{"text":"F ","element":"span"},{"text":"with classical results in nonparametric regression ","element":"span"},{"href":"#id-20","referenceIndex":23,"text":"[23, ","element":"a"},{"href":"#id-114","referenceIndex":45,"text":"45, ","element":"a"},{"href":"#id-115","referenceIndex":21,"text":"21, ","element":"a"},{"href":"#id-116","referenceIndex":13,"text":"13]","element":"a"},{"text":". The proof of Theorem ","element":"span"},{"href":"#id-72","text":"3 ","element":"a"},{"text":"is given below.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-72","style":{"fontWeight":"bold"},"text":"3. 
","element":"a"},{"text":"From ","element":"span"},{"href":"#id-117","text":"(3)","element":"a"},{"text":", ","element":"span"},{"href":"#id-111","text":"(21) ","element":"a"},{"text":"and the fact that ","element":"span"},{"style":{"height":17.6},"width":341.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-21.png","element":"img","alt":" ∥x∥ ≤ B, we have","inline":true}],[{"id":"id-118","style":{"width":"72%"},"width":1311,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-22.png","element":"img"}],[{"text":"We use ","element":"span"},{"style":{"height":21.98},"width":241.95,"height":54.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-23.png","element":"img","alt":" E{(xi,yi)}2ni=1(·","inline":true},{"text":") to denote the expectation with respect to the joint distribution of ","element":"span"},{"style":{"height":19.62},"width":240.71,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-24.png","element":"img","alt":" {(xi, yi)}2ni=1.","inline":true,"padRight":true},{"text":"The regression error of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"can be expressed as","element":"span"}],[{"style":{"width":"32%"},"width":587,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/23-25.png","element":"img"}],[{"style":{"width":"93%"},"width":1692,"height":362,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-0.png","element":"img"}],[{"text":"Corollary ","element":"span"},{"href":"#id-61","text":"1 ","element":"a"},{"text":"gives an estimate of the first term:","element":"span"}],[{"id":"id-120","style":{"width":"68%"},"width":1238,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-1.png","element":"img"}],[{"text":"The second term in ","element":"span"},{"href":"#id-118","text":"(50) ","element":"a"},{"text":"can 
be estimated through Lemma ","element":"span"},{"href":"#id-113","text":"6 ","element":"a"},{"text":"as follows:","element":"span"}],[{"id":"id-119","style":{"width":"87%"},"width":1584,"height":982,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.02},"width":48.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-3.png","element":"img","alt":" C5","inline":true,"padRight":true},{"text":"is defined in Theorem ","element":"span"},{"href":"#id-72","text":"3 ","element":"a"},{"text":"and ","element":"span"},{"style":{"height":31.44},"width":492.48,"height":78.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-4.png","element":"img","alt":" C = 2c�d+kd �+4C2gB2sds+k(k!)2","inline":true,"padRight":true},{"text":"independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". The last inequality in ","element":"span"},{"href":"#id-119","text":"(52) ","element":"a"},{"text":"results from","element":"span"}],[{"style":{"width":"31%"},"width":564,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-5.png","element":"img"}],[{"text":"by ","element":"span"},{"href":"#id-117","text":"(3)","element":"a"},{"text":", Corollary ","element":"span"},{"href":"#id-61","text":"1 ","element":"a"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"text":"chosen according to ","element":"span"},{"href":"#id-70","text":"(20)","element":"a"},{"text":". 
Finally, combining ","element":"span"},{"href":"#id-120","text":"(51) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-119","text":"(52) ","element":"a"},{"text":"gives rise to ","element":"span"},{"href":"#id-121","text":"(22)","element":"a"},{"text":".","element":"span"}],[{"style":{"width":"1%"},"width":30,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-6.png","element":"img"}]]},{"heading":"6 Conclusion","paragraphs":[[{"text":"In this paper, we combine GCR and piecewise polynomial approximations to estimate a high-dimensional function that varies along a low-dimensional central subspace. We prove that the mean squared estimation error for the central subspace is ","element":"span"},{"style":{"height":19.13},"width":121,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-7.png","element":"img","alt":" O(n−1","inline":true},{"text":") and the mean squared regression error is ","element":"span"},{"style":{"height":31.6},"width":377.09,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/24-8.png","element":"img","alt":" O((n/ log n)^{−2s/(2s+d)})","inline":true},{"text":". A modified GCR algorithm with improved efficiency is also proposed. Numerical experiments support our theory and show that the modified GCR attains the accuracy predicted by our theoretical results.","element":"span"}]]},{"heading":"Acknowledgement","paragraphs":[[{"text":"Wenjing Liao’s research is supported by NSF DMS 1818751 and DMS 2012652. Both authors are grateful to Alessandro Lanteri, Mauro Maggioni and Stefano Vigogna for pointing out the noise bias in the regression model ","element":"span"},{"href":"#id-110","text":"(14)","element":"a"},{"text":", which was not accounted for in the first version of this manuscript. This problem has been fixed in the current version.","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-96","text":"[1] R. 
Ahlswede and A. Winter. Strong converse for identification via quantum channels. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Transactions on Information Theory","element":"span"},{"text":", 48(3):569–579, 2002.","element":"span"}],[{"id":"id-5","text":"[2] I. Babuška, R. Tempone, and G. E. Zouraris. ","element":"span"},{"text":"Galerkin finite element approximations of stochastic elliptic partial differential equations. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"SIAM Journal on Numerical Analysis","element":"span"},{"text":", 2004.","element":"span"}],[{"id":"id-29","text":"[3] E. Bura and R. D. Cook. Estimating the structural dimension of regressions via ","element":"span"},{"text":"parametric inverse regression. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the Royal Statistical Society. Series B: Statistical Methodology","element":"span"},{"text":", 2001.","element":"span"}],[{"id":"id-0","text":"[4] E. Bura and R. M. Pfeiffer. Graphical methods for class prediction using dimension ","element":"span"},{"text":"reduction techniques on DNA microarray data. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bioinformatics","element":"span"},{"text":", 2003.","element":"span"}],[{"id":"id-42","text":"[5] Y. Chen, Y. Chi, J. Fan, and C. Ma. Spectral methods for data science: A statistical ","element":"span"},{"text":"perspective. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:2012.08496","element":"span"},{"text":", 2020.","element":"span"}],[{"id":"id-19","text":"[6] C. Chesneau. Regression with random design: a minimax study. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Statistics & probability letters","element":"span"},{"text":", 77(1):40–53, 2007.","element":"span"}],[{"id":"id-23","text":"[7] A. Cohen, I. Daubechies, R. DeVore, G. Kerkyacharian, and D. Picard. 
Capturing Ridge ","element":"span"},{"text":"Functions in High Dimensions from Point Queries. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Constructive Approximation","element":"span"},{"text":", 2012.","element":"span"}],[{"id":"id-1","text":"[8] P. G. Constantine, E. Dow, and Q. Wang. Active subspace methods in theory and practice: ","element":"span"},{"text":"applications to kriging surfaces. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"SIAM Journal on Scientific Computing","element":"span"},{"text":", 36(4):A1500–A1524, 2014.","element":"span"}],[{"id":"id-2","text":"[9] P. G. Constantine, B. Zaharatos, and M. Campanelli. Discovering an active subspace in a ","element":"span"},{"text":"single-diode solar cell model. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Statistical Analysis and Data Mining: The ASA Data Science Journal","element":"span"},{"text":", 8(5-6):264–273, 2015.","element":"span"}],[{"id":"id-9","text":"[10] R. D. Cook and B. Li. Dimension reduction for conditional mean in regression. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 30(2):455–474, 2002.","element":"span"}],[{"id":"id-31","text":"[11] R. D. Cook and S. Weisberg. Discussion of sliced inverse regression for dimension reduction. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 86(414):328–332, 1991.","element":"span"}],[{"id":"id-32","text":"[12] R. D. Cook and S. Weisberg. Sliced inverse regression for dimension reduction: Comment. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 86(414):328–332, 1991.","element":"span"}],[{"id":"id-116","text":"[13] D. D. Cox et al. Approximation of least squares regression on nested subspaces. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 16(2):713–732, 1988.","element":"span"}],[{"id":"id-11","text":"[14] A. S. Dalalyan, A. Juditsky, and V. Spokoiny. A new algorithm for estimating the effective ","element":"span"},{"text":"dimension-reduction subspace. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Journal of Machine Learning Research","element":"span"},{"text":", 9:1647–1678, 2008.","element":"span"}],[{"id":"id-102","text":"[15] C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. iii. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"SIAM Journal on Numerical Analysis","element":"span"},{"text":", 7(1):1–46, 1970.","element":"span"}],[{"id":"id-33","text":"[16] R. Dennis Cook. ","element":"span"},{"text":"Save: a method for dimension reduction and graphics in regression. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Communications in statistics-Theory and methods","element":"span"},{"text":", 29(9-10):2109–2121, 2000.","element":"span"}],[{"id":"id-44","text":"[17] K.-T. Fang, S. Kotz, and K. W. Ng. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Symmetric multivariate and related distributions","element":"span"},{"text":". Chapman and Hall/CRC, 2018.","element":"span"}],[{"id":"id-24","text":"[18] M. Fornasier, K. Schnass, and J. Vybiral. Learning Functions of Few Arbitrary Linear ","element":"span"},{"text":"Parameters in High Dimensions. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Foundations of Computational Mathematics","element":"span"},{"text":", 2012.","element":"span"}],[{"id":"id-30","text":"[19] W. K. Fung, X. He, L. Liu, and P. Shi. Dimension reduction based on canonical correlation. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Statistica Sinica","element":"span"},{"text":", pages 1093–1113, 2002.","element":"span"}],[{"id":"id-17","text":"[20] S. Ga¨ıffas, G. 
Lecué, et al. Optimal rates and adaptation in the single-index model using ","element":"span"},{"text":"aggregation. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Electronic journal of statistics","element":"span"},{"text":", 1:538–573, 2007.","element":"span"}],[{"id":"id-115","text":"[21] S. Geer, S. van de Geer, R. Gill, B. Ripley, S. Ross, B. Silverman, and M. Stein. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Empirical Processes in M-Estimation","element":"span"},{"text":". Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2000.","element":"span"}],[{"id":"id-3","text":"[22] P. E. Gill, W. Murray, and M. H. Wright. Practical optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"London: Academic Press, 1981","element":"span"},{"text":", 1981.","element":"span"}],[{"id":"id-20","text":"[23] L. Györfi, M. Kohler, A. Krzyzak, and H. Walk. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A Distribution-Free theory of Nonparametric Regression","element":"span"},{"text":". Springer Science & Business Media, 2006.","element":"span"}],[{"id":"id-14","text":"[24] W. Härdle and T. M. Stoker. Investigating smooth multiple regression by the method of ","element":"span"},{"text":"average derivatives. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 1989.","element":"span"}],[{"id":"id-15","text":"[25] W. Härdle and A. B. Tsybakov. How sensitive are average derivatives? ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 58(1-2):31–48, 1993.","element":"span"}],[{"id":"id-92","text":"[26] W. Hoeffding. A class of statistics with asymptotically normal distribution. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Breakthroughs in statistics","element":"span"},{"text":", pages 308–334. 
Springer, 1992.","element":"span"}],[{"id":"id-126","text":"[27] W. Hoeffding. ","element":"span"},{"text":"Probability inequalities for sums of bounded random variables. ","element":"span"},{"text":"In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Collected Works of Wassily Hoeffding","element":"span"},{"text":", pages 409–426. Springer, 1994.","element":"span"}],[{"id":"id-16","text":"[28] M. Hristache, A. Juditsky, J. Polzehl, and V. Spokoiny. Structure adaptive approach for ","element":"span"},{"text":"dimension reduction. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 2001.","element":"span"}],[{"id":"id-21","text":"[29] H. Ichimura. Semiparametric least squares (SLS) and weighted SLS estimation of single-index ","element":"span"},{"text":"models. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 58(1-2):71–120, 1993.","element":"span"}],[{"id":"id-6","text":"[30] H. Kim, P. Howland, and H. Park. Dimension reduction in text classification with support ","element":"span"},{"text":"vector machines. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Machine Learning Research","element":"span"},{"text":", 6(Jan):37–53, 2005.","element":"span"}],[{"id":"id-39","text":"[31] T. Klock, A. Lanteri, and S. Vigogna. ","element":"span"},{"text":"Estimating multi-index models with response-conditional least squares. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Electronic Journal of Statistics","element":"span"},{"text":", 15(1):589–629, 2021.","element":"span"}],[{"id":"id-38","text":"[32] A. Lanteri, M. Maggioni, and S. Vigogna. Conditional regression for single-index models. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:2002.10008","element":"span"},{"text":", 2020.","element":"span"}],[{"id":"id-18","text":"[33] O. Lepski, N. Serdyukova, et al. 
Adaptive estimation under single-index constraint in a ","element":"span"},{"text":"regression model. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 42(1):1–28, 2014.","element":"span"}],[{"id":"id-26","text":"[34] B. Li. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Sufficient dimension reduction: Methods and applications with R","element":"span"},{"text":". Chapman and Hall/CRC, 2018.","element":"span"}],[{"id":"id-35","text":"[35] B. Li and S. Wang. On directional regression for dimension reduction. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 102(479):997–1008, 2007.","element":"span"}],[{"id":"id-34","text":"[36] B. Li, H. Zha, F. Chiaromonte, et al. Contour regression: a general approach to dimension ","element":"span"},{"text":"reduction. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 33(4):1580–1616, 2005.","element":"span"}],[{"id":"id-10","text":"[37] K.-C. Li. ","element":"span"},{"text":"Sliced inverse regression for dimension reduction. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 86(414):316–327, 1991.","element":"span"}],[{"id":"id-27","text":"[38] K.-C. Li. ","element":"span"},{"text":"Sliced inverse regression for dimension reduction. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 1991.","element":"span"}],[{"id":"id-36","text":"[39] K.-C. Li. On principal Hessian directions for data visualization and dimension reduction: ","element":"span"},{"text":"Another application of Stein’s lemma. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 87(420):1025–1039, 1992.","element":"span"}],[{"id":"id-129","text":"[40] W. Liao and M. Maggioni. Adaptive geometric multiscale approximations for intrinsically ","element":"span"},{"text":"low-dimensional data. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Machine Learning Research","element":"span"},{"text":", 20(98):1–63, 2019.","element":"span"}],[{"id":"id-25","text":"[41] Y. Ma and L. Zhu. A review on dimension reduction. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Statistical Review","element":"span"},{"text":", 2013.","element":"span"}],[{"id":"id-94","text":"[42] S. Minsker, X. Wei, et al. Robust modifications of U-statistics and applications to covariance ","element":"span"},{"text":"estimation problems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bernoulli","element":"span"},{"text":", 26(1):694–727, 2020.","element":"span"}],[{"id":"id-7","text":"[43] R. M. Pfeiffer and E. Bura. A model free approach to combining biomarkers. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Biometrical Journal","element":"span"},{"text":", 2008.","element":"span"}],[{"id":"id-8","text":"[44] R. M. Pfeiffer, L. Forzani, and E. Bura. Sufficient dimension reduction for longitudinally ","element":"span"},{"text":"measured predictors. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Statistics in Medicine","element":"span"},{"text":", 2012.","element":"span"}],[{"id":"id-114","text":"[45] D. Pollard. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Convergence of Stochastic Processes","element":"span"},{"text":". Springer Series in Statistics. Springer New York, 1984.","element":"span"}],[{"id":"id-13","text":"[46] J. L. Powell, J. H. Stock, and T. M. Stoker. 
Semiparametric Estimation of Index ","element":"span"},{"text":"Coefficients. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 1989.","element":"span"}],[{"id":"id-93","text":"[47] R. J. Serfling. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Approximation Theorems of Mathematical Statistics","element":"span"},{"text":", volume 162. ","element":"span"},{"text":"John Wiley & Sons, 2009.","element":"span"}],[{"id":"id-103","text":"[48] G. W. Stewart and J.-G. Sun. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Matrix perturbation theory","element":"span"},{"text":". Academic Press, 1990.","element":"span"}],[{"id":"id-95","text":"[49] J. A. Tropp. User-friendly tools for random matrices: An introduction. Technical report, ","element":"span"},{"text":"California Institute of Technology Division of Engineering and Applied Science, 2012.","element":"span"}],[{"id":"id-65","text":"[50] A. B. Tsybakov. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to Nonparametric Estimation","element":"span"},{"text":". Springer-Verlag New York, 2009.","element":"span"}],[{"id":"id-66","text":"[51] L. Wasserman. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"All of nonparametric statistics","element":"span"},{"text":". Springer Science & Business Media, 2006.","element":"span"}],[{"id":"id-104","text":"[52] H. Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller ","element":"span"},{"text":"Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematische Annalen","element":"span"},{"text":", 71(4):441–479, 1912.","element":"span"}],[{"id":"id-22","text":"[53] Y. Xia, H. Tong, W. K. Li, and L.-X. Zhu. An adaptive estimation of dimension ","element":"span"},{"text":"reduction space. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)","element":"span"},{"text":", 64(3):363–410, 2002.","element":"span"}],[{"id":"id-12","text":"[54] Y. Xia, H. Tong, W. K. Li, and L.-X. Zhu. An adaptive estimation of dimension reduction ","element":"span"},{"text":"space. In ","element":"span"},{"style":{"height":16.8},"width":1521.24,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/28-0.png","element":"img","alt":" Exploration of a Nonlinear World: An Appreciation of Howell Tong’s ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Contributions to Statistics","element":"span"},{"text":", pages 299–346. World Scientific, 2009.","element":"span"}],[{"id":"id-4","text":"[55] K. Zhou, J. C. Doyle, K. Glover, et al. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Robust and optimal control","element":"span"},{"text":", volume 40. Prentice Hall, New Jersey, 1996.","element":"span"}],[{"id":"id-28","text":"[56] L. X. Zhu and K. T. Fang. Asymptotics for kernel estimate of sliced inverse regression. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 1996.","element":"span"}],[{"id":"id-37","text":"[57] L. X. Zhu, M. Ohtaki, and Y. Li. On hybrid methods of inverse regression-based algorithms. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Computational Statistics & Data Analysis","element":"span"},{"text":", 51(5):2621–2635, 2007.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Supplemental Materials for Learning functions varying along a central subspace","element":"span"}]]},{"heading":"A Proof of Proposition 1","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"Proof of Proposition ","element":"span"},{"href":"#id-50","style":{"fontWeight":"bold"},"text":"1. 
","element":"a"},{"text":"By expressing ","element":"span"},{"style":{"height":22.66},"width":721.28,"height":56.65,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-0.png","element":"img","alt":" �x − x = ProjSΦ(�x − x) + ProjS⊥Φ (�x − x","inline":true},{"text":"), we can write ","element":"span"},{"text":"the covariance matrix as","element":"span"}],[{"style":{"width":"99%"},"width":1803,"height":832,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-1.png","element":"img"}],[{"text":"which reflects ","element":"span"},{"style":{"fontWeight":"bold"},"text":"x ","element":"span"},{"text":"with respect to the central subspace Φ. The operator ","element":"span"},{"style":{"height":14.7},"width":58.13,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-2.png","element":"img","alt":" RΦ","inline":true,"padRight":true},{"text":"is invertible and ","element":"span"},{"style":{"height":21.23},"width":195.06,"height":53.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-3.png","element":"img","alt":"R−1Φ = RΦ","inline":true},{"text":". We also have","element":"span"}],[{"style":{"width":"70%"},"width":1278,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-4.png","element":"img"}],[{"text":"For any set Ω ","element":"span"},{"style":{"height":17.6},"width":179.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-5.png","element":"img","alt":" ⊂ supp(ρ","inline":true},{"text":"), we define the reflected set ","element":"span"},{"style":{"height":17.6},"width":781.11,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-6.png","element":"img","alt":" RΦ(Ω) = {RΦ(x)|x ∈ Ω}. 
Since x has a","inline":true,"padRight":true},{"text":"spherical distribution, we have, for any set Ω ","element":"span"},{"style":{"height":17.6},"width":204.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-7.png","element":"img","alt":" ⊂ supp(ρ),","inline":true}],[{"style":{"width":"82%"},"width":1485,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-8.png","element":"img"}],[{"text":"Therefore ","element":"span"},{"style":{"height":17.6},"width":236.71,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-9.png","element":"img","alt":" RΦ(x) and x","inline":true,"padRight":true},{"text":"have the same distribution, which implies that Ψ","element":"span"},{"style":{"height":17.6},"width":507.23,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-10.png","element":"img","alt":"⊤(RΦ(�x)−RΦ(x))(RΦ(�x)−","inline":true},{"style":{"height":17.6},"width":654.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-11.png","element":"img","alt":"RΦ(x))⊤Φ and Ψ⊤(�x − x)(�x − x)⊤","inline":true},{"text":"Φ have the same distribution.","element":"span"}],[{"text":"We next show ","element":"span"},{"style":{"height":18.44},"width":527.2,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-12.png","element":"img","alt":" Vf(�x, x) = Vf(RΦ(�x), RΦ(x","inline":true},{"text":")). It is straightforward that ","element":"span"},{"style":{"height":17.6},"width":345.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-13.png","element":"img","alt":" f(x) = g(Φ⊤x) =","inline":true},{"style":{"height":17.6},"width":949.21,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-14.png","element":"img","alt":"g(Φ⊤RΦ(x)) = f(RΦ(x)). 
Since RΦ(x) and x","inline":true,"padRight":true},{"text":"have the same distribution, (","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"style":{"fontStyle":"italic"},"text":", f","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":")) and (","element":"span"},{"style":{"height":17.6},"width":209.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-15.png","element":"img","alt":"RΦ(x), f(x","inline":true},{"text":")) have the same distribution. Thus","element":"span"}],[{"style":{"width":"69%"},"width":1260,"height":317,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-16.png","element":"img"}],[{"text":"Overall, the distributions of (","element":"span"},{"style":{"height":17.6},"width":942.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-17.png","element":"img","alt":"x, f(x), �x, f(�x)) and (RΦ(x), f(x), RΦ(�x), f(RΦ(�x","inline":true},{"text":"))) are the same. 
Therefore, we have","element":"span"}],[{"style":{"width":"72%"},"width":1316,"height":174,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/29-18.png","element":"img"}],[{"style":{"width":"72%"},"width":1313,"height":249,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-0.png","element":"img"}],[{"text":"Similarly, we can show that","element":"span"}],[{"style":{"width":"44%"},"width":800,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-1.png","element":"img"}],[{"text":"Let","element":"span"}],[{"style":{"width":"67%"},"width":1226,"height":174,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-2.png","element":"img"}],[{"text":"The covariance matrix ","element":"span"},{"style":{"height":17.6},"width":79.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-3.png","element":"img","alt":" G(α","inline":true},{"text":") can be written as","element":"span"}],[{"style":{"width":"35%"},"width":633,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-4.png","element":"img"}],[{"text":"Let ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"height":20.2},"width":184.81,"height":50.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-5.png","element":"img","alt":"λ1, . . . , ˆλd","inline":true,"padRight":true},{"text":"be the eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":393.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-6.png","element":"img","alt":" G1(α) and z1, . . . , zd","inline":true,"padRight":true},{"text":"be the associated orthonormal eigenvectors, and let ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.41},"width":239.88,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-7.png","element":"img","alt":"λ1, . . . 
, ˜λD−d","inline":true,"padRight":true},{"text":"be the eigenvalues of ","element":"span"},{"style":{"height":17.6},"width":454.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-8.png","element":"img","alt":" G2(α) and ˜z1, . . . , ˜zD−d","inline":true,"padRight":true},{"text":"be the associated orthonormal eigenvectors. Then the eigenvalues of ","element":"span"},{"style":{"height":21.41},"width":804.34,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-9.png","element":"img","alt":" G(α) are ˆλ1, . . . , ˆλd, ˜λ1, . . . , ˜λD−d and the","inline":true,"padRight":true},{"text":"associated eigenvectors are Φ","element":"span"},{"style":{"height":15.2},"width":641.22,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-10.png","element":"img","alt":"z1, . . . , Φzd, Ψ˜z1, . . . , Ψ˜zD−d since","inline":true}],[{"style":{"width":"57%"},"width":1041,"height":126,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-11.png","element":"img"}],[{"text":"The eigenvalues satisfy","element":"span"}],[{"style":{"width":"99%"},"width":1802,"height":691,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-12.png","element":"img"}]]},{"heading":"B Proof of Example 1","paragraphs":[[{"style":{"fontStyle":"italic"},"text":"Proof of Example ","element":"span"},{"href":"#id-57","style":{"fontStyle":"italic"},"text":"1. 
","element":"a"},{"text":"Denote Ω = supp(","element":"span"},{"style":{"height":18.3},"width":427.35,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-13.png","element":"img","alt":"ρ) and BD,r(x) the D","inline":true},{"text":"-dimensional ball with radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"centered at ","element":"span"},{"style":{"height":16.4},"width":202.23,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-14.png","element":"img","alt":" x. Since ρ","inline":true,"padRight":true},{"text":"has a spherical distribution, we have Ω = ","element":"span"},{"style":{"height":18.3},"width":137.88,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-15.png","element":"img","alt":" BD,B(0","inline":true},{"text":"). Denote Ω","element":"span"},{"style":{"height":15.1},"width":163.7,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-16.png","element":"img","alt":"D as the","inline":true,"padRight":true},{"text":"largest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":"-dimensional hypercube inside Ω. There are infinitely many such hypercubes all of which have side length 2","element":"span"},{"style":{"height":19.98},"width":129.47,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-17.png","element":"img","alt":"B/√D","inline":true},{"text":". 
Any such hypercube can be written as","element":"span"}],[{"style":{"width":"64%"},"width":1162,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/30-18.png","element":"img"}],[{"text":"for an orthonormal basis ","element":"span"},{"style":{"height":19.53},"width":409.66,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-0.png","element":"img","alt":" {u1, u2, ..., uD} in RD","inline":true},{"text":". In this proof, we choose the one satisfying ","element":"span"},{"style":{"height":18.99},"width":613.84,"height":47.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-1.png","element":"img","alt":"uj = φj for 1 ≤ j ≤ d, where φj","inline":true,"padRight":true},{"text":"denotes the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":"-th column of Φ. We then define","element":"span"}],[{"style":{"width":"99%"},"width":1803,"height":981,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-2.png","element":"img"}],[{"text":"Here ","element":"span"},{"style":{"height":15.24},"width":275.6,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-3.png","element":"img","alt":"�Ωd ⊂ Ωd is a d","inline":true},{"text":"-dimensional hypercube with side length 2","element":"span"},{"style":{"height":20.6},"width":385.79,"height":51.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-4.png","element":"img","alt":"B/√D − 2√α/Cg >","inline":true,"padRight":true},{"text":"0 such that, for any ","element":"span"},{"style":{"height":29.89},"width":609.77,"height":74.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-5.png","element":"img","alt":" a ∈ �Ωd, we have Bd,√αCg(a) ⊂ Ωd","inline":true},{"text":". 
Such a choice of ","element":"span"},{"style":{"height":15.24},"width":49.52,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-6.png","element":"img","alt":"�Ωd","inline":true,"padRight":true},{"text":"guarantees that for any ","element":"span"},{"style":{"height":15.1},"width":143.83,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-7.png","element":"img","alt":" x ∈ �ΩD","inline":true}],[{"text":"with Φ","element":"span"},{"style":{"height":29.89},"width":401.75,"height":74.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-8.png","element":"img","alt":"⊤x ∈ �Ωd, Bd,√αCg(Φ⊤x","inline":true},{"text":") is entirely inside Ω","element":"span"},{"style":{"height":8.8},"width":18,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-9.png","element":"img","alt":"d","inline":true,"padRight":true},{"text":"(see Figure ","element":"span"},{"href":"#id-122","text":"15 ","element":"a"},{"text":"for a demonstration). Denote the volume of a subset ","element":"span"},{"style":{"height":19.53},"width":559.85,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-10.png","element":"img","alt":" S in Rd by Vold(S). 
We have","inline":true}],[{"style":{"width":"67%"},"width":1215,"height":242,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-11.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":27.42},"width":175.7,"height":68.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-12.png","element":"img","alt":" C1 = 1Cdg","inline":true}],[{"style":{"width":"95%"},"width":1728,"height":561,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-13.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":27.41},"width":154.48,"height":68.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-14.png","element":"img","alt":" C2 = 1Cdg","inline":true}],[{"id":"id-122","text":"Figure 15: ","element":"figcaption","subtype":"caption"},{"text":"For any ","element":"figcaption","subtype":"caption"},{"style":{"height":15.1},"width":279.99,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-15.png","element":"img","alt":" x ∈ ΩD with","inline":true,"padRight":true},{"text":"Φ","element":"figcaption","subtype":"caption"},{"style":{"height":29.89},"width":437.07,"height":74.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-16.png","element":"img","alt":"⊤x ∈ �Ωd, Bd,√αCg(Φ⊤x","inline":true},{"text":") lies entirely inside Ω","element":"figcaption","subtype":"caption"},{"style":{"height":8.8},"width":18,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-17.png","element":"img","alt":"d","inline":true}],[{"style":{"width":"1%"},"width":30,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-18.png","element":"img"}]]},{"heading":"C Proof of Example 2","paragraphs":[[{"href":"#id-75","style":{"height":19.01},"width":578.21,"height":47.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-19.png","element":"img","alt":"Proof of 
Example 2. Let ¯fℓ","inline":true,"padRight":true},{"text":"be the mean of ","element":"span"},{"style":{"height":20.02},"width":459.86,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-20.png","element":"img","alt":" f(x) on ℓ(xi, xj) and ¯fij","inline":true,"padRight":true},{"text":"be the mean of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"x","element":"span"},{"text":") on ","element":"span"},{"style":{"height":18.22},"width":91.53,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-21.png","element":"img","alt":"Tij(r","inline":true},{"text":"). We first estimate the difference between ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":20.03},"width":186.74,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-22.png","element":"img","alt":"fℓ and ¯fij","inline":true},{"text":". For simplicity, we denote ","element":"span"},{"style":{"height":18.22},"width":155.57,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/31-23.png","element":"img","alt":" ℓ(xi, xj)","inline":true,"padRight":true},{"text":"by ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-0.png","element":"img","alt":" ℓ","inline":true},{"text":". 
We parameterize ","element":"span"},{"style":{"height":17.6},"width":652.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-1.png","element":"img","alt":" ℓ by t for 0 ≤ t ≤ 1 and use r(t","inline":true},{"text":") to denote a point on ","element":"span"},{"style":{"height":12.8},"width":219.73,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-2.png","element":"img","alt":" ℓ such that","inline":true},{"style":{"height":18.22},"width":622.21,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-3.png","element":"img","alt":"r(0) = xi and r(1) = xj. Let S(t","inline":true},{"text":") be the disk centered at ","element":"span"},{"style":{"fontWeight":"bold"},"text":"r","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") with radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"on the hyperplane of dimension ","element":"span"},{"style":{"height":12},"width":82.48,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-4.png","element":"img","alt":" D −","inline":true,"padRight":true},{"text":"1 which is perpendicular to ","element":"span"},{"style":{"height":16.4},"width":175.09,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-5.png","element":"img","alt":" ℓ. 
Let ρ⊤ ","inline":true,"padRight":true},{"text":"be a measure on [0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1] such that, for any ","element":"span"},{"style":{"height":27.65},"width":1293.95,"height":69.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-6.png","element":"img","alt":" r(t) ∈ ℓ, ρ⊤(dt) = limr→0ρ(T(r(t),r(t+dt),r))ρ(T(r(0),r(1),r)) , where T(r(t), r(t + dt), r","inline":true},{"text":") is the tube enclosing ","element":"span"},{"style":{"height":17.6},"width":270.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-7.png","element":"img","alt":"ℓ(r(t), r(t + dt","inline":true},{"text":")) with radius ","element":"span"},{"style":{"height":16.4},"width":189.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-8.png","element":"img","alt":" r. Since ρ","inline":true,"padRight":true},{"text":"is uniform and ","element":"span"},{"style":{"height":18.22},"width":297.49,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-9.png","element":"img","alt":" Tij(r) ⊂ supp(ρ","inline":true},{"text":"), we can express","element":"span"}],[{"style":{"width":"84%"},"width":1523,"height":315,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-10.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"style":{"width":"67%"},"width":1215,"height":133,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-11.png","element":"img"}],[{"text":"We next derive the error bound between ","element":"span"},{"style":{"height":18.44},"width":520.87,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-12.png","element":"img","alt":" Vf(xi, xj) and Vf(xi, xj, 
r):","inline":true}],[{"style":{"width":"89%"},"width":1609,"height":589,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-13.png","element":"img"}],[{"text":"On the other hand, for any ","element":"span"},{"style":{"height":17.6},"width":153.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-14.png","element":"img","alt":" r(t) ∈ ℓ,","inline":true}],[{"style":{"width":"39%"},"width":704,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-15.png","element":"img"}],[{"text":"Similarly, for any ","element":"span"},{"style":{"height":18.22},"width":201.22,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-16.png","element":"img","alt":" x ∈ Tij(r),","inline":true}],[{"style":{"width":"71%"},"width":1286,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-17.png","element":"img"}],[{"text":"when 4","element":"span"},{"style":{"height":17.13},"width":186.05,"height":42.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-18.png","element":"img","alt":"r2 ≤ 5B2.","inline":true,"padRight":true},{"text":"In summary,","element":"span"}],[{"style":{"width":"69%"},"width":1250,"height":139,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/32-19.png","element":"img"}]]},{"heading":"D Proof of Theorem 4","paragraphs":[[{"text":"The proof of Theorem ","element":"span"},{"href":"#id-79","text":"4 ","element":"a"},{"text":"relies on several lemmas, which are presented and proved in Section ","element":"span"},{"href":"#id-123","text":"D.1. 
","element":"a"},{"text":"We prove Theorem ","element":"span"},{"href":"#id-79","text":"4 ","element":"a"},{"text":"in Section ","element":"span"},{"href":"#id-124","text":"D.2.","element":"a"}],[{"id":"id-123","style":{"fontWeight":"bold"},"text":"D.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Some key lemmas","element":"span"}],[{"text":"The following lemma gives an estimate on the difference between the population variance of a bounded random variable and its empirical counterpart.","element":"span"}],[{"id":"id-125","style":{"fontWeight":"bold"},"text":"Lemma 7. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a random variable in ","element":"span"},{"text":"[","element":"span"},{"style":{"height":17.6},"width":986.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-0.png","element":"img","alt":"m − A, m + A] for some m ∈ R and A > 0. Suppose","inline":true},{"style":{"height":18.09},"width":132.48,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-1.png","element":"img","alt":"{si}ni=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are independent copies of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"style":{"fontStyle":"italic"},"text":". Denote the empirical variance by ","element":"span"},{"style":{"height":21.29},"width":480.62,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-2.png","element":"img","alt":" �V (s) = 1n−1�ni=1(si − ¯s)2","inline":true}],[{"style":{"height":17.6},"width":651.11,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-3.png","element":"img","alt":"where ¯s = (s1 + . . . + sn)/n. 
Then","inline":true}],[{"style":{"width":"52%"},"width":940,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-4.png","element":"img"}],[{"text":"Lemma ","element":"span"},{"href":"#id-125","text":"7 ","element":"a"},{"text":"is a consequence of U-statistics ","element":"span"},{"href":"#id-126","referenceIndex":27,"text":"[27] ","element":"a"},{"text":"applied to ","element":"span"},{"style":{"height":17.6},"width":101.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-5.png","element":"img","alt":"V̂(s).","inline":true,"padRight":true},{"text":"Based on Lemma ","element":"span"},{"href":"#id-125","text":"7, ","element":"a"},{"text":"we prove a high-probability bound on the difference between ","element":"span"},{"style":{"height":18.22},"width":548.84,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-6.png","element":"img","alt":" Vy(xi, xj, r) and V̂y(xi, xj, r):","inline":true}],[{"id":"id-128","style":{"height":18.09},"width":440.55,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-7.png","element":"img","alt":"Lemma 8. Let {xi}_{i=1}^n ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be i.i.d. 
samples from the probability measure ","element":"span"},{"style":{"height":18.09},"width":493.02,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-8.png","element":"img","alt":" ρ , and {yi}ni=1 be sampled","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"according to the model in ","element":"span"},{"href":"#id-40","text":"(1)","element":"a"},{"style":{"fontStyle":"italic"},"text":", under Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2(","element":"a"},{"style":{"fontStyle":"italic"},"text":"i), ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"3, ","element":"a"},{"href":"#id-69","style":{"fontStyle":"italic"},"text":"4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-127","style":{"fontStyle":"italic"},"text":"5. ","element":"a"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.02},"width":443.63,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-9.png","element":"img","alt":" ν > 0 and set α0 and r","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"according to ","element":"span"},{"href":"#id-78","style":{"fontStyle":"italic"},"text":"(30)","element":"a"},{"style":{"fontStyle":"italic"},"text":". Then","element":"span"}],[{"style":{"width":"60%"},"width":1088,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-10.png","element":"img"}],[{"href":"#id-128","style":{"height":21.45},"width":825.76,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-11.png","element":"img","alt":"Proof of Lemma 8. 
We set η = α0/(2C2g","inline":true},{"text":"), and consider the tube ","element":"span"},{"style":{"height":18.22},"width":494.22,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-12.png","element":"img","alt":" Tij(r) with ∥xi − xj∥ > η","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":18.22},"width":261.18,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-13.png","element":"img","alt":" ∥xi − xj∥ ≤ η","inline":true,"padRight":true},{"text":"as separate cases.","element":"span"}],[{"style":{"height":18.22},"width":993.44,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-14.png","element":"img","alt":"Case I when ∥xi − xj∥ ≤ η: When ∥xi − xj∥ ≤ η","inline":true},{"text":", we expect ","element":"span"},{"style":{"height":18.22},"width":205.14,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-15.png","element":"img","alt":" Vy(xi, xj, r","inline":true},{"text":") to be small since","element":"span"}],[{"style":{"width":"93%"},"width":1691,"height":261,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-16.png","element":"img"}],[{"text":"where the last inequality holds as long as ","element":"span"},{"style":{"height":21.45},"width":625.8,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-17.png","element":"img","alt":" η2 ≤ α0/(2C2g) and r2 ≤ α0/(8C2g","inline":true},{"text":"). By ","element":"span"},{"href":"#id-78","text":"(30)","element":"a"},{"text":", ","element":"span"},{"style":{"height":14.22},"width":86.34,"height":35.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-18.png","element":"img","alt":" α0 is","inline":true,"padRight":true},{"text":"small when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is sufficiently large. 
These conditions are guaranteed as ","element":"span"},{"style":{"height":21.45},"width":359.44,"height":53.63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-19.png","element":"img","alt":" η2 < η = α0/(2C2g)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":21.45},"width":509.23,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-20.png","element":"img","alt":" r2 < r ≤ α0/(8C2g) when n","inline":true,"padRight":true},{"text":"is sufficiently large.","element":"span"}],[{"style":{"height":18.22},"width":985,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-21.png","element":"img","alt":"Case II when ∥xi − xj∥ > η: When ∥xi −xj∥ > η","inline":true},{"text":", there are sufficient points in ","element":"span"},{"style":{"height":18.22},"width":253.69,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-22.png","element":"img","alt":" Tij(r) so that","inline":true},{"style":{"height":18.22},"width":205.14,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-23.png","element":"img","alt":"�Vy(xi, xj, r","inline":true},{"text":") is concentrated on ","element":"span"},{"style":{"height":18.22},"width":482.9,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-24.png","element":"img","alt":" Vy(xi, xj, r). 
Let ρ(Tij(r","inline":true},{"text":")) be the measure of the tube ","element":"span"},{"style":{"height":18.22},"width":91.53,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-25.png","element":"img","alt":"Tij(r","inline":true},{"text":"), which satisfies ","element":"span"},{"style":{"height":20.15},"width":378.8,"height":50.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-26.png","element":"img","alt":" ρ(Tij(r)) ≥ c1 r^(D−1) η","inline":true,"padRight":true},{"text":"by ","element":"span"},{"href":"#id-127","text":"(25)","element":"a"},{"text":". Let ","element":"span"},{"style":{"height":18.22},"width":92.22,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-27.png","element":"img","alt":" nij(r","inline":true},{"text":") be the number of points in ","element":"span"},{"style":{"height":18.22},"width":701,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-28.png","element":"img","alt":"Tij(r) and then ρ̂(Tij(r)) = nij(r)/n","inline":true,"padRight":true},{"text":"is the empirical measure of ","element":"span"},{"style":{"height":18.22},"width":91.53,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-29.png","element":"img","alt":" Tij(r","inline":true},{"text":"). 
By ","element":"span"},{"href":"#id-129","referenceIndex":40,"text":"[40, ","element":"a"},{"text":"Lemma 29], we have the following concentration of measure:","element":"span"}],[{"style":{"width":"88%"},"width":1595,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-30.png","element":"img"}],[{"text":"Conditioned on the event ","element":"span"},{"style":{"height":21.29},"width":641.04,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-31.png","element":"img","alt":" |ρ̂(Tij(r)) − ρ(Tij(r))| ≤ (1/2)ρ(Tij(r","inline":true},{"text":")), we have ","element":"span"},{"style":{"height":18.22},"width":217.71,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-32.png","element":"img","alt":" ρ̂(Tij(r)) ≥","inline":true},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-33.png","element":"img","alt":"1","inline":true}],[{"style":{"height":19.22},"width":152.78,"height":48.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-34.png","element":"img","alt":"2ρ(Tij(r","inline":true},{"text":")), which implies ","element":"span"},{"style":{"height":21.29},"width":375.68,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-35.png","element":"img","alt":" nij(r) ≥ (1/2) c1 n r^(D−1) η","inline":true},{"text":". 
By Lemma ","element":"span"},{"href":"#id-125","text":"7, ","element":"a"},{"style":{"height":18.22},"width":205.14,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-36.png","element":"img","alt":" �Vy(xi, xj, r","inline":true},{"text":") is concentrated ","element":"span"},{"text":"on ","element":"span"},{"style":{"height":18.22},"width":205.14,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-37.png","element":"img","alt":" Vy(xi, xj, r","inline":true},{"text":") with high probability:","element":"span"}],[{"style":{"width":"87%"},"width":1577,"height":356,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/33-38.png","element":"img"}],[{"style":{"width":"93%"},"width":1694,"height":964,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-0.png","element":"img"}],[{"text":"A key step in the proof of Theorem ","element":"span"},{"href":"#id-79","text":"4 ","element":"a"},{"text":"is to estimate the difference between ","element":"span"},{"style":{"height":18.22},"width":306.94,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-1.png","element":"img","alt":"�Vy(xi, xj, r) and","inline":true},{"style":{"height":18.44},"width":165.09,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-2.png","element":"img","alt":"Vf(xi, xj","inline":true},{"text":"), which is given in the next lemma.","element":"span"}],[{"id":"id-130","style":{"height":18.09},"width":443.28,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-3.png","element":"img","alt":"Lemma 9. Let {xi}ni=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be i.i.d. 
samples from the probability measure ","element":"span"},{"style":{"height":18.09},"width":483.94,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-4.png","element":"img","alt":" ρ, and {yi}ni=1 be sampled","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"according to the model in ","element":"span"},{"href":"#id-40","text":"(1)","element":"a"},{"style":{"fontStyle":"italic"},"text":", under Assumption ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"2(","element":"a"},{"style":{"fontStyle":"italic"},"text":"i), ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"3-","element":"a"},{"href":"#id-74","style":{"fontStyle":"italic"},"text":"6. ","element":"a"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.02},"width":568.2,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-5.png","element":"img","alt":" ν > 0 and set α0 and r as in","inline":true,"padRight":true},{"href":"#id-78","style":{"fontStyle":"italic"},"text":"(30)","element":"a"},{"style":{"fontStyle":"italic"},"text":". Then","element":"span"}],[{"id":"id-132","style":{"width":"78%"},"width":1417,"height":75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-6.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-130","style":{"fontWeight":"bold"},"text":"9. 
","element":"a"},{"text":"We decompose the difference by","element":"span"}],[{"id":"id-131","style":{"width":"96%"},"width":1744,"height":322,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-7.png","element":"img"}],[{"text":"The term I in ","element":"span"},{"href":"#id-131","text":"(D.4) ","element":"a"},{"text":"represents the difference between the variance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"over the segment between ","element":"span"},{"style":{"height":17.42},"width":189.48,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-8.png","element":"img","alt":" xi and xj","inline":true},{"text":", and the variance of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"in the tube with radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"enclosing this segment. According to Assumption ","element":"span"},{"href":"#id-74","text":"6, ","element":"a"},{"text":"I ","element":"span"},{"style":{"height":20.57},"width":297.39,"height":51.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-9.png","element":"img","alt":" ≤ α02 if r ≤ α02c2 .","inline":true,"padRight":true},{"text":"The term II satisfies II ","element":"span"},{"style":{"height":17.13},"width":89.56,"height":42.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-10.png","element":"img","alt":" ≤ σ2 ","inline":true,"padRight":true},{"text":"since it captures the variance of the bounded noise in [","element":"span"},{"style":{"height":17.6},"width":128.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-11.png","element":"img","alt":"−σ, σ].","inline":true,"padRight":true},{"text":"The term III in ","element":"span"},{"href":"#id-131","text":"(D.4) ","element":"a"},{"text":"captures the difference between the population variance of 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"in the tube ","element":"span"},{"style":{"height":18.22},"width":91.53,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-12.png","element":"img","alt":" Tij(r","inline":true},{"text":") and its empirical counterpart. A high probability bound is given by Lemma ","element":"span"},{"href":"#id-128","text":"8. ","element":"a"},{"text":"Putting the above ingredients together, we have","element":"span"}],[{"style":{"width":"57%"},"width":1043,"height":251,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-13.png","element":"img"}],[{"text":"which implies ","element":"span"},{"href":"#id-132","text":"(D.3)","element":"a"},{"text":".","element":"span"}],[{"style":{"width":"1%"},"width":30,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-14.png","element":"img"}],[{"text":"The following lemma gives an upper bound of ","element":"span"},{"style":{"height":18.44},"width":196.59,"height":46.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/34-15.png","element":"img","alt":" Vf(xi, xj).","inline":true}],[{"id":"id-133","style":{"fontWeight":"bold"},"text":"Lemma 10. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is defined as ","element":"span"},{"href":"#id-40","text":"(1)","element":"a"},{"style":{"fontStyle":"italic"},"text":", with the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"style":{"fontStyle":"italic"},"text":"satisfying Assumption ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"3. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"For any two points ","element":"span"},{"style":{"height":21.85},"width":1041.66,"height":54.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-0.png","element":"img","alt":" xi, xj ∈ RD, we have Vf(xi, xj) ≤ C2g∥Φ⊤xi − Φ⊤xj∥2.","inline":true}],[{"href":"#id-133","style":{"height":18.22},"width":1117.41,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-1.png","element":"img","alt":"Proof of Lemma 10. For any x = (1 − t)xi + txj, t ∈ [0,","inline":true,"padRight":true},{"text":"1], we have","element":"span"}],[{"style":{"width":"62%"},"width":1136,"height":199,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-2.png","element":"img"}],[{"text":"Therefore, ","element":"span"},{"style":{"height":21.45},"width":1423.54,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-3.png","element":"img","alt":" Vf(xi, xj) = var (f(x)|x = (1 − t)xi + txj, t ∈ [0, 1]) ≤ C2g∥Φ⊤xi − Φ⊤xj∥2.","inline":true}],[{"id":"id-124","style":{"fontWeight":"bold"},"text":"D.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-79","style":{"fontWeight":"bold"},"text":"4","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-79","style":{"fontWeight":"bold"},"text":"4. ","element":"a"},{"text":"W","element":"span"},{"href":"#id-130","text":"e p","element":"a"},{"text":"rove ","element":"span"},{"href":"#id-134","text":"(32) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-134","text":"(33) ","element":"a"},{"text":"in sequel.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-134","style":{"fontWeight":"bold"},"text":"(32)","element":"a"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"text":"Lemma ","element":"span"},{"href":"#id-130","text":"9 ","element":"a"},{"text":"gives an estimate on ","element":"span"},{"style":{"height":18.44},"width":828.4,"height":46.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-4.png","element":"img","alt":" |Vf(xi, xj) − �Vy(xi, xj, r)| for an (i, j) pair.","inline":true,"padRight":true},{"text":"We prove ","element":"span"},{"href":"#id-134","text":"(32) ","element":"a"},{"text":"by deriving a union bound to show that if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is large enough, then with high probability, every ","element":"span"},{"style":{"height":18.44},"width":165.09,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-5.png","element":"img","alt":" Vf(xi, xj","inline":true},{"text":") is small for all the (","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j","element":"span"},{"text":") pairs satisfying ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j","element":"span"},{"text":") = 1. 
Lemma ","element":"span"},{"href":"#id-130","text":"9 ","element":"a"},{"text":"implies that, for any ","element":"span"},{"style":{"height":14.8},"width":120.07,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-6.png","element":"img","alt":" α > 0,","inline":true}],[{"style":{"width":"42%"},"width":762,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-7.png","element":"img"}],[{"text":"Then we have","element":"span"}],[{"style":{"width":"46%"},"width":833,"height":582,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of ","element":"span"},{"href":"#id-134","style":{"fontWeight":"bold"},"text":"(33)","element":"a"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"In Algorithm ","element":"span"},{"href":"#id-77","text":"2, ","element":"a"},{"text":"we say two points ","element":"span"},{"style":{"height":13.02},"width":100.89,"height":32.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-9.png","element":"img","alt":" xi, xj","inline":true,"padRight":true},{"text":"are connected if Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"outputs ","element":"span"},{"style":{"height":18.22},"width":152.7,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-10.png","element":"img","alt":" A(xi, xj","inline":true},{"text":") = 1. 
Algorithm ","element":"span"},{"href":"#id-77","text":"2 ","element":"a"},{"text":"guarantees that if ","element":"span"},{"style":{"height":17.42},"width":177.48,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-11.png","element":"img","alt":" xi and xj","inline":true,"padRight":true},{"text":"are connected, they cannot be connected with any other point, so that the connected pairs are independent of each other. As a consequence, ","element":"span"},{"style":{"height":10.62},"width":48.19,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-12.png","element":"img","alt":" �nα","inline":true,"padRight":true},{"text":"is upper bounded by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n/","element":"span"},{"text":"2.","element":"span"}],[{"text":"along each direction with grid spacing ","element":"span"},{"style":{"height":22.89},"width":355.6,"height":57.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-13.png","element":"img","alt":" h = 2B(log n/n)^(1/(2D))","inline":true,"padRight":true},{"text":". Denote the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional cube with side length ","element":"span"},{"style":{"height":23.29},"width":1581.58,"height":58.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-14.png","element":"img","alt":" h as �Ωk, k = 1, . . . , N, and then N ≤ (2B/h)^d = (n/ log n)^(d/(2D)). 
Define N prisms in","inline":true}],[{"style":{"width":"72%"},"width":1303,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-15.png","element":"img"}],[{"text":"For any Ω","element":"span"},{"style":{"height":17.42},"width":399.35,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-16.png","element":"img","alt":"k and any xi, xj ∈ Ωk","inline":true},{"text":", we next show ","element":"span"},{"style":{"height":18.22},"width":309.16,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-17.png","element":"img","alt":"�Vy(xi, xj, r) ≤ α","inline":true,"padRight":true},{"text":"with high probability. If ","element":"span"},{"style":{"height":14.62},"width":144.54,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-18.png","element":"img","alt":" xi, xj ∈","inline":true,"padRight":true},{"text":"Ω","element":"span"},{"style":{"height":20.69},"width":524.7,"height":51.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-19.png","element":"img","alt":"k, ∥Φ⊤xi − Φ⊤xj∥ ≤ √dh","inline":true,"padRight":true},{"text":"and by Lemma ","element":"span"},{"href":"#id-133","text":"10, ","element":"a"},{"style":{"height":25.21},"width":831.9,"height":63.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-20.png","element":"img","alt":" Vf(xi, xj) ≤ dC2gh2 = 4dC2gB2(log n/n)1D .","inline":true,"padRight":true},{"text":"According to Lemma ","element":"span"},{"href":"#id-130","text":"9, ","element":"a"},{"style":{"height":31.6},"width":1030.02,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-21.png","element":"img","alt":" P�|Vf(xi, xj) − �Vy(xi, xj, r)| ≤ α0 + 3σ2�≥ 1 − 4n−ν","inline":true},{"text":". 
Therefore, for any ","element":"span"},{"style":{"height":17.42},"width":386.31,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-22.png","element":"img","alt":" xi, xj ∈ Ωk, we have","inline":true}],[{"style":{"width":"62%"},"width":1128,"height":136,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/35-23.png","element":"img"}],[{"text":"Denote #Ω","element":"span"},{"style":{"height":17.6},"width":392.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-0.png","element":"img","alt":"k = #{xi : xi ∈ Ωk}.","inline":true,"padRight":true},{"text":"Applying a union bound gives","element":"span"}],[{"style":{"width":"86%"},"width":1558,"height":442,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-1.png","element":"img"}],[{"text":"The equation above shows that, in all sets Ω","element":"span"},{"style":{"height":15.6},"width":279.97,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-2.png","element":"img","alt":"k, k = 1, . . . 
, N","inline":true},{"text":", all pairs of points ","element":"span"},{"style":{"height":17.42},"width":312.39,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-3.png","element":"img","alt":" xi, xj in each Ωk","inline":true,"padRight":true},{"text":"satisfy ","element":"span"},{"style":{"height":18.22},"width":309.16,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-4.png","element":"img","alt":"�Vy(xi, xj, r) ≤ α","inline":true,"padRight":true},{"text":"with probability no less than 1 ","element":"span"},{"style":{"height":16.33},"width":220.35,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-5.png","element":"img","alt":" − 2n−(ν−2).","inline":true}],[{"text":"In Algorithm ","element":"span"},{"href":"#id-77","text":"2, ","element":"a"},{"text":"two points ","element":"span"},{"style":{"height":13.02},"width":100.89,"height":32.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-6.png","element":"img","alt":" xi, xj","inline":true,"padRight":true},{"text":"are likely to be connected if ","element":"span"},{"style":{"height":18.22},"width":489.12,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-7.png","element":"img","alt":"�Vy(xi, xj, r) ≤ α. Under","inline":true,"padRight":true},{"text":"the condition that in all the sets Ω","element":"span"},{"style":{"height":15.6},"width":284.93,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-8.png","element":"img","alt":"k, k = 1, . . . 
, N","inline":true},{"text":", all pairs of points ","element":"span"},{"style":{"height":17.42},"width":461.22,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-9.png","element":"img","alt":" xi, xj in each Ωk satisfy","inline":true},{"style":{"height":18.22},"width":309.16,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-10.png","element":"img","alt":"�Vy(xi, xj, r) ≤ α","inline":true},{"text":", there is at most one point in each Ω","element":"span"},{"style":{"height":8.8},"width":18,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-11.png","element":"img","alt":"k","inline":true,"padRight":true},{"text":"that is not connected with other points in the output of Algorithm ","element":"span"},{"href":"#id-77","text":"2. ","element":"a"},{"text":"Therefore, the number of connected pairs satisfies","element":"span"}],[{"style":{"width":"45%"},"width":823,"height":136,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-12.png","element":"img"}],[{"text":"if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"is sufficiently large such that 2(","element":"span"},{"style":{"height":23.29},"width":309.26,"height":58.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-13.png","element":"img","alt":"n/ log n)d2D ≤ n.","inline":true}],[{"style":{"width":"1%"},"width":30,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-14.png","element":"img"}]]},{"heading":"E Proof of Lemma 5","paragraphs":[[{"href":"#id-109","style":{"height":19.14},"width":537.18,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-15.png","element":"img","alt":"Proof of Lemma 5. 
E(X2","inline":true},{"text":") can be computed by the following integral","element":"span"}],[{"style":{"width":"62%"},"width":1120,"height":545,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-16.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.4},"width":86.84,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-17.png","element":"img","alt":" t0, t1","inline":true,"padRight":true},{"text":"are chosen such that ","element":"span"},{"style":{"height":13.2},"width":79.04,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-18.png","element":"img","alt":" Ae−","inline":true}],[{"text":"The second equality is due to the fact that ","element":"span"},{"style":{"height":24.23},"width":144,"height":60.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-19.png","element":"img","alt":" A exp(−at^2/(b+ct))","inline":true,"padRight":true},{"text":"is a probability and hence no larger than 1.","element":"span"}],[{"text":"Plugging ","element":"span"},{"style":{"height":15.02},"width":166.84,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-20.png","element":"img","alt":" t0 and t1","inline":true,"padRight":true},{"text":"into the equation above gives","element":"span"}],[{"style":{"width":"82%"},"width":1487,"height":398,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/36-21.png","element":"img"}]]},{"heading":"F Proof of Lemma 6","paragraphs":[[{"text":"Lemma ","element":"span"},{"href":"#id-113","text":"6 ","element":"a"},{"text":"is based on the following Lemma ","element":"span"},{"href":"#id-135","text":"11 ","element":"a"},{"text":"(a generalization of ","element":"span"},{"href":"#id-20","referenceIndex":23,"text":"[23, ","element":"a"},{"text":"Lemma 11.1] in one dimension to the multidimensional case) and Lemma ","element":"span"},{"href":"#id-136","text":"12 
","element":"a"},{"text":"(","element":"span"},{"href":"#id-20","referenceIndex":23,"text":"[23, ","element":"a"},{"text":"Theorem 11.3]) below, which are standard results in non-parametric statistics.","element":"span"}],[{"id":"id-135","style":{"fontWeight":"bold"},"text":"Lemma 11. ","element":"span"},{"href":"#id-20","referenceIndex":23,"style":{"fontStyle":"italic"},"text":"[23, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"Lemma 11.1] Suppose ","element":"span"},{"style":{"height":20.15},"width":513.35,"height":50.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-0.png","element":"img","alt":" g : [−B, B]d → R is (s, Cg)","inline":true},{"style":{"fontStyle":"italic"},"text":"-smooth with ","element":"span"},{"style":{"height":16.8},"width":256.96,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-1.png","element":"img","alt":" s = k + β for","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"some ","element":"span"},{"style":{"height":17.6},"width":558.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-2.png","element":"img","alt":" k ∈ N0 and β ∈ (0, 1]. 
Let gk","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be the Taylor polynomial of ","element":"span"},{"style":{"height":19.53},"width":576.01,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-3.png","element":"img","alt":" g at a ∈ [−B, B]d of degree k.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Then","element":"span"}],[{"style":{"width":"34%"},"width":615,"height":85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-4.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"fontWeight":"bold"},"text":"z ","element":"span"},{"style":{"fontStyle":"italic"},"text":"near ","element":"span"},{"style":{"fontWeight":"bold"},"text":"a","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-135","style":{"fontWeight":"bold"},"text":"11. ","element":"a"},{"text":"We denote the multi-index ","element":"span"},{"style":{"height":17.6},"width":809,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-5.png","element":"img","alt":" α = (α1, . . . , αd) and let α! = α1! · · · αd!,","inline":true},{"style":{"height":19.5},"width":627.84,"height":48.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-6.png","element":"img","alt":"zα = zα11 · · · zαdd for z = (z1, ..., zd","inline":true},{"text":"). Denote the partial derivative ","element":"span"},{"style":{"height":31.08},"width":318.2,"height":77.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-7.png","element":"img","alt":" Dα := ∂α1+...+αd∂xα11 ···∂xαdd ","inline":true,"padRight":true},{"text":". 
The Taylor","element":"span"}],[{"text":"expansion of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"at ","element":"span"},{"style":{"fontWeight":"bold"},"text":"a ","element":"span"},{"text":"gives","element":"span"}],[{"style":{"width":"85%"},"width":1549,"height":643,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-8.png","element":"img"}],[{"text":"From this, and the assumption that ","element":"span"},{"style":{"height":18.22},"width":184.81,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-9.png","element":"img","alt":" g is (s, Cg","inline":true},{"text":") smooth, one obtains","element":"span"}],[{"style":{"width":"81%"},"width":1479,"height":1080,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/37-10.png","element":"img"}],[{"text":"We next introduce a standard result (","element":"span"},{"href":"#id-20","referenceIndex":23,"text":"[23, ","element":"a"},{"text":"Theorem 11.3]) in nonparametric statistics.","element":"span"}],[{"id":"id-136","style":{"fontWeight":"bold"},"text":"Lemma 12. ","element":"span"},{"href":"#id-20","referenceIndex":23,"style":{"fontStyle":"italic"},"text":"[23, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"Theorem 11.3] Let ","element":"span"},{"style":{"height":20.82},"width":181.23,"height":52.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-0.png","element":"img","alt":" {zi}2ni=n+1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be i.i.d. 
sampled from a probability measure ","element":"span"},{"style":{"height":12},"width":26,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-1.png","element":"img","alt":" µ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that ","element":"span"},{"text":"supp(","element":"span"},{"style":{"height":21.22},"width":611.03,"height":53.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-2.png","element":"img","alt":"µ) ⊆ [−B, B]d and let {yi}2ni=n+1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be sampled from the regression model","element":"span"}],[{"style":{"width":"13%"},"width":237,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":19.13},"width":1349.58,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-4.png","element":"img","alt":" E[ζ|z] = 0. Suppose ∥h∥∞ < +∞ and supz var(ζ|z) ≤ ¯σ2 < ∞. 
Let F","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a linear space of functions from ","element":"span"},{"style":{"height":18.33},"width":289.22,"height":45.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-5.png","element":"img","alt":" Rd to R, and ĥ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be the estimator given by","element":"span"}],[{"style":{"width":"58%"},"width":1065,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Then","element":"span"}],[{"id":"id-137","style":{"width":"82%"},"width":1495,"height":213,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-7.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for some universal constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"In ","element":"span"},{"href":"#id-137","text":"(F.1)","element":"a"},{"text":", the mean squared error is decomposed into two terms: the first term captures the variance and the second term estimates the bias. We consider piecewise polynomial approximation with order no more than ","element":"span"},{"style":{"height":15.24},"width":389.33,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-8.png","element":"img","alt":" k such that F = Fk.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-113","style":{"fontWeight":"bold"},"text":"6. 
","element":"a"},{"text":"In order to apply Lemma ","element":"span"},{"href":"#id-136","text":"12, ","element":"a"},{"text":"we express the regression model in ","element":"span"},{"href":"#id-110","text":"(14) ","element":"a"},{"text":"as","element":"span"}],[{"style":{"width":"58%"},"width":1063,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-9.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":12},"width":22,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-10.png","element":"img","alt":" η","inline":true,"padRight":true},{"text":"defined in ","element":"span"},{"href":"#id-138","text":"(16)","element":"a"},{"text":". The function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"is bounded: ","element":"span"},{"style":{"height":18.22},"width":826.21,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-11.png","element":"img","alt":" ∥h∥∞ ≤ ∥g∥∞ + ∥η∥∞ ≤ M + CgB∥�Φ − Φ∥.","inline":true,"padRight":true},{"text":"The noise satisfies ","element":"span"},{"style":{"height":17.6},"width":274.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-12.png","element":"img","alt":" E[ζ|z] = 0 and","inline":true}],[{"style":{"width":"94%"},"width":1714,"height":642,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-13.png","element":"img"}],[{"text":"for some universal ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":". 
Here dim(","element":"span"},{"style":{"height":15.24},"width":49.36,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-14.png","element":"img","alt":"Fk","inline":true},{"text":") is the dimension of ","element":"span"},{"style":{"height":15.24},"width":49.36,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-15.png","element":"img","alt":" Fk","inline":true,"padRight":true},{"text":"and hence dim(","element":"span"},{"style":{"height":22.56},"width":315.33,"height":56.41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-16.png","element":"img","alt":"Fk) = binom(d+k, d) Kd.","inline":true,"padRight":true},{"text":"Let ","element":"span"},{"style":{"height":12},"width":38.82,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-17.png","element":"img","alt":" gk","inline":true,"padRight":true},{"text":"be the piecewise Taylor polynomial of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"with degree no more than ","element":"span"},{"style":{"height":17.6},"width":365.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-18.png","element":"img","alt":" k = ⌈s⌉ − 1 on the","inline":true,"padRight":true},{"text":"partition of [","element":"span"},{"style":{"height":19.53},"width":317.23,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-19.png","element":"img","alt":"−B, B]d into Kd ","inline":true,"padRight":true},{"text":"cubes, where the Taylor expansion is at the center of each cube. 
The second term can be estimated from Lemma ","element":"span"},{"href":"#id-135","text":"11 ","element":"a"},{"text":"as","element":"span"}],[{"style":{"width":"64%"},"width":1166,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/38-20.png","element":"img"}],[{"style":{"width":"63%"},"width":1138,"height":142,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/39-0.png","element":"img"}],[{"text":"Therefore","element":"span"}],[{"style":{"width":"75%"},"width":1361,"height":228,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/39-1.png","element":"img"}],[{"text":"We have","element":"span"}],[{"style":{"width":"88%"},"width":1592,"height":426,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2001.07883/images/39-2.png","element":"img"}]]}],"_version":"3.3.2"},"paperNode":"$28:props:children:props:children:0:props:product"}]]