36:[["$","audio",null,{"id":"tts"}],["$","$L3b",null,{"paperID":"1311.2645","publisher":"arxiv","paperJSON":{"title":"Program Evaluation and Causal Inference with High-Dimensional Data","paperID":"1311.2645","avgLineHeight":15.41,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments. ","element":"span"},{"text":"We can handle ","element":"span"},{"style":{"fontStyle":"italic"},"text":"very many ","element":"span"},{"text":"control variables, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"endogenous ","element":"span"},{"text":"receipt of treatment, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"heterogeneous ","element":"span"},{"text":"treatment effects, and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"function-valued ","element":"span"},{"text":"$3c","element":"span"}],[{"text":"The results on program evaluation are obtained as a consequence of more general results on honest inference in a general moment condition framework, which arises from structural equation models in econometrics. Here too the crucial ingredient is the use of orthogonal moment conditions, which can be constructed from the initial moment conditions. We provide results on honest inference for (function-valued) parameters within this general framework where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"any high-quality","element":"span"},{"text":", modern ","element":"span"},{"style":{"fontStyle":"italic"},"text":"machine learning ","element":"span"},{"text":"methods (e.g., boosted trees, deep neural networks, random forests, and their aggregated and hybrid versions) can be used to learn the nonparametric/high-dimensional components of the model. 
These include a number of supporting auxiliary results that are of major independent interest: namely, we (1) prove uniform validity of a multiplier bootstrap, (2) offer a uniformly valid functional delta method, and (3) provide results for sparsity-based estimation of regression functions for function-valued outcomes.","element":"span"}]]},{"heading":"1. Introduction","paragraphs":[[{"text":"$3d","element":"span"},{"style":{"height":6.4},"width":15,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/1-0.png","element":"img","alt":"1","inline":true}],[{"text":"$3e","element":"span"}],[{"text":"In this paper, we consider estimation of the effect of an ","element":"span"},{"style":{"fontStyle":"italic"},"text":"endogenous ","element":"span"},{"text":"binary treatment, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":", on an outcome, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y","element":"span"},{"text":", in the presence of a binary instrumental variable, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z","element":"span"},{"text":", in settings with very many potential controls, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":"). 
Allowing many potential controls expressly covers both the case where there are simply many controls (where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") = ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") and the case where there are many technical controls ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") generated as transformations such as powers, b-splines, or interactions of raw controls,","element":"span"},{"style":{"height":18.36},"width":85.02,"height":45.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/1-1.png","element":"img","alt":"2 X,","inline":true,"padRight":true},{"text":"along with combinations of the two cases. The notation ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") naturally accommodates these cases, and we call ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"controls ","element":"span"},{"text":"regardless of the case. We allow for fully ","element":"span"},{"style":{"fontStyle":"italic"},"text":"heterogeneous ","element":"span"},{"text":"treatment effects and thus focus on estimation of causal quantities that are appropriate in heterogeneous effects settings such as the local average treatment effect (LATE) or the local quantile treatment effect (LQTE). 
We focus our discussion on the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"endogenous ","element":"span"},{"text":"case where identification is obtained through the use of an instrumental variable, but all results carry through to the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"exogenous ","element":"span"},{"text":"case where the treatment is taken as exogenous unconditionally or after conditioning on sufficient controls by simply replacing the instrument with the treatment variable in the estimation and inference methods and in the formal results. In the latter case, LATE reduces to the average treatment effect (ATE) and LQTE to the quantile treatment effect (QTE).","element":"span"}],[{"text":"The methodology for estimating treatment effects we consider allows for cases where the number of potential controls, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":":= dim ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":"), is much larger than the sample size, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Of course, informative inference about causal parameters cannot proceed allowing for ","element":"span"},{"style":{"height":13.6},"width":116.11,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/2-0.png","element":"img","alt":" p ≫ n","inline":true,"padRight":true},{"text":"without further restrictions. We impose sufficient structure through the assumption that reduced form relationships such as the conditional expectations E","element":"span"},{"style":{"height":17.6},"width":607.57,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/2-1.png","element":"img","alt":"P [D|X], EP [Z|X], and EP [Y |X","inline":true},{"text":"] are approximately sparse. 
Intuitively, approximate sparsity requires that these reduced form relationships can be represented, up to a small approximation error, as a linear combination, possibly inside a known link function such as the logistic function, of a number ","element":"span"},{"style":{"height":11.2},"width":119.44,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/2-2.png","element":"img","alt":" s ≪ n","inline":true,"padRight":true},{"text":"of the variables in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") whose identities are ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a priori ","element":"span"},{"text":"unknown to the researcher. This assumption allows us to estimate the fundamental reduced form relationships with methods for high-dimensional sparse models that are known to have good prediction properties. ","element":"span"},{"text":"We may then use these estimated reduced form quantities as inputs for estimating the causal parameters of interest. Approaching the problem of estimating treatment effects within this framework allows us to accommodate the realistic scenario in which a researcher is unsure about exactly which confounding variables, or which transformations of these confounders, are important and so must search among a broad set of controls.","element":"span"}],[{"text":"Valid inference following model selection is non-trivial. 
","element":"span"},{"text":"$3f","element":"span"},{"href":"#id-0","referenceIndex":73,"text":"(2008a; ","element":"a"},{"href":"#id-1","referenceIndex":74,"text":"2008b)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-2","referenceIndex":87,"text":"Pötscher ","element":"a"},{"href":"#id-2","referenceIndex":87,"text":"(2009)","element":"a"},{"text":"; and Belloni, Chernozhukov, and Hansen ","element":"span"},{"href":"#id-3","referenceIndex":14,"text":"(2013a; ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"2014a)","element":"a"},{"text":".","element":"span"}],[{"text":"The ","element":"span"},{"style":{"fontStyle":"italic"},"text":"first main contribution ","element":"span"},{"text":"of this paper is to provide inferential procedures for key parameters used in program evaluation that are theoretically valid within approximately sparse models allowing for imperfect model selection. Our procedures build upon ","element":"span"},{"href":"#id-5","referenceIndex":13,"text":"Belloni et al. ","element":"a"},{"href":"#id-5","referenceIndex":13,"text":"(2010) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-6","referenceIndex":9,"text":"Belloni et al. ","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"(2012)","element":"a"},{"text":", who were the first to demonstrate, in a highly specialized context, that valid inference can proceed following model selection, allowing for model selection mistakes, under two conditions. We formulate and extend these two conditions to a rather general moment-condition framework (e.g., ","element":"span"},{"href":"#id-7","referenceIndex":56,"text":"Hansen ","element":"a"},{"href":"#id-7","referenceIndex":56,"text":"(1982) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-8","referenceIndex":57,"text":"Hansen and Singleton ","element":"a"},{"href":"#id-8","referenceIndex":57,"text":"(1982)","element":"a"},{"text":") as follows. 
First, estimation should be based upon “orthogonal” moment conditions that are first-order insensitive to changes in the values of nuisance parameters that will be estimated using high-dimensional methods. Specifically, if the target parameter value ","element":"span"},{"style":{"height":10.62},"width":44.92,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-0.png","element":"img","alt":" α0","inline":true,"padRight":true},{"text":"is identified via the moment condition","element":"span"}],[{"style":{"width":"59%"},"width":1123,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.02},"width":42.14,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-2.png","element":"img","alt":" h0","inline":true,"padRight":true},{"text":"is a function-valued nuisance parameter estimated via a model-selection or regularization method, one needs to use a moment function, ","element":"span"},{"style":{"height":16.4},"width":29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-3.png","element":"img","alt":" ψ","inline":true},{"text":", such that the corresponding moment condition is orthogonal with respect to perturbations of ","element":"span"},{"style":{"height":15.02},"width":230.72,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-4.png","element":"img","alt":" h around h0","inline":true},{"text":". 
More formally, the moment condition should satisfy the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Neyman orthogonality condition","element":"span"}],[{"id":"id-9","style":{"width":"63%"},"width":1190,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-5.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.64},"width":43.16,"height":39.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-6.png","element":"img","alt":" ∂h","inline":true,"padRight":true},{"text":"is a functional derivative operator with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"restricted to directions of possible deviations of estimators of ","element":"span"},{"style":{"height":15.02},"width":230.09,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-7.png","element":"img","alt":" h0 from h0.","inline":true,"padRight":true},{"text":"Second, one needs to ensure that the model selection mistakes occurring in the estimation of nuisance parameters are uniformly “moderately” small with respect to the underlying model. Specifically, we will require that the nuisance parameter ","element":"span"},{"style":{"height":15.02},"width":42.14,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-8.png","element":"img","alt":" h0","inline":true,"padRight":true},{"text":"is estimated at the rate ","element":"span"},{"style":{"height":20.34},"width":141.53,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-9.png","element":"img","alt":" o(n−1/4","inline":true},{"text":"), which ensures small bias, and that the estimator takes values in a space whose entropy does not grow too fast, which ensures no overfitting. 
In this paper, we establish that building estimators upon moment conditions that satisfy the orthogonality condition ","element":"span"},{"href":"#id-9","text":"(1.2) ","element":"a"},{"text":"ensures that crude estimation of ","element":"span"},{"style":{"height":15.02},"width":42.14,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-10.png","element":"img","alt":" h0","inline":true,"padRight":true},{"text":"via post-selection or other regularization methods has an asymptotically negligible effect on the estimation of ","element":"span"},{"style":{"height":10.62},"width":44.91,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-11.png","element":"img","alt":" α0","inline":true,"padRight":true},{"text":"in general frameworks. ","element":"span"},{"text":"It then follows that we can form a regular, root-","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"consistent estimator of ","element":"span"},{"style":{"height":10.62},"width":44.92,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-12.png","element":"img","alt":" α0","inline":true},{"text":", uniformly with respect to the underlying model.","element":"span"}],[{"text":"In the endogenous treatment effects setting, we build moment conditions satisfying ","element":"span"},{"href":"#id-9","text":"(1.2) ","element":"a"},{"text":"from the efficient influence functions for certain reduced form parameters, building upon ","element":"span"},{"href":"#id-10","referenceIndex":54,"text":"Hahn ","element":"a"},{"href":"#id-10","referenceIndex":54,"text":"(1998)","element":"a"},{"text":". We illustrate how orthogonal moment conditions, coupled with methods developed for forecasting in high-dimensional approximately sparse models, can be used to estimate and obtain valid inferential statements about a wide variety of structural/treatment effects. 
We formally demonstrate the uniform validity of the resulting inference within a broad class of approximately sparse models, including models in which perfect model selection is theoretically impossible. An important feature of our main theoretical results is that they cover the use of variable selection for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"functional response ","element":"span"},{"style":{"height":16.4},"width":249.53,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-13.png","element":"img","alt":"data using ℓ1","inline":true},{"text":"-penalized methods. Functional response data arise, for example, when one is interested in the LQTE not just at a single quantile but over a range of quantile indices. This case requires working with the functional dependent variable ","element":"span"},{"style":{"height":17.6},"width":461.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/3-14.png","element":"img","alt":" u ↦ 1(Y ⩽ u), where","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"denotes various levels that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"can cross. Treating such functional response data allows us to provide a unified inference procedure for interesting quantities such as the (local) distributional and quantile effects of the treatment, with simpler but important parameters, such as the LQTE at a given quantile, as special cases.","element":"span"}],[{"text":"The ","element":"span"},{"style":{"fontStyle":"italic"},"text":"second main contribution ","element":"span"},{"text":"of this paper is to provide a general set of results for uniformly valid estimation and inference methods in moment-condition problems arising in structural analysis in econometrics and other data sciences. 
These results are useful not only for establishing the properties of the treatment effects estimators developed here but also for attacking a wide range of problems in structural econometrics. For example, ","element":"span"},{"href":"#id-11","referenceIndex":40,"text":"Chernozhukov et al. ","element":"a"},{"text":"(2015a) provide estimates of parameters characterizing a simple structural demand model based loosely on the analysis in ","element":"span"},{"href":"#id-12","referenceIndex":21,"text":"Berry et al. ","element":"a"},{"href":"#id-12","referenceIndex":21,"text":"(1995) ","element":"a"},{"text":"using the framework developed here; see also ","element":"span"},{"href":"#id-13","referenceIndex":41,"text":"Chernozhukov et al. ","element":"a"},{"href":"#id-13","referenceIndex":41,"text":"(2015b)","element":"a"},{"text":". A key element in establishing uniform validity of post-regularization inference is again the use of Neyman orthogonal moment conditions. In the general framework we consider, we may have (a continuum of) target parameters identified via (a continuum of) moment conditions that involve (a continuum of) nuisance functions that will be estimated via Lasso, Post-Lasso, or some other high-quality machine learning method. Our general theory expressly allows for a wide variety of traditional and machine learning methods, including those that do not rely on approximate sparsity, as long as the methods","element":"span"}],[{"style":{"width":"40%"},"width":766,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/4-0.png","element":"img"}],[{"text":"By “not overfitting” we mean that the entropy of the function classes containing the realizations of the estimator of the nuisance function/parameter does not increase too rapidly with the sample size. This second condition can only be verified analytically, but it can be avoided by using various data-splitting methods. 
For example, we can set aside a vanishing fraction of the data to estimate the nuisance parameter, as in ","element":"span"},{"href":"#id-14","referenceIndex":22,"text":"Bickel ","element":"a"},{"href":"#id-14","referenceIndex":22,"text":"(1982)","element":"a"},{"text":", or employ cross-fitting, as in Belloni ","element":"span"},{"style":{"fontStyle":"italic"},"text":"et al. ","element":"span"},{"text":"(2010, 2012) and ","element":"span"},{"href":"#id-15","referenceIndex":33,"text":"Chernozhukov et al. ","element":"a"},{"href":"#id-15","referenceIndex":33,"text":"(2016)","element":"a"},{"text":". Either scheme ensures that there is no asymptotic efficiency loss from data-splitting. We refer the reader to ","element":"span"},{"href":"#id-15","referenceIndex":33,"text":"Chernozhukov et al. ","element":"a"},{"href":"#id-15","referenceIndex":33,"text":"(2016) ","element":"a"},{"text":"for a detailed discussion and analysis of cross-fitting in connection with inference on ATE and other causal parameters using machine learning methods for high-dimensional data.","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/4-1.png","element":"img","alt":"3","inline":true}],[{"text":"These general results contain the results on treatment effects relevant for program evaluation, particularly those for distributional and quantile effects, as a leading special case. 
These results are also immediately useful in other contexts, such as nonseparable quantile models as in ","element":"span"},{"href":"#id-16","referenceIndex":38,"text":"Chernozhukov and ","element":"a"},{"href":"#id-16","referenceIndex":38,"text":"Hansen ","element":"a"},{"href":"#id-16","referenceIndex":38,"text":"(2005)","element":"a"},{"text":", ","element":"span"},{"href":"#id-17","referenceIndex":39,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-17","referenceIndex":39,"text":"(2006)","element":"a"},{"text":", ","element":"span"},{"href":"#id-18","referenceIndex":42,"text":"Chesher ","element":"a"},{"href":"#id-18","referenceIndex":42,"text":"(2003)","element":"a"},{"text":", and ","element":"span"},{"href":"#id-19","referenceIndex":65,"text":"Imbens and Newey ","element":"a"},{"href":"#id-19","referenceIndex":65,"text":"(2009)","element":"a"},{"text":"; semiparametric and partially identified models as in ","element":"span"},{"href":"#id-20","referenceIndex":46,"text":"Escanciano and Zhu ","element":"a"},{"href":"#id-20","referenceIndex":46,"text":"(2013)","element":"a"},{"text":"; and many others. In our results, we first establish a functional central limit theorem for the continuum of target parameters and show that this functional central limit theorem holds uniformly in a wide range of data-generating processes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"with approximately sparse continua of nuisance functions. Second, we establish a functional central limit theorem for the multiplier bootstrap that resamples the first-order approximations to the standardized estimators and demonstrate its uniform-in-","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"validity. 
These uniformity results build upon and complement those given in ","element":"span"},{"href":"#id-21","referenceIndex":90,"text":"Romano and Shaikh ","element":"a"},{"href":"#id-21","referenceIndex":90,"text":"(2012) ","element":"a"},{"text":"for the empirical bootstrap. Third, we establish a functional delta method for smooth functionals of the continuum of target parameters and a functional delta method for the multiplier bootstrap of these smooth functionals, both of which hold uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", using an appropriately strengthened notion of Hadamard differentiability. All of these results are new and are of independent interest outside of the treatment effects focus of this paper.","element":"span"}],[{"text":"We illustrate the use of our methods by estimating the effect of 401(k) eligibility and 401(k) participation on measures of accumulated assets as in ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004)","element":"a"},{"text":".","element":"span"},{"style":{"height":15.56},"width":172.83,"height":38.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/5-0.png","element":"img","alt":"4 Similar","inline":true,"padRight":true},{"text":"to ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004)","element":"a"},{"text":", we provide estimates of ATE and QTE of 401(k) eligibility and of LATE and LQTE of 401(k) participation. We differ from this previous work by using the high-dimensional methods developed in this paper to allow ourselves to consider a broader set of controls than has previously been considered. 
We find that 401(k) participation has a moderate impact on accumulated financial assets at low quantiles while appearing to have a much larger impact at high quantiles. ","element":"span"},{"text":"Interpreting the quantile index as “preference for savings” as in ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004)","element":"a"},{"text":", this pattern suggests that 401(k) participation has little causal impact on the accumulated financial assets of those with low desire to save but a much larger impact on those with stronger preferences for saving. ","element":"span"},{"text":"It is interesting that these results are similar to those in ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004) ","element":"a"},{"text":"despite allowing for a much richer set of controls.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Links to the literature. ","element":"span"},{"text":"The Neyman orthogonality condition embodied in ","element":"span"},{"href":"#id-9","text":"(1.2) ","element":"a"},{"text":"has a long history in statistics and econometrics. For example, this type of orthogonality was used by ","element":"span"},{"href":"#id-23","referenceIndex":81,"text":"Neyman ","element":"a"},{"href":"#id-23","referenceIndex":81,"text":"(1979) ","element":"a"},{"text":"in low-dimensional settings to deal with crudely estimated parametric nuisance parameters. 
See also ","element":"span"},{"href":"#id-24","referenceIndex":78,"text":"Newey ","element":"a"},{"href":"#id-24","referenceIndex":78,"text":"(1990)","element":"a"},{"text":", ","element":"span"},{"href":"#id-25","referenceIndex":6,"text":"Andrews ","element":"a"},{"href":"#id-25","referenceIndex":6,"text":"(1994b)","element":"a"},{"text":", ","element":"span"},{"href":"#id-26","referenceIndex":79,"text":"Newey ","element":"a"},{"href":"#id-26","referenceIndex":79,"text":"(1994)","element":"a"},{"text":", ","element":"span"},{"href":"#id-27","referenceIndex":88,"text":"Robins and Rotnitzky ","element":"a"},{"href":"#id-27","referenceIndex":88,"text":"(1995)","element":"a"},{"text":", and ","element":"span"},{"href":"#id-28","referenceIndex":75,"text":"Linton ","element":"a"},{"href":"#id-28","referenceIndex":75,"text":"(1996) ","element":"a"},{"text":"for the use of this condition in semiparametric problems.","element":"span"}],[{"text":"To the best of our knowledge, ","element":"span"},{"href":"#id-5","referenceIndex":13,"text":"Belloni et al. ","element":"a"},{"href":"#id-5","referenceIndex":13,"text":"(2010) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-6","referenceIndex":9,"text":"Belloni et al. 
","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"(2012) ","element":"a"},{"text":"were the first to use the orthogonality condition ","element":"span"},{"href":"#id-9","text":"(1.2) ","element":"a"},{"text":"to expressly address the question of uniform post-selection inference without imposing “beta-min” conditions, either in high-dimensional settings with ","element":"span"},{"style":{"height":13.6},"width":115.83,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/5-1.png","element":"img","alt":" p ≫ n","inline":true,"padRight":true},{"text":"or in low-dimensional settings with ","element":"span"},{"style":{"height":13.6},"width":115.83,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/5-2.png","element":"img","alt":" p ≪ n","inline":true},{"text":". They applied it to the specific problem of the linear instrumental variables model with many instruments, where the nuisance function ","element":"span"},{"style":{"height":15.02},"width":42.14,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/5-3.png","element":"img","alt":" h0","inline":true,"padRight":true},{"text":"is the optimal instrument estimated by Lasso or Post-Lasso methods and ","element":"span"},{"style":{"height":10.62},"width":44.92,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/5-4.png","element":"img","alt":" α0","inline":true,"padRight":true},{"text":"is the coefficient of the endogenous regressor. ","element":"span"},{"href":"#id-3","referenceIndex":14,"text":"Belloni et al. ","element":"a"},{"href":"#id-3","referenceIndex":14,"text":"(2013a) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni et al. 
","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a) ","element":"a"},{"text":"also exploited this approach to develop a double-selection method that yields valid post-selection inference on the parameters of the linear part of a partially linear model and on average treatment effects when the treatment is binary and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"exogenous ","element":"span"},{"text":"conditional on controls in both the ","element":"span"},{"style":{"height":19.16},"width":608.24,"height":47.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/5-5.png","element":"img","alt":" p ≫ n and the p ≪ n setting.5 ","inline":true,"padRight":true},{"text":"Subsequently, ","element":"span"},{"href":"#id-29","referenceIndex":48,"text":"Farrell ","element":"a"},{"href":"#id-29","referenceIndex":48,"text":"(2015) ","element":"a"},{"text":"extended the results of ","element":"span"},{"href":"#id-3","referenceIndex":14,"text":"Belloni et al. ","element":"a"},{"href":"#id-3","referenceIndex":14,"text":"(2013a) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni et al. ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a) ","element":"a"},{"text":"to the estimation of ATE when the treatment is multivalued and exogenous conditional on controls using group penalization for selection. 
Note that this previous work on treatment effects covers only the exogenous case and does not allow for functional responses, which are necessary, for example, for working with distributional or quantile treatment effects.","element":"span"}],[{"text":"Our work also contributes to the line of research on obtaining ","element":"span"},{"style":{"height":17.6},"width":62.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/6-0.png","element":"img","alt":"√n","inline":true},{"text":"-consistent and asymptotically normal estimates for low-dimensional components within traditional semiparametric frameworks as in the important work by ","element":"span"},{"href":"#id-14","referenceIndex":22,"text":"Bickel ","element":"a"},{"href":"#id-14","referenceIndex":22,"text":"(1982)","element":"a"},{"text":", ","element":"span"},{"href":"#id-30","referenceIndex":89,"text":"Robinson ","element":"a"},{"href":"#id-30","referenceIndex":89,"text":"(1988)","element":"a"},{"text":", ","element":"span"},{"href":"#id-24","referenceIndex":78,"text":"Newey ","element":"a"},{"href":"#id-24","referenceIndex":78,"text":"(1990)","element":"a"},{"text":", ","element":"span"},{"href":"#id-31","referenceIndex":97,"text":"van der Vaart ","element":"a"},{"href":"#id-31","referenceIndex":97,"text":"(1991)","element":"a"},{"text":", ","element":"span"},{"href":"#id-25","referenceIndex":6,"text":"Andrews ","element":"a"},{"href":"#id-25","referenceIndex":6,"text":"(1994b)","element":"a"},{"text":", ","element":"span"},{"href":"#id-26","referenceIndex":79,"text":"Newey ","element":"a"},{"href":"#id-26","referenceIndex":79,"text":"(1994)","element":"a"},{"text":", ","element":"span"},{"href":"#id-32","referenceIndex":3,"text":"Ai and Chen ","element":"a"},{"href":"#id-32","referenceIndex":3,"text":"(2003, ","element":"a"},{"href":"#id-33","referenceIndex":4,"text":"2012)","element":"a"},{"text":", and ","element":"span"},{"href":"#id-34","referenceIndex":32,"text":"Chen et al. 
","element":"a"},{"href":"#id-34","referenceIndex":32,"text":"(2003)","element":"a"},{"text":". The major difference is that we allow for the use of modern high-dimensional methods, a.k.a. machine learning methods, for modeling and fitting the nonparametric (or high-dimensional) components of the model. In contrast to the earlier literature, we expressly allow for a data-driven choice of the approximating model for the high-dimensional component, which addresses a crucial problem that arises in empirical work. Moreover, recent methods based on ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/6-1.png","element":"img","alt":" ℓ1","inline":true},{"text":"-penalization, upon which we focus in this paper, allow for much more flexible modeling of the nonparametric/high-dimensional parts of the model.","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/6-2.png","element":"img","alt":"6 ","inline":true,"padRight":true},{"text":"Our general theory in Section 5 also allows, in principle, for a wide variety of both traditional and machine learning methods.","element":"span"}],[{"text":"The paper also generates a number of new results on sparse estimation with functional response data. These results are of independent interest and build upon the work of ","element":"span"},{"href":"#id-35","referenceIndex":10,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-35","referenceIndex":10,"text":"(2011)","element":"a"},{"text":", who provided rates of convergence for variable selection when one is interested in estimating the quantile regression process with exogenous variables. 
More generally, this theoretical work complements and extends the rapidly growing set of results for ","element":"span"},{"style":{"height":16},"width":229.59,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/6-3.png","element":"img","alt":" ℓ1-penalized","inline":true,"padRight":true},{"text":"estimation methods; see, for example, ","element":"span"},{"href":"#id-36","referenceIndex":49,"text":"Frank and Friedman ","element":"a"},{"href":"#id-36","referenceIndex":49,"text":"(1993)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-37","referenceIndex":94,"text":"Tibshirani ","element":"a"},{"href":"#id-37","referenceIndex":94,"text":"(1996)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-38","referenceIndex":47,"text":"Fan and Li ","element":"a"},{"href":"#id-38","referenceIndex":47,"text":"(2001)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-39","referenceIndex":103,"text":"Zou ","element":"a"},{"href":"#id-39","referenceIndex":103,"text":"(2006)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-40","referenceIndex":25,"text":"Candès and Tao ","element":"a"},{"href":"#id-40","referenceIndex":25,"text":"(2007)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-41","referenceIndex":96,"text":"van de Geer ","element":"a"},{"href":"#id-41","referenceIndex":96,"text":"(2008)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-42","referenceIndex":62,"text":"Huang et al. ","element":"a"},{"href":"#id-42","referenceIndex":62,"text":"(2008)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-43","referenceIndex":24,"text":"Bickel et al. 
","element":"a"},{"href":"#id-43","referenceIndex":24,"text":"(2009)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-44","referenceIndex":77,"text":"Meinshausen and Yu ","element":"a"},{"href":"#id-44","referenceIndex":77,"text":"(2009)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-45","referenceIndex":8,"text":"Bach ","element":"a"},{"href":"#id-45","referenceIndex":8,"text":"(2010)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-46","referenceIndex":63,"text":"Huang et al. ","element":"a"},{"href":"#id-46","referenceIndex":63,"text":"(2010)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-35","referenceIndex":10,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-35","referenceIndex":10,"text":"(2011)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-47","referenceIndex":68,"text":"Kato ","element":"a"},{"href":"#id-47","referenceIndex":68,"text":"(2011)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-6","referenceIndex":9,"text":"Belloni et al. ","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"(2012)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-48","referenceIndex":11,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-48","referenceIndex":11,"text":"(2013)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-49","referenceIndex":16,"text":"Belloni et al. ","element":"a"},{"href":"#id-49","referenceIndex":16,"text":"(2013b)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-50","referenceIndex":19,"text":"Belloni et al. ","element":"a"},{"href":"#id-50","referenceIndex":19,"text":"(2013c)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-51","referenceIndex":26,"text":"Caner and Zhang ","element":"a"},{"href":"#id-51","referenceIndex":26,"text":"(2014)","element":"a"},{"text":"; and the references therein.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Plan of the Paper. 
","element":"span"},{"text":"Section ","element":"span"},{"href":"#id-52","text":"2 ","element":"a"},{"text":"introduces the structural parameters for policy evaluation and relates these parameters to reduced form functions. Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"describes a three-step procedure to estimate and make inference on the structural parameters and functionals of these parameters, and Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"provides asymptotic theory in the treatment effects setting. Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"generalizes the setting and results to moment-condition problems with a continuum of structural parameters and a continuum of reduced form functions. Section ","element":"span"},{"href":"#id-53","text":"6 ","element":"a"},{"text":"derives general asymptotic theory for the Lasso and post-Lasso estimators for functional response data used in the estimation of the reduced form functions. Section ","element":"span"},{"href":"#id-54","text":"7 ","element":"a"},{"text":"presents the empirical application. We provide notation, proofs of key results, and details about the implementation of the methods in the empirical example in Appendices ","element":"span"},{"href":"#id-55","text":"A–","element":"a"},{"href":"#id-56","text":"E. ","element":"a"},{"text":"An online Supplementary Appendix provides all remaining proofs, additional technical material, and results from a small Monte Carlo simulation ","element":"span"},{"href":"#id-57","referenceIndex":12,"text":"(Belloni et al., ","element":"a"},{"href":"#id-57","referenceIndex":12,"text":"2015)","element":"a"},{"text":".","element":"span"}]]},{"heading":"2. The Treatment Effects Setting and Target Parameters","paragraphs":[[{"id":"id-52","text":"2.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Observables and Reduced Form Parameters. 
","element":"span"},{"text":"The observed random variables consist of ((","element":"span"},{"style":{"height":17.6},"width":294.31,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/6-4.png","element":"img","alt":"Yu)u∈U, X, Z, D","inline":true},{"text":"). The outcome variable of interest ","element":"span"},{"style":{"height":14.62},"width":45.33,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/6-5.png","element":"img","alt":" Yu","inline":true,"padRight":true},{"text":"is indexed by ","element":"span"},{"style":{"height":12.8},"width":110.77,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/6-6.png","element":"img","alt":" u ∈ U","inline":true},{"text":". We give examples of the index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"below. The variable ","element":"span"},{"style":{"height":17.6},"width":290.57,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-0.png","element":"img","alt":" D ∈ D = {0, 1}","inline":true,"padRight":true},{"text":"is a binary indicator of the receipt of a treatment or participation in a program. It will typically be treated as endogenous; that is, we will typically view the treatment as assigned non-randomly with respect to the outcome. 
","element":"span"},{"text":"The instrumental variable ","element":"span"},{"style":{"height":17.6},"width":289.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-1.png","element":"img","alt":" Z ∈ Z = {0, 1}","inline":true,"padRight":true},{"text":"is a binary indicator, such as an offer of participation, that is assumed to be randomly assigned conditional on the observable covariates ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"with support ","element":"span"},{"style":{"height":15.16},"width":66.64,"height":37.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-2.png","element":"img","alt":" X.7 ","inline":true,"padRight":true},{"text":"For example, we argue that 401(k) eligibility can be considered exogenous only after conditioning on income and other individual characteristics in the empirical application. The notions of exogeneity and endogeneity we employ are standard and thus omitted.","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-3.png","element":"img","alt":"8","inline":true}],[{"text":"The indexing of the outcome ","element":"span"},{"style":{"height":16.4},"width":156.9,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-4.png","element":"img","alt":" Yu by u","inline":true,"padRight":true},{"text":"is useful to analyze functional data. 
For example, ","element":"span"},{"style":{"height":14.62},"width":45.33,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-5.png","element":"img","alt":" Yu","inline":true,"padRight":true},{"text":"could represent an outcome falling short of a threshold, namely ","element":"span"},{"style":{"height":17.6},"width":269.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-6.png","element":"img","alt":" Yu = 1(Y ⩽ u","inline":true},{"text":"), in the context of distributional analysis; ","element":"span"},{"style":{"height":14.62},"width":45.34,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-7.png","element":"img","alt":" Yu","inline":true,"padRight":true},{"text":"could be a height indexed by age ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"in growth-chart analysis; or ","element":"span"},{"style":{"height":15.02},"width":160.53,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-8.png","element":"img","alt":" Yu could","inline":true,"padRight":true},{"text":"be a health outcome indexed by a dosage ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"in dose-response studies. Our framework is tailored for such functional response data. The special case with no index is included by simply considering ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"to be a singleton set.","element":"span"}],[{"text":"We make use of two key types of reduced form parameters for estimating the structural parameters of interest – (local) treatment effects and related quantities. 
These reduced form parameters are defined as","element":"span"}],[{"id":"id-58","style":{"width":"71%"},"width":1336,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.56},"width":814.07,"height":38.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-10.png","element":"img","alt":" z = 0 or z = 1 are the fixed values of Z.9 ","inline":true,"padRight":true},{"text":"The function ","element":"span"},{"style":{"height":16},"width":254.23,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-11.png","element":"img","alt":" gV maps ZX","inline":true},{"text":", the support of the vector (","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z, X","element":"span"},{"text":"), to the real line ","element":"span"},{"text":"R ","element":"span"},{"text":"and is defined as","element":"span"}],[{"style":{"width":"64%"},"width":1201,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-12.png","element":"img"}],[{"text":"We use ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"to denote a target variable whose identity may change depending on the context, such as ","element":"span"},{"style":{"height":17.6},"width":1017.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-13.png","element":"img","alt":"V = 1d(D)Yu or V = 1d(D) where 1d(D) := 1(D = d","inline":true},{"text":") is the indicator function.","element":"span"}],[{"text":"All the structural parameters we consider are smooth functionals of these reduced-form parameters. 
In our approach to estimating treatment effects, we estimate the key reduced form parameter ","element":"span"},{"style":{"height":17.6},"width":95.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/7-14.png","element":"img","alt":"αV (z","inline":true},{"text":") using modern methods for high-dimensional data, coupled with orthogonal estimating equations. The orthogonality property allows us to deal with the “non-regular” nature of penalized and post-selection estimators, which do not admit linearizations except under very restrictive conditions. The use of regularization by model selection or penalization is in turn motivated by the desire to accommodate high-dimensional data.","element":"span"}],[{"id":"id-155","text":"2.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Target Structural Parameters – Local Treatment Effects. ","element":"span"},{"text":"The reduced form parameters defined in ","element":"span"},{"href":"#id-58","text":"(2.1) ","element":"a"},{"text":"are key because the structural parameters of interest are functionals of these elementary objects.","element":"span"}],[{"text":"The local average structural function (LASF) defined as","element":"span"}],[{"id":"id-62","style":{"width":"73%"},"width":1383,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-0.png","element":"img"}],[{"text":"underlies the formation of many commonly used treatment effects. Under standard assumptions, the LASF identifies average potential outcomes in the treated and non-treated states for the group of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"compliers","element":"span"},{"text":", that is, individuals whose treatment status may be influenced by variation in the instrument; see, e.g., Abadie ","element":"span"},{"href":"#id-59","referenceIndex":1,"text":"(2002; ","element":"a"},{"href":"#id-60","referenceIndex":2,"text":"2003)","element":"a"},{"text":". 
The local average treatment effect (LATE) of ","element":"span"},{"href":"#id-61","referenceIndex":64,"text":"Imbens and Angrist ","element":"a"},{"href":"#id-61","referenceIndex":64,"text":"(1994) ","element":"a"},{"text":"corresponds to the difference of the two values of the LASF:","element":"span"}],[{"id":"id-63","style":{"width":"57%"},"width":1081,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-1.png","element":"img"}],[{"text":"The term local designates that this parameter does not measure the effect on the entire population but rather measures the effect on the subpopulation of compliers.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-2.png","element":"img","alt":"10","inline":true}],[{"text":"When there is no endogeneity, formally when ","element":"span"},{"style":{"height":12},"width":127.52,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-3.png","element":"img","alt":" D ≡ Z","inline":true},{"text":", the LASF and LATE become the average structural function (ASF) and average treatment effect (ATE) on the entire population. Thus, our results cover this situation as a special case where the ASF and ATE simplify to","element":"span"}],[{"style":{"width":"76%"},"width":1437,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-4.png","element":"img"}],[{"text":"We also note that the impact of the instrument ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"itself may be of interest since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"often encodes an offer of participation in a program. 
In this case, the parameters of interest are again simply the reduced form parameters","element":"span"}],[{"style":{"width":"25%"},"width":484,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-5.png","element":"img"}],[{"text":"Thus, the LASF and LATE are primary targets of interest in this paper, and the ASF and ATE are subsumed as special cases.","element":"span"}],[{"text":"2.2.1. ","element":"span"},{"style":{"height":16.4},"width":1253.86,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-6.png","element":"img","alt":" Local Distribution and Quantile Treatment Effects. Setting Yu = Y","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-62","text":"(2.3) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-63","text":"(2.4) ","element":"a"},{"text":"provides the conventional LASF and LATE. An important generalization arises by letting ","element":"span"},{"style":{"height":17.6},"width":292.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-7.png","element":"img","alt":" Yu = 1(Y ⩽ u)","inline":true,"padRight":true},{"text":"be the indicator of the outcome of interest falling below a threshold ","element":"span"},{"style":{"height":12.8},"width":110.31,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-8.png","element":"img","alt":" u ∈ R","inline":true},{"text":". In this case, the family of effects","element":"span"}],[{"style":{"width":"60%"},"width":1129,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-9.png","element":"img"}],[{"text":"describes the local distribution treatment effects (LDTE). 
Similarly, we can look at the quantile left-inverse transform of the curve ","element":"span"},{"style":{"height":17.6},"width":249.91,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-10.png","element":"img","alt":" u ↦ θYu(d),","inline":true}],[{"style":{"width":"67%"},"width":1271,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-11.png","element":"img"}],[{"text":"and examine the family of local quantile treatment effects (LQTE):","element":"span"}],[{"style":{"width":"63%"},"width":1189,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/8-12.png","element":"img"}],[{"text":"The LQTE identify the differences of quantiles between the distributions of potential outcomes in the treated and non-treated states for compliers.","element":"span"}],[{"id":"id-156","text":"2.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Target Structural Parameters – Local Treatment Effects on the Treated. ","element":"span"},{"text":"We may also be interested in local treatment effects on the treated. The key object in defining these effects is the local average structural function on the treated (LASF-T), which is defined by its two values:","element":"span"}],[{"style":{"width":"72%"},"width":1362,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-0.png","element":"img"}],[{"text":"The LASF-T identifies average potential outcomes for the group of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"treated compliers ","element":"span"},{"text":"in the treated and non-treated states under standard assumptions. 
The local average treatment effect on the treated (LATE-T) introduced in ","element":"span"},{"href":"#id-64","referenceIndex":60,"text":"Hong and Nekipelov ","element":"a"},{"href":"#id-64","referenceIndex":60,"text":"(2010) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-65","referenceIndex":50,"text":"Frölich and Melly ","element":"a"},{"href":"#id-65","referenceIndex":50,"text":"(2013) ","element":"a"},{"text":"is the difference of two values of the LASF-T:","element":"span"}],[{"style":{"width":"58%"},"width":1087,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-1.png","element":"img"}],[{"text":"The LATE-T may be of interest because it measures the average treatment effect for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"treated compliers","element":"span"},{"text":", namely the subgroup of compliers that actually receive the treatment.","element":"span"}],[{"text":"When the treatment is assigned randomly given controls, so that we can take ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z","element":"span"},{"text":", the LASF-T and LATE-T become the average structural function on the treated (ASF-T) and average treatment effect on the treated (ATE-T). In this special case, the ASF-T and ATE-T simplify to","element":"span"}],[{"style":{"width":"83%"},"width":1570,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-2.png","element":"img"}],[{"text":"and we can use our results to provide estimation and inference methods for these quantities.","element":"span"}],[{"text":"2.3.1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Local Distribution and Quantile Treatment Effects on the Treated. ","element":"span"},{"text":"Local distribution treatment effects on the treated (LDTE-T) and local quantile treatment effects on the treated (LQTE-T) can also be defined. 
As in Section 2.2.1, we let ","element":"span"},{"style":{"height":17.6},"width":267,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-3.png","element":"img","alt":" Yu = 1(Y ⩽ u","inline":true},{"text":") be the indicator of the outcome of interest falling below a threshold ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":". The family of treatment effects","element":"span"}],[{"style":{"width":"60%"},"width":1128,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-4.png","element":"img"}],[{"text":"then describes the LDTE-T. We can also use the quantile left-inverse transform of the curve ","element":"span"},{"style":{"height":9.6},"width":107.76,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-5.png","element":"img","alt":" u ↦","inline":true},{"style":{"height":18.37},"width":992.32,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-6.png","element":"img","alt":"ϑYu(d), namely ϑ←Y (τ, d) := inf{u ∈ R : ϑYu(d) ⩾ τ},","inline":true,"padRight":true},{"text":"and define the LQTE-T:","element":"span"}],[{"style":{"width":"63%"},"width":1193,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-7.png","element":"img"}],[{"text":"Under conditional exogeneity, the LQTE and LQTE-T reduce to the quantile treatment effects (QTE) and quantile treatment effects on the treated (QTE-T) ","element":"span"},{"href":"#id-66","referenceIndex":71,"text":"(Koenker, ","element":"a"},{"href":"#id-66","referenceIndex":71,"text":"2005, ","element":"a"},{"text":"Chap. 2).","element":"span"}]]},{"heading":"3. 
Estimation of Reduced-Form and Structural Parameters in a Data-Rich Environment","paragraphs":[[{"text":"The key objects used to define the structural parameters in Section 2 are the expectations","element":"span"}],[{"style":{"width":"69%"},"width":1310,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-8.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":675.67,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-9.png","element":"img","alt":" gV (z, X) = EP [V |Z = z, X] and V","inline":true,"padRight":true},{"text":"denotes a variable whose identity will change with the context. Specifically, we shall vary ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"over the set ","element":"span"},{"style":{"height":14.62},"width":60.34,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-10.png","element":"img","alt":" Vu:","inline":true}],[{"style":{"width":"81%"},"width":1520,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/9-11.png","element":"img"}],[{"text":"It is clear that ","element":"span"},{"style":{"height":17.6},"width":146.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-0.png","element":"img","alt":" gV (z, X","inline":true},{"text":") will play an important role in estimating ","element":"span"},{"style":{"height":17.6},"width":95.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-1.png","element":"img","alt":" αV (z","inline":true},{"text":"). 
A related function that will also play an important role in forming a robust estimation strategy is the propensity score","element":"span"}],[{"style":{"width":"99%"},"width":1867,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-2.png","element":"img"}],[{"text":"We will denote other potential values for the functions ","element":"span"},{"style":{"height":16.4},"width":211.25,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-3.png","element":"img","alt":" gV and mZ","inline":true,"padRight":true},{"text":"by the parameters ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":", respectively. We can then estimate ","element":"span"},{"style":{"height":17.6},"width":95.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-4.png","element":"img","alt":" αV (z","inline":true},{"text":") by estimating ","element":"span"},{"style":{"height":16.4},"width":208.16,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-5.png","element":"img","alt":" gV and mZ","inline":true,"padRight":true},{"text":"using high-dimensional modeling and estimation methods.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-6.png","element":"img","alt":"11","inline":true}],[{"text":"In the rest of this section, we describe the estimation of the reduced-form and structural parameters. 
The estimation method consists of three steps:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1) ","element":"span"},{"text":"Estimate the predictive relationships ","element":"span"},{"style":{"height":16.4},"width":205.3,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-7.png","element":"img","alt":" mZ and gV","inline":true,"padRight":true},{"text":"using high-dimensional nonparametric methods with model selection.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2) ","element":"span"},{"text":"Estimate the reduced form parameters ","element":"span"},{"style":{"height":16},"width":206.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-8.png","element":"img","alt":" αV and γV","inline":true,"padRight":true},{"text":"using orthogonal estimating equations to immunize the reduced form estimators against imperfect model selection in the first step.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"3) ","element":"span"},{"text":"Estimate the structural parameters and effects via the plug-in rule.","element":"span"}],[{"id":"id-71","text":"3.1. 
","element":"span"},{"style":{"height":16.4},"width":1061.97,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-9.png","element":"img","alt":" First Step: Modeling and Estimating gV and mZ.","inline":true,"padRight":true},{"text":"In this section, we discuss estimation of the conditional expectation functions ","element":"span"},{"style":{"height":16.4},"width":241.4,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-10.png","element":"img","alt":" gV and mZ.","inline":true,"padRight":true},{"text":"Since these functions are unknown and potentially complicated, we use a generalized linear combination of a large number of control terms","element":"span"}],[{"style":{"width":"59%"},"width":1119,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-11.png","element":"img"}],[{"text":"to approximate ","element":"span"},{"style":{"height":16.4},"width":212.77,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-12.png","element":"img","alt":" gV and mZ","inline":true},{"text":". 
Specifically, we use","element":"span"}],[{"id":"id-67","style":{"width":"84%"},"width":1578,"height":202,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-13.png","element":"img"}],[{"text":"In these equations, ","element":"span"},{"style":{"height":17.6},"width":336.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-14.png","element":"img","alt":" rV (z, x) and rZ(x","inline":true},{"text":") are approximation errors, and the functions Λ","element":"span"},{"style":{"height":17.6},"width":254.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-15.png","element":"img","alt":"V (f(z, x)′βV )","inline":true,"padRight":true},{"text":"and Λ","element":"span"},{"style":{"height":17.6},"width":189.97,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-16.png","element":"img","alt":"Z(f(x)′βZ","inline":true},{"text":") are generalized linear approximations to the target functions ","element":"span"},{"style":{"height":17.6},"width":417.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-17.png","element":"img","alt":" gV (z, x) and mZ(1, x).","inline":true,"padRight":true},{"text":"The functions Λ","element":"span"},{"style":{"height":15.5},"width":194.87,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-18.png","element":"img","alt":"V and ΛZ","inline":true,"padRight":true},{"text":"are taken to be known link functions Λ. ","element":"span"},{"text":"The most common example is the linear link Λ(","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":") = ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":". 
","element":"span"},{"text":"When the response variable is binary, we may also use the logistic link Λ(","element":"span"},{"style":{"height":17.6},"width":535.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-19.png","element":"img","alt":"u) = Λ0(u) = e^u/(1 + e^u","inline":true},{"text":") and its complement 1 ","element":"span"},{"style":{"height":17.6},"width":140.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-20.png","element":"img","alt":" − Λ0(u","inline":true},{"text":") or the probit link Λ(","element":"span"},{"style":{"height":23.36},"width":703.48,"height":58.41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-21.png","element":"img","alt":"u) = Φ(u) = (2π)^(−1/2) ∫_{−∞}^{u} e^(−z²/2) dz","inline":true,"padRight":true},{"text":"and its complement 1 ","element":"span"},{"style":{"height":17.6},"width":149.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-22.png","element":"img","alt":" − Φ(u).","inline":true,"padRight":true},{"text":"For clarity, we use links from the finite set ","element":"span"},{"style":{"height":17.6},"width":561.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-23.png","element":"img","alt":" L = {Id, Φ, 1 − Φ, Λ0, 1 − Λ0}","inline":true,"padRight":true},{"text":"where Id is the identity (linear) link.","element":"span"}],[{"text":"As discussed in the Introduction, the dictionary of controls, denoted by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":"), can be “rich” in the sense that its dimension ","element":"span"},{"style":{"height":11.6},"width":129.84,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-24.png","element":"img","alt":" p = pn","inline":true,"padRight":true},{"text":"may be large relative to the 
sample size. Specifically, our results require only that log ","element":"span"},{"style":{"height":20.33},"width":199.94,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/10-25.png","element":"img","alt":" p = o(n1/3","inline":true},{"text":") along with other technical conditions. We also note that the functions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"forming the dictionary can depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", but we suppress this dependence.","element":"span"}],[{"text":"Having very many controls ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") creates a challenge for estimation and inference. ","element":"span"},{"text":"A useful condition that makes it possible to perform constructive estimation and inference in such cases is termed approximate sparsity or simply sparsity. Sparsity imposes that there exist approximations of the form given in ","element":"span"},{"href":"#id-67","text":"(3.5)","element":"a"},{"text":"-","element":"span"},{"href":"#id-67","text":"(3.7) ","element":"a"},{"text":"that require only a small number of non-zero coefficients to render the approximation errors small relative to estimation error. More formally, sparsity relies on two conditions. 
First, there must exist ","element":"span"},{"style":{"height":16.4},"width":203.01,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-0.png","element":"img","alt":" βV and βZ","inline":true,"padRight":true},{"text":"such that, for all ","element":"span"},{"style":{"height":17.6},"width":439.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-1.png","element":"img","alt":" V ∈ V := {Vu : u ∈ U},","inline":true}],[{"style":{"width":"59%"},"width":1114,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":85.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-3.png","element":"img","alt":" ∥x∥0","inline":true,"padRight":true},{"text":"is the number of non-zero components of vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and all other norms we use are defined in Appendix A. 
That is, there are at most ","element":"span"},{"style":{"height":12.62},"width":225.54,"height":31.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-4.png","element":"img","alt":" s = sn ≪ n","inline":true,"padRight":true},{"text":"components of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z, X","element":"span"},{"text":") and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") with nonzero coefficient in the approximations to ","element":"span"},{"style":{"height":16.4},"width":239.71,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-5.png","element":"img","alt":" gV and mZ.","inline":true,"padRight":true},{"text":"Second, the sparsity condition requires that the size of the resulting approximation errors is small compared to the conjectured size of the estimation error; namely, for all ","element":"span"},{"style":{"height":15.2},"width":130.81,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-6.png","element":"img","alt":" V ∈ V,","inline":true}],[{"style":{"width":"73%"},"width":1372,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-7.png","element":"img"}],[{"text":"Note that the size of the approximating model ","element":"span"},{"style":{"height":10.62},"width":125.23,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-8.png","element":"img","alt":" s = sn","inline":true,"padRight":true},{"text":"can grow with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"just as in standard series estimation, subject to the rate 
condition","element":"span"}],[{"style":{"width":"27%"},"width":513,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-9.png","element":"img"}],[{"text":"These conditions ensure that the functions ","element":"span"},{"style":{"height":16.4},"width":206.79,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-10.png","element":"img","alt":" gV and mZ","inline":true,"padRight":true},{"text":"are estimable at an ","element":"span"},{"style":{"height":20.33},"width":141.54,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-11.png","element":"img","alt":" o(n−1/4","inline":true},{"text":") rate, which we use to derive asymptotic normality results for the structural and reduced-form parameter estimators. These conditions could be relaxed through the use of sample-splitting methods as in ","element":"span"},{"href":"#id-6","referenceIndex":9,"text":"Belloni et al. ","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"(2012)","element":"a"},{"text":".","element":"span"}],[{"text":"The high-dimensional-sparse-model framework outlined above extends the standard framework in the program evaluation literature, which assumes both that the identities of the relevant controls are known and that the number of such controls ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"is small relative to the sample size.","element":"span"},{"style":{"height":18.36},"width":209.26,"height":45.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-12.png","element":"img","alt":"12 Instead,","inline":true,"padRight":true},{"text":"we assume that there are many, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":", potential controls of which at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"controls suffice to achieve a desirable approximation to the unknown 
functions ","element":"span"},{"style":{"height":16.4},"width":219.1,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-13.png","element":"img","alt":" gV and mZ","inline":true},{"text":"; and we allow the identity and number of these controls to be unknown. Relying on this assumed sparsity, we use selection methods to choose approximately the right set of controls.","element":"span"}],[{"text":"Current estimation methods that exploit approximate sparsity employ different types of regularization aimed at producing estimators that theoretically perform well in high-dimensional settings while remaining computationally tractable. ","element":"span"},{"text":"Many widely used methods are based on ","element":"span"},{"style":{"height":15.02},"width":52.11,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/11-14.png","element":"img","alt":" ℓ1-","inline":true,"padRight":true},{"text":"penalization. The Lasso method is one such commonly used approach that adds a penalty for the weighted sum of the absolute values of the model parameters to the usual objective function of an M-estimator. A related approach is the Post-Lasso method which performs re-estimation of the model after selection of variables by Lasso. These methods are discussed at length in recent papers and review articles; see, for example, ","element":"span"},{"href":"#id-3","referenceIndex":14,"text":"Belloni et al. ","element":"a"},{"href":"#id-3","referenceIndex":14,"text":"(2013a)","element":"a"},{"text":".","element":"span"}],[{"text":"In the following, we outline the general features of the Lasso and Post-Lasso methods focusing on estimation of ","element":"span"},{"style":{"height":12},"width":46.82,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-0.png","element":"img","alt":" gV","inline":true,"padRight":true},{"text":". 
Given the data ( ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":20.9},"width":559.45,"height":52.25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-1.png","element":"img","alt":"Yi, ˜Xi)ni=1 = (Vi, f(Zi, Xi))ni=1","inline":true},{"text":", the Lasso estimator ","element":"span"},{"style":{"height":16.4},"width":177.2,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-2.png","element":"img","alt":" �βV solves","inline":true}],[{"id":"id-149","style":{"width":"92%"},"width":1734,"height":188,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":22.82},"width":444.61,"height":57.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-4.png","element":"img","alt":"�Ψ = diag(�l1, . . . ,�ldim( �X)","inline":true},{"text":") is a diagonal matrix of data-dependent penalty loadings, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"y, t","element":"span"},{"text":") = ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.13},"width":151.83,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-5.png","element":"img","alt":"y − t)2/","inline":true},{"text":"2 in the case of linear regression, and ","element":"span"},{"style":{"height":17.6},"width":967.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-6.png","element":"img","alt":" M(y, t) = −{1(y = 1) log ΛV (t) + 1(y = 0) log(1 −","inline":true,"padRight":true},{"text":"Λ","element":"span"},{"style":{"height":17.6},"width":117.91,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-7.png","element":"img","alt":"V (t))}","inline":true,"padRight":true},{"text":"in the case of binary regression. 
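The Python sketch below illustrates the estimator just defined. It codes three pieces: the link set L, the two loss functions M(y, t), and a coordinate-descent solver for the Lasso with a linear link and penalty loadings. It is a minimal sketch under our own assumptions and is not the implementation used in the paper. We assume the objective is the sample average of M plus (λ/n)‖Ψ̂β‖1. The function names, the choice of coordinate descent as the solver, and the fixed number of sweeps are ours. The user supplies λ and the loadings directly; the sketch does not reproduce the data-driven rules of Appendix F.

```python
import math

# Links from the finite set L = {Id, Phi, 1 - Phi, Lambda0, 1 - Lambda0}.
def identity(u):
    return u

def logistic(u):
    # Lambda0(u) = e^u / (1 + e^u), written to avoid overflow for large |u|
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def probit(u):
    # Phi(u), the standard normal CDF
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

LINKS = {
    'Id': identity,
    'Phi': probit,
    '1-Phi': lambda u: 1.0 - probit(u),
    'Lambda0': logistic,
    '1-Lambda0': lambda u: 1.0 - logistic(u),
}

def squared_loss(y, t):
    # M(y, t) = (y - t)^2 / 2 for linear regression
    return 0.5 * (y - t) ** 2

def binary_loss(y, t, link=logistic):
    # M(y, t) = -{1(y = 1) log link(t) + 1(y = 0) log(1 - link(t))}
    p = link(t)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

def soft_threshold(a, k):
    return math.copysign(max(abs(a) - k, 0.0), a)

def weighted_lasso(X, y, lam, loadings, n_iter=200):
    # Coordinate descent for the linear link, assumed objective:
    #   (1/n) sum_i (y_i - x_i'b)^2 / 2 + (lam / n) sum_j loadings[j] * |b_j|
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero regressor stays at zero
            # correlation of column j with the partial residual that excludes b_j
            rho = sum(col[i] * (resid[i] + col[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * loadings[j] / n) / z
            for i in range(n):
                resid[i] -= col[i] * (new_bj - b[j])
            b[j] = new_bj
    return b
```

Consider a toy design X with rows (1,0,0), (−1,0,0), (0,1,0), (0,−1,0), (0,0,1), (0,0,−1) and y = (2, −2, 0, 0, 0, 0). On this design, weighted_lasso(X, y, 0.6, [1, 1, 1]) returns approximately [1.7, 0, 0]. The relevant coefficient is shrunk from 2 toward zero, and the irrelevant ones are set exactly to zero.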
The penalty level, ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-8.png","element":"img","alt":" λ","inline":true},{"text":", and loadings, ","element":"span"},{"style":{"height":18.22},"width":402.1,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-9.png","element":"img","alt":"�lj, j = 1, ..., dim( �X),","inline":true,"padRight":true},{"text":"are selected to guarantee good theoretical properties of the method. We provide further discussion of these methods for estimation of a continuum of functions in Section ","element":"span"},{"href":"#id-53","text":"6, ","element":"a"},{"text":"and we specify detailed implementation algorithms used in the empirical example in Appendix ","element":"span"},{"href":"#id-68","text":"F. ","element":"a"},{"text":"A key consideration in this paper is that the penalty level must account for the fact that we may be simultaneously estimating a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"continuum ","element":"span"},{"text":"of Lasso regressions, since our ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"varies over the list ","element":"span"},{"style":{"height":15.02},"width":188.6,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-10.png","element":"img","alt":" Vu with u","inline":true,"padRight":true},{"text":"varying over the index set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":".","element":"span"}],[{"text":"The Post-Lasso method uses ","element":"span"},{"style":{"height":16.4},"width":50.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-11.png","element":"img","alt":"�βV","inline":true,"padRight":true},{"text":"solely as a model selection device. 
Specifically, it makes use of the labels of the regressors with non-zero estimated coefficients, ","element":"span"},{"style":{"height":17.6},"width":313.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-12.png","element":"img","alt":"�IV := supp(�βV ).","inline":true,"padRight":true},{"text":"The Post-Lasso estimator is then a solution to","element":"span"}],[{"id":"id-150","style":{"width":"76%"},"width":1432,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-13.png","element":"img"}],[{"text":"A main contribution of this paper is establishing that the estimator ","element":"span"},{"style":{"height":19.4},"width":589.9,"height":48.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-14.png","element":"img","alt":" �gV (Z, X) = ΛV (f(Z, X)′ ¯βV ) of","inline":true,"padRight":true},{"text":"the regression function ","element":"span"},{"style":{"height":20.61},"width":718.74,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-15.png","element":"img","alt":" gV (Z, X), where ¯βV = �βV or ¯βV = ˜βV","inline":true,"padRight":true},{"text":", achieves the near-oracle rate of convergence ","element":"span"},{"style":{"height":20.8},"width":238.71,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-16.png","element":"img","alt":"�(s log p)/n","inline":true,"padRight":true},{"text":"and maintains desirable theoretical properties while allowing for a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"continuum ","element":"span"},{"text":"of response variables.","element":"span"}],[{"text":"Estimation of ","element":"span"},{"style":{"height":10.7},"width":63.31,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-17.png","element":"img","alt":" mZ","inline":true,"padRight":true},{"text":"proceeds similarly. 
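Post-Lasso works the same way whether the target is gV or mZ: it selects regressors and then refits. The following minimal sketch covers the linear link only. It assumes the refit is ordinary least squares on the regressors in the estimated support, and that those regressors are linearly independent. The helper names are ours. With a binary outcome, the refit would instead be an unpenalized logit or probit fit on the selected regressors.

```python
def solve(A, rhs):
    # Gaussian elimination with partial pivoting for a small square system A x = rhs
    # (assumes A is nonsingular, i.e. the selected regressors are linearly independent)
    m = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def post_lasso(X, y, lasso_coef, tol=1e-12):
    # Step 1: model selection, keep the labels I = supp(lasso_coef)
    support = [j for j, bj in enumerate(lasso_coef) if abs(bj) > tol]
    coef = [0.0] * len(lasso_coef)
    if not support:
        return coef
    # Step 2: unpenalized least squares using only the selected regressors
    Xs = [[row[j] for j in support] for row in X]
    k = len(support)
    XtX = [[sum(r[a] * r[c] for r in Xs) for c in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(Xs, y)) for a in range(k)]
    for j, cj in zip(support, solve(XtX, Xty)):
        coef[j] = cj
    return coef
```

Take toy data X with rows (1,0,0), (−1,0,0), (0,1,0), (0,−1,0), (0,0,1), (0,0,−1), y = (2, −2, 0, 0, 0, 0), and Lasso coefficients [1.7, 0, 0]. On these inputs, post_lasso returns [2.0, 0.0, 0.0], which removes the shrinkage bias of the penalized fit on the selected coefficient.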
","element":"span"},{"text":"The Lasso estimator ","element":"span"},{"style":{"height":16.4},"width":49.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-18.png","element":"img","alt":"�βZ","inline":true,"padRight":true},{"text":"and Post-Lasso estimator ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":16.4},"width":49.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-19.png","element":"img","alt":"βZ","inline":true,"padRight":true},{"text":"are defined analogously to ","element":"span"},{"style":{"height":19.8},"width":208.95,"height":49.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-20.png","element":"img","alt":"�βV and ˜βV","inline":true,"padRight":true},{"text":"using the data ( ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":20.9},"width":493.86,"height":52.25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-21.png","element":"img","alt":"Yi, ˜Xi)ni=1= (Zi, f(Xi))ni=1","inline":true},{"text":". The estimator ","element":"span"},{"style":{"height":20.6},"width":1212.73,"height":51.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-22.png","element":"img","alt":"�mZ(1, X) = ΛZ(f(X)′ ¯βZ) of mZ(X), with ¯βZ = �βZ or ¯βZ = ˜βZ","inline":true},{"text":", also achieves the near oracle rate of convergence","element":"span"},{"style":{"height":20.8},"width":238.71,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-23.png","element":"img","alt":"�(s log p)/n","inline":true,"padRight":true},{"text":"and has other good theoretic properties. 
The estimator of ","element":"span"},{"style":{"height":17.6},"width":180.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-24.png","element":"img","alt":" �mZ(0, X)","inline":true,"padRight":true},{"text":"is then formed as 1 ","element":"span"},{"style":{"height":17.6},"width":235.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-25.png","element":"img","alt":" − �mZ(1, X).","inline":true}],[{"text":"3.2. ","element":"span"},{"style":{"height":17.6},"width":1782.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-26.png","element":"img","alt":" Second Step: Robust Estimation of the Reduced-Form Parameters αV (z) and γV .","inline":true,"padRight":true},{"text":"Estimation of the key quantities ","element":"span"},{"style":{"height":17.6},"width":95.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/12-27.png","element":"img","alt":" αV (z","inline":true},{"text":") will make heavy use of orthogonal moment functions as defined in ","element":"span"},{"href":"#id-9","text":"(1.2)","element":"a"},{"text":". These moment functions are closely tied to efficient influence functions, where efficiency is in the sense of locally minimax semi-parametric efficiency. The use of these functions will deliver robustness with respect to the non-regularity of the post-selection and penalized estimators needed to manage high-dimensional data. 
The use of these functions also automatically delivers semi-parametric efficiency for estimating and performing inference on the reduced-form parameters and their smooth transformations – the structural parameters.","element":"span"}],[{"text":"The efficient influence function and orthogonal moment function for ","element":"span"},{"style":{"height":17.6},"width":506.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-0.png","element":"img","alt":" αV (z), z ∈ Z = {0, 1}, are","inline":true,"padRight":true},{"text":"given respectively by","element":"span"}],[{"id":"id-148","style":{"width":"78%"},"width":1461,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-1.png","element":"img"}],[{"text":"This efficient influence function was derived by ","element":"span"},{"href":"#id-10","referenceIndex":54,"text":"Hahn ","element":"a"},{"href":"#id-10","referenceIndex":54,"text":"(1998)","element":"a"},{"text":"; it has recently been used by ","element":"span"},{"href":"#id-69","referenceIndex":28,"text":"Cattaneo ","element":"a"},{"href":"#id-69","referenceIndex":28,"text":"(2010) ","element":"a"},{"text":"in the series context (with ","element":"span"},{"style":{"height":13.6},"width":118.67,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-2.png","element":"img","alt":" p ≪ n","inline":true},{"text":") and ","element":"span"},{"href":"#id-70","referenceIndex":91,"text":"Rothe and Firpo ","element":"a"},{"href":"#id-70","referenceIndex":91,"text":"(2013) ","element":"a"},{"text":"in the kernel context. 
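The sketch below makes the orthogonal moment concrete. It solves the empirical moment equation for α̂V(z) and computes γ̂V as a sample mean. It assumes the doubly robust form associated with Hahn (1998), ψ = 1(Z = z)(V − g(z, X))/m(z, X) + g(z, X) − α, with m(z, X) = P(Z = z | X); the display above remains the authoritative definition. The first-step fits ĝV(z, Xi) and m̂Z(z, Xi) are passed in as precomputed lists, for instance from the Lasso or Post-Lasso fits of Section 3.1. The function names are ours.

```python
def alpha_hat(V, Z, g_hat, m_hat, z):
    # Solves En[psi] = 0 in alpha for the assumed orthogonal moment
    #   psi = 1(Z = z) * (V - g(z, X)) / m(z, X) + g(z, X) - alpha,
    # where g_hat[i] estimates g_V(z, X_i) and m_hat[i] estimates P(Z = z | X_i).
    n = len(V)
    total = 0.0
    for v, zi, g, m in zip(V, Z, g_hat, m_hat):
        indicator = 1.0 if zi == z else 0.0
        total += indicator * (v - g) / m + g
    return total / n

def gamma_hat(V):
    # gamma_V = E[V]; its moment function V - gamma is solved by the sample mean
    return sum(V) / len(V)

V, Z = [1.0, 3.0, 2.0, 4.0], [1, 0, 1, 0]
print(alpha_hat(V, Z, [1.5] * 4, [0.5] * 4, z=1))  # 1.5
print(alpha_hat(V, Z, [0.0] * 4, [0.5] * 4, z=1))  # 1.5
```

In this toy call the propensity 0.5 is correct. The estimate is 1.5, the mean of V among observations with Z = 1, whether g is set to 1.5 or to 0. This is only a small illustration, not a proof, of why an orthogonal moment is insensitive to errors in one first-step estimate.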
The efficient influence function and the moment function for ","element":"span"},{"style":{"height":11.6},"width":48.59,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-3.png","element":"img","alt":" γV","inline":true,"padRight":true},{"text":"are trivially given by","element":"span"}],[{"style":{"width":"71%"},"width":1343,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-4.png","element":"img"}],[{"text":"We then define estimators of the reduced-form parameters ","element":"span"},{"style":{"height":17.6},"width":309.38,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-5.png","element":"img","alt":" αV (z) and γV (z","inline":true},{"text":") as solutions ","element":"span"},{"style":{"height":8.4},"width":79.91,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-6.png","element":"img","alt":" α =","inline":true},{"style":{"height":17.6},"width":344.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-7.png","element":"img","alt":"�αV (z) and γ = �γV","inline":true,"padRight":true},{"text":"to the equations","element":"span"}],[{"style":{"width":"74%"},"width":1403,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-8.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16.4},"width":208.29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-9.png","element":"img","alt":" �gV and �mZ","inline":true,"padRight":true},{"text":"are constructed as in Section ","element":"span"},{"href":"#id-71","text":"3.1. 
","element":"a"},{"text":"We apply this procedure to each variable name ","element":"span"},{"style":{"height":14.62},"width":135.22,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-10.png","element":"img","alt":"V ∈ Vu","inline":true,"padRight":true},{"text":"and obtain the estimator","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-11.png","element":"img","alt":"13","inline":true}],[{"id":"id-86","style":{"width":"87%"},"width":1645,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-12.png","element":"img"}],[{"text":"The estimator and the parameter are vectors in ","element":"span"},{"style":{"height":15.13},"width":63.94,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-13.png","element":"img","alt":" Rdρ ","inline":true,"padRight":true},{"text":"with dimension ","element":"span"},{"style":{"height":17.82},"width":417.56,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-14.png","element":"img","alt":" dρ = 3 × dim Vu = 15.","inline":true}],[{"id":"id-73","style":{"width":"97%"},"width":1819,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-15.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.62},"width":51.35,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-16.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"is a rich set of data generating processes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"which includes cases where perfect model selection is impossible theoretically. 
The notation “","element":"span"},{"style":{"height":17.1},"width":220.64,"height":42.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-17.png","element":"img","alt":"Zn,P ⇝ ZP","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":14.62},"width":147.82,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-18.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":"” is defined formally in Appendix ","element":"span"},{"href":"#id-55","text":"A ","element":"a"},{"text":"and can be read as “","element":"span"},{"style":{"height":17.1},"width":85.76,"height":42.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-19.png","element":"img","alt":"Zn,P","inline":true,"padRight":true},{"text":"is approximately distributed as ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-20.png","element":"img","alt":" ZP","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":14.62},"width":138.76,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-21.png","element":"img","alt":"P ∈ Pn","inline":true},{"text":".” This usage corresponds to the usual notion of asymptotic distribution extended to handle uniformity in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":".","element":"span"}],[{"text":"We then stack all the reduced form estimators and parameters over ","element":"span"},{"style":{"height":12.8},"width":163.32,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-22.png","element":"img","alt":" u ∈ U as","inline":true}],[{"style":{"width":"32%"},"width":606,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-23.png","element":"img"}],[{"text":"giving rise to the empirical reduced-form process 
","element":"span"},{"style":{"height":12},"width":24.8,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-24.png","element":"img","alt":" �ρ","inline":true,"padRight":true},{"text":"and the reduced-form function-valued parameter ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-25.png","element":"img","alt":"ρ","inline":true},{"text":". We establish that ","element":"span"},{"style":{"height":17.77},"width":178.42,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-26.png","element":"img","alt":"√n(�ρ − ρ","inline":true},{"text":") is asymptotically Gaussian: In ","element":"span"},{"style":{"height":19.53},"width":168.32,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-27.png","element":"img","alt":" ℓ∞(U)dρ,","inline":true}],[{"style":{"width":"75%"},"width":1410,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-28.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.1},"width":59.94,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-29.png","element":"img","alt":" GP","inline":true,"padRight":true},{"text":"denotes the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":"-Brownian bridge ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"(van der Vaart and Wellner, ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"1996, ","element":"a"},{"text":"p. 81–82). 
This result contains ","element":"span"},{"href":"#id-73","text":"(3.17) ","element":"a"},{"text":"as a special case and again allows ","element":"span"},{"style":{"height":14.62},"width":51.35,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/13-30.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"to be a “rich” set of data-generating processes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"that includes cases where perfect model selection is impossible theoretically. Importantly, this result verifies that the functional central limit theorem applies to the reduced-form estimators in the presence of possible model selection mistakes.","element":"span"}],[{"text":"Since some of our objects of interest are complicated, inference can be facilitated by a multiplier bootstrap method as in ","element":"span"},{"href":"#id-74","referenceIndex":52,"text":"Giné and Zinn ","element":"a"},{"href":"#id-74","referenceIndex":52,"text":"(1984)","element":"a"},{"text":". We define ","element":"span"},{"style":{"height":17.6},"width":244.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-0.png","element":"img","alt":" �ρ∗ = (�ρ∗u)u∈U","inline":true},{"text":", a bootstrap draw of ","element":"span"},{"style":{"height":15.6},"width":106,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-1.png","element":"img","alt":" �ρ, via","inline":true}],[{"id":"id-78","style":{"width":"61%"},"width":1158,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-2.png","element":"img"}],[{"text":"Here (","element":"span"},{"style":{"height":18.09},"width":104.45,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-3.png","element":"img","alt":"ξi)ni=1 ","inline":true,"padRight":true},{"text":"are i.i.d. 
copies of ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-4.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"that are independent of the data (","element":"span"},{"style":{"height":18.09},"width":214.8,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-5.png","element":"img","alt":"Wi)ni=1 and","inline":true,"padRight":true},{"text":"whose distribution ","element":"span"},{"style":{"height":17.24},"width":43.01,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-6.png","element":"img","alt":" Pξ","inline":true,"padRight":true},{"text":"does not depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". We also impose that","element":"span"}],[{"style":{"width":"69%"},"width":1301,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-7.png","element":"img"}],[{"text":"Examples of ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-8.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"include (a) ","element":"span"},{"style":{"height":16.4},"width":344.62,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-9.png","element":"img","alt":" ξ = E −1, where E","inline":true,"padRight":true},{"text":"is a standard exponential random variable, (b) ","element":"span"},{"style":{"height":18},"width":133.51,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-10.png","element":"img","alt":" ξ = N,","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"is a standard normal random variable, and (c) 
","element":"span"},{"style":{"height":19.92},"width":781.13,"height":49.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-11.png","element":"img","alt":" ξ = N1/√2 + (N 22 − 1)/2, where N1 and","inline":true},{"style":{"height":16.62},"width":52.8,"height":41.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-12.png","element":"img","alt":"N2","inline":true,"padRight":true},{"text":"are mutually independent standard normal random variables.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-13.png","element":"img","alt":"14 ","inline":true,"padRight":true},{"text":"The choices of (a), (b), and (c) correspond respectively to the Bayesian bootstrap (e.g., ","element":"span"},{"href":"#id-75","referenceIndex":53,"text":"Hahn ","element":"a"},{"href":"#id-75","referenceIndex":53,"text":"(1997) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-76","referenceIndex":30,"text":"Chamberlain and ","element":"a"},{"href":"#id-76","referenceIndex":30,"text":"Imbens ","element":"a"},{"href":"#id-76","referenceIndex":30,"text":"(2003)","element":"a"},{"text":"), the Gaussian multiplier method (e.g., ","element":"span"},{"href":"#id-74","referenceIndex":52,"text":"Giné and Zinn ","element":"a"},{"href":"#id-74","referenceIndex":52,"text":"(1984) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996, ","element":"a"},{"text":"Chap. 
3.6)), and the wild bootstrap method ","element":"span"},{"href":"#id-77","referenceIndex":76,"text":"(Mammen ","element":"a"},{"href":"#id-77","referenceIndex":76,"text":"(1993)","element":"a"},{"text":").","element":"span"},{"style":{"height":18.09},"width":103.41,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-14.png","element":"img","alt":"15 �ψρu ","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-78","text":"(3.19) ","element":"a"},{"text":"is ","element":"span"},{"text":"an estimator of the influence function ","element":"span"},{"style":{"height":18.09},"width":48.43,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-15.png","element":"img","alt":" ψρu ","inline":true,"padRight":true},{"text":"defined via the plug-in rule:","element":"span"}],[{"style":{"width":"96%"},"width":1809,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-16.png","element":"img"}],[{"text":"Note that this bootstrap is computationally efficient since it does not involve recomputing the influence functions ","element":"span"},{"style":{"height":19.16},"width":96.08,"height":47.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-17.png","element":"img","alt":"�ψρu.16","inline":true,"padRight":true},{"text":"Each new draw of (","element":"span"},{"style":{"height":18.09},"width":104.45,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-18.png","element":"img","alt":"ξi)ni=1 ","inline":true,"padRight":true},{"text":"generates a new draw of ","element":"span"},{"style":{"height":16.33},"width":39.56,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-19.png","element":"img","alt":" �ρ∗","inline":true,"padRight":true},{"text":"holding the data ","element":"span"},{"text":"and the estimates of the influence functions fixed. 
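The resampling step can be sketched in plain Python as follows. The sketch makes two assumptions:

- (3.19) has the form ρ̂*u = ρ̂u + n^{-1} Σi ξi ψ̂ρu(Wi) on a finite grid of u values.
- The estimated influence functions ψ̂ρu(Wi) are computed once and stored.

The three multiplier distributions correspond to choices (a), (b), and (c) above. The function names and the simple unstudentized sup-norm band at the end are our illustrative choices, not the paper's procedure.

```python
import math
import random

def draw_multiplier(kind, rng):
    # Multipliers xi drawn independently of the data, each with mean 0 and variance 1
    if kind == 'bayesian':   # (a) xi = E - 1, E standard exponential
        return rng.expovariate(1.0) - 1.0
    if kind == 'gaussian':   # (b) xi = N, N standard normal
        return rng.gauss(0.0, 1.0)
    if kind == 'wild':       # (c) xi = N1 / sqrt(2) + (N2^2 - 1) / 2
        n1, n2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        return n1 / math.sqrt(2.0) + (n2 * n2 - 1.0) / 2.0
    raise ValueError('unknown multiplier kind: ' + kind)

def multiplier_bootstrap(rho_hat, psi_hat, n_draws, kind='gaussian', seed=0):
    # rho_hat[u]: point estimates on a finite grid of u values
    # psi_hat[i][u]: stored estimated influence function of observation i at u
    # Each draw is rho_hat[u] + (1/n) sum_i xi_i * psi_hat[i][u]; nothing is re-estimated.
    rng = random.Random(seed)
    n = len(psi_hat)
    draws = []
    for _ in range(n_draws):
        xi = [draw_multiplier(kind, rng) for _ in range(n)]
        draws.append([r + sum(x * row[u] for x, row in zip(xi, psi_hat)) / n
                      for u, r in enumerate(rho_hat)])
    return draws

def sup_critical_value(rho_hat, draws, level=0.95):
    # Illustrative unstudentized uniform band: quantile of max_u |rho*_u - rho_hat_u|
    stats = sorted(max(abs(d - r) for d, r in zip(draw, rho_hat)) for draw in draws)
    k = min(len(stats) - 1, math.ceil(level * len(stats)) - 1)
    return stats[k]
```

A band of the form ρ̂u ± c, holding for all u on the grid, then follows from c = sup_critical_value(rho_hat, draws). In practice one would typically studentize each coordinate by an estimate of its pointwise standard deviation before taking the maximum.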
This method simply amounts to resampling the first-order approximations to the estimators. Here we build upon prior uses of this or similar methods in low-dimensional settings such as ","element":"span"},{"href":"#id-79","referenceIndex":55,"text":"Hansen ","element":"a"},{"href":"#id-79","referenceIndex":55,"text":"(1996) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-80","referenceIndex":69,"text":"Kline and Santos ","element":"a"},{"href":"#id-80","referenceIndex":69,"text":"(2012)","element":"a"},{"text":".","element":"span"}],[{"id":"id-89","style":{"width":"99%"},"width":1870,"height":192,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-20.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":8.8},"width":69.64,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-21.png","element":"img","alt":" ⇝B","inline":true,"padRight":true},{"text":"denotes weak convergence of the bootstrap law in probability, as defined in Appendix ","element":"span"},{"text":"B.","element":"span"}],[{"text":"3.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Third Step: Robust Estimation of the Structural Parameters. 
","element":"span"},{"text":"All structural parameters we consider take the form of smooth transformations of the reduced-form parameters:","element":"span"}],[{"style":{"width":"72%"},"width":1351,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-22.png","element":"img"}],[{"text":"The structural parameters may themselves carry an index ","element":"span"},{"style":{"height":16},"width":112.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-23.png","element":"img","alt":" q ∈ Q","inline":true,"padRight":true},{"text":"that can be different from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":"; for example, the LQTE is indexed by a quantile index ","element":"span"},{"style":{"height":17.6},"width":125.77,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/14-24.png","element":"img","alt":" q ∈ (0,","inline":true,"padRight":true},{"text":"1). This formulation includes as special cases all the structural functions of Section ","element":"span"},{"href":"#id-52","text":"2. ","element":"a"},{"text":"We estimate these quantities by the plug-in rule. 
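As an illustration of the plug-in rule, consider the LATE, a ratio of two reduced-form contrasts. The Python sketch below is illustrative only: it assumes a hypothetical layout in which ρ stacks four scalars (a1, a0, b1, b0), standing for the reduced-form quantities for Y and for D at instrument values z = 1 and z = 0, and it assumes that a bootstrap draw of the reduced form takes the form ρ̂ + Ẑ*/√n. The function names are hypothetical.

```python
import numpy as np


def late_functional(rho):
    # Hypothetical layout rho = (a1, a0, b1, b0): reduced-form quantities for
    # Y (a) and D (b) at instrument values z = 1 and z = 0.
    a1, a0, b1, b0 = rho
    return (a1 - a0) / (b1 - b0)


def plug_in_with_bootstrap(rho_hat, z_star, n, phi=late_functional):
    # rho_hat: (4,) reduced-form estimate; z_star: (B, 4) multiplier-bootstrap
    # draws of the reduced-form process; n: sample size.
    rho_hat = np.asarray(rho_hat, dtype=float)
    estimate = phi(rho_hat)                                  # plug-in estimate
    rho_star = rho_hat + np.asarray(z_star, dtype=float) / np.sqrt(n)
    draws = np.array([phi(r) for r in rho_star])             # bootstrap draws
    return estimate, draws
```

For instance, rho_hat = (0.5, 0.2, 0.7, 0.1) gives the plug-in value (0.5 − 0.2)/(0.7 − 0.1) = 0.5. The map is smooth wherever b1 ≠ b0, which is the kind of regularity the delta-method argument relies on.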
We establish the asymptotic behavior of these estimators and the validity of the bootstrap as a corollary of the results outlined in Section 3.2 and the functional delta method (extended to handle uniformity in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":").","element":"span"}],[{"text":"For the application of the functional delta method, we require that the functional ","element":"span"},{"style":{"height":17.6},"width":209.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-0.png","element":"img","alt":" ρ ↦ φ(ρ)","inline":true,"padRight":true},{"text":"be Hadamard differentiable ","element":"span"},{"style":{"height":17.82},"width":597.76,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-1.png","element":"img","alt":" uniformly in ρ ∈ Dρ, where Dρ","inline":true,"padRight":true},{"text":"is a set that contains the true values ","element":"span"},{"style":{"height":16.4},"width":414.64,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-2.png","element":"img","alt":"ρ = ρP for all P ∈ Pn","inline":true},{"text":", tangentially to a subset that contains the realizations of ","element":"span"},{"style":{"height":15.1},"width":340.28,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-3.png","element":"img","alt":" ZP for all P ∈ Pn","inline":true,"padRight":true},{"text":"with derivative map ","element":"span"},{"href":"#id-81","style":{"height":22.28},"width":613.8,"height":55.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-4.png","element":"img","alt":" h ↦ φ′ρ(h) = (φ′ρ(h)(q))q∈Q. 17","inline":true,"padRight":true},{"text":"We define the estimators of the structural ","element":"span"},{"text":"parameters and their bootstrap versions via the plug-in rule 
as","element":"span"}],[{"id":"id-90","style":{"width":"85%"},"width":1604,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-5.png","element":"img"}],[{"text":"We establish that these estimators are asymptotically Gaussian,","element":"span"}],[{"style":{"width":"72%"},"width":1361,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-6.png","element":"img"}],[{"text":"and that the bootstrap consistently estimates their large sample distribution:","element":"span"}],[{"style":{"width":"74%"},"width":1395,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-7.png","element":"img"}],[{"text":"These results can be used to construct simultaneous confidence bands and test functional hypotheses on ∆ using the methods described, for example, in ","element":"span"},{"href":"#id-82","referenceIndex":35,"text":"Chernozhukov and Fernández-Val ","element":"a"},{"href":"#id-82","referenceIndex":35,"text":"(2005) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-83","referenceIndex":36,"text":"Chernozhukov et al. ","element":"a"},{"href":"#id-83","referenceIndex":36,"text":"(2013)","element":"a"},{"text":".","element":"span"}]]},{"heading":"4. 
Theory: Estimation and Inference on Local Treatment Effects Functionals","paragraphs":[[{"text":"Consider fixed sequences of numbers ","element":"span"},{"style":{"height":16.8},"width":428.82,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-8.png","element":"img","alt":" δn ↘ 0, ϵn ↘ 0, ∆n ↘","inline":true,"padRight":true},{"text":"0, at a speed at most polynomial in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"(for example, ","element":"span"},{"style":{"height":17.6},"width":730.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-9.png","element":"img","alt":" δn ⩾ 1/n^c for some c > 0), ℓn := log n","inline":true},{"text":", and positive constants ","element":"span"},{"style":{"height":17.6},"width":361.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-10.png","element":"img","alt":" c, C, and c′ < 1/2.","inline":true,"padRight":true},{"text":"These sequences and constants will not vary with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". 
The probability ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"can vary in the set ","element":"span"},{"style":{"height":14.62},"width":51.35,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-11.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"of probability measures, termed “data-generating processes”, where ","element":"span"},{"style":{"height":14.62},"width":51.35,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-12.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"is typically a set that is weakly increasing in ","element":"span"},{"style":{"height":15.82},"width":340.96,"height":39.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-13.png","element":"img","alt":" n, i.e., Pn ⊆ P_{n+1}","inline":true},{"text":". Other definitions and notation are collected in Appendix A.","element":"span"}],[{"id":"id-84","style":{"fontWeight":"bold"},"text":"Assumption 4.1 ","element":"span"},{"text":"(Basic Assumptions)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"(i) Consider a random element ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with values in a measure space ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":156.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-14.png","element":"img","alt":"W, AW)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and law determined by a probability measure ","element":"span"},{"style":{"height":14.62},"width":171.99,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-15.png","element":"img","alt":" P ∈ Pn.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"The observed data ","element":"span"},{"text":"((","element":"span"},{"style":{"height":18.09},"width":468.73,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-16.png","element":"img","alt":"Wui)u∈U)_{i=1}^n consist of n","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"i.i.d. copies of a random element ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":692.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-17.png","element":"img","alt":"Wu)u∈U = ((Yu)u∈U, D, Z, X), where","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is a Polish space equipped with its Borel sigma-field and ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.53},"width":690.18,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-18.png","element":"img","alt":"Yu, D, Z, X) ∈ R^{3+dX}. 
Each Wu is","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"generated via a measurable transform ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"W, u","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":", namely the map ","element":"span"},{"style":{"height":15.93},"width":404.14,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-19.png","element":"img","alt":" t : W × U → R^{3+dX}","inline":true}],[{"style":{"fontStyle":"italic"},"text":"is measurable, and the map may depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"style":{"fontStyle":"italic"},"text":". Let","element":"span"}],[{"style":{"width":"74%"},"width":1398,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-20.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":19.29},"width":1166.74,"height":48.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-21.png","element":"img","alt":" J = {1, ..., 5}. 
(ii) For P := ∪_{n=n0}^∞ Pn, the map u ↦ Yu ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obeys the uniform continuity ","element":"span"},{"style":{"fontStyle":"italic"},"text":"property:","element":"span"}],[{"id":"id-81","style":{"width":"80%"},"width":1502,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/15-22.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where the second supremum in the first expression is taken over ","element":"span"},{"style":{"height":15.6},"width":335.76,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-0.png","element":"img","alt":" u, ū ∈ U, and U","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a totally bounded metric space equipped with a semi-metric ","element":"span"},{"style":{"height":15.5},"width":47.71,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-1.png","element":"img","alt":" dU","inline":true},{"style":{"fontStyle":"italic"},"text":". 
The uniform covering entropy of the set ","element":"span"},{"style":{"height":17.6},"width":354.81,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-2.png","element":"img","alt":"FP = {Yu : u ∈ U}","inline":true},{"style":{"fontStyle":"italic"},"text":", viewed as a collection of maps ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":411.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-3.png","element":"img","alt":"W, AW) → R, obeys","inline":true}],[{"style":{"width":"49%"},"width":929,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-4.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":18.19},"width":723.16,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-5.png","element":"img","alt":" P ∈ P, where FP(W) = sup_{u∈U} |Yu|","inline":true},{"style":{"fontStyle":"italic"},"text":", with the supremum taken over all finitely discrete probability measures ","element":"span"},{"style":{"height":17.6},"width":291.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-6.png","element":"img","alt":" Q on (W, AW)","inline":true},{"style":{"fontStyle":"italic"},"text":". 
(iii) For each ","element":"span"},{"style":{"height":12.8},"width":130.65,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-7.png","element":"img","alt":" P ∈ P","inline":true},{"style":{"fontStyle":"italic"},"text":", the conditional probability of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"= 1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"given ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is bounded away from zero and one, namely ","element":"span"},{"style":{"height":17.6},"width":485.57,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-8.png","element":"img","alt":" c′ ⩽ mZ(1, X) ⩽ 1 − c′ P","inline":true},{"style":{"fontStyle":"italic"},"text":"-a.s., the instrument ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has a non-trivial impact on ","element":"span"},{"style":{"height":17.6},"width":1286.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-9.png","element":"img","alt":" D, namely c′ ⩽ |PP [D = 1|Z = 1, X] − PP [D = 1|Z = 0, X]| P-a.s.,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and the regression function ","element":"span"},{"style":{"height":12},"width":46.81,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-10.png","element":"img","alt":" gV","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is bounded, ","element":"span"},{"style":{"height":18.3},"width":542.78,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-11.png","element":"img","alt":" ∥gV ∥P,∞ < ∞ for all V ∈ V.","inline":true}],[{"text":"Assumption ","element":"span"},{"href":"#id-84","text":"4.1 ","element":"a"},{"text":"is stated to deal with the measurability 
issues associated with functional response data. This assumption also implies that the set of functions (","element":"span"},{"style":{"height":18.49},"width":274.42,"height":46.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-12.png","element":"img","alt":"ψρu)u∈U, where","inline":true}],[{"style":{"width":"29%"},"width":552,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-13.png","element":"img"}],[{"text":"is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":"-Donsker uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". That is, it implies","element":"span"}],[{"id":"id-87","style":{"width":"71%"},"width":1331,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-14.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":15.1},"width":59.94,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-15.png","element":"img","alt":" GP","inline":true,"padRight":true},{"text":"denoting the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":"-Brownian bridge ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"(van der Vaart and Wellner, ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"1996, ","element":"a"},{"text":"pp. 
81–82) and with ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-16.png","element":"img","alt":"ZP","inline":true,"padRight":true},{"text":"having bounded, uniformly continuous paths uniformly in ","element":"span"},{"style":{"height":12.8},"width":133.35,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-17.png","element":"img","alt":" P ∈ P:","inline":true}],[{"id":"id-88","style":{"width":"85%"},"width":1597,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-18.png","element":"img"}],[{"text":"We work with the sequences and constants defined prior to Assumption ","element":"span"},{"href":"#id-84","text":"4.1.","element":"a"}],[{"id":"id-85","style":{"fontWeight":"bold"},"text":"Assumption 4.2 ","element":"span"},{"text":"(Approximate Sparsity)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under each ","element":"span"},{"style":{"height":14.62},"width":147.39,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-19.png","element":"img","alt":" P ∈ Pn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and for each ","element":"span"},{"style":{"height":13.82},"width":136.2,"height":34.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-20.png","element":"img","alt":" n ⩾ n0","inline":true},{"style":{"fontStyle":"italic"},"text":", uniformly for all ","element":"span"},{"style":{"height":12.8},"width":134.76,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-21.png","element":"img","alt":" V ∈ V","inline":true},{"style":{"fontStyle":"italic"},"text":": (i) The approximations 
","element":"span"},{"href":"#id-67","style":{"fontStyle":"italic"},"text":"(3.5)","element":"a"},{"style":{"fontStyle":"italic"},"text":"–","element":"span"},{"href":"#id-67","style":{"fontStyle":"italic"},"text":"(3.7) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold with the link functions ","element":"span"},{"text":"Λ","element":"span"},{"style":{"height":15.5},"width":270.12,"height":38.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-22.png","element":"img","alt":"V and ΛZ be-","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"longing to the set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"style":{"fontStyle":"italic"},"text":", the sparsity condition ","element":"span"},{"style":{"height":17.6},"width":371.38,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-23.png","element":"img","alt":" ∥βV ∥0 + ∥βZ∥0 ⩽ s","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"holding, the approximation errors satisfying ","element":"span"},{"style":{"height":21.04},"width":1165.27,"height":52.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-24.png","element":"img","alt":" ∥rV ∥P,2 + ∥rZ∥P,2 ⩽ δn n^{−1/4} and ∥rV ∥P,∞ + ∥rZ∥P,∞ ⩽ ϵn","inline":true},{"style":{"fontStyle":"italic"},"text":", and the sparsity index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and the number of terms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"style":{"fontStyle":"italic"},"text":"in the vector ","element":"span"},{"style":{"height":19.87},"width":775.89,"height":49.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-25.png","element":"img","alt":" f(X) obeying s^2 log^2(p ∨ n) log^2 n ⩽ δn n","inline":true},{"style":{"fontStyle":"italic"},"text":". 
(ii) There are estimators ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":18.6},"width":202.69,"height":46.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-26.png","element":"img","alt":"βV and β̄Z","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that, with probability no less than ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":15.42},"width":99.61,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-27.png","element":"img","alt":" − ∆n","inline":true},{"style":{"fontStyle":"italic"},"text":", the estimation errors satisfy ","element":"span"},{"style":{"height":21.11},"width":1872.06,"height":52.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-28.png","element":"img","alt":"∥f(Z, X)′(β̄V − βV )∥Pn,2 + ∥f(X)′(β̄Z − βZ)∥Pn,2 ⩽ δn n^{−1/4}, Kn∥β̄V − βV ∥1 + Kn∥β̄Z − βZ∥1 ⩽","inline":true},{"style":{"height":10.22},"width":38.71,"height":25.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-29.png","element":"img","alt":"ϵn","inline":true},{"style":{"fontStyle":"italic"},"text":"; the estimators are sparse such that ","element":"span"},{"style":{"height":19.41},"width":433.6,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-30.png","element":"img","alt":" ∥β̄V ∥0 + ∥β̄Z∥0 ⩽ Cs","inline":true},{"style":{"fontStyle":"italic"},"text":"; and the empirical and population norms induced by the Gram matrix formed by ","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.09},"width":181.51,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-31.png","element":"img","alt":"f(Xi))_{i=1}^n ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are equivalent on sparse subsets, 
","element":"span"},{"text":"sup","element":"span"},{"style":{"height":20.59},"width":844.98,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-32.png","element":"img","alt":"∥δ∥0⩽ℓn s |∥f(X)′δ∥Pn,2/∥f(X)′δ∥P,2 − 1| ⩽ ϵn","inline":true},{"style":{"fontStyle":"italic"},"text":". (iii) The following boundedness conditions hold: ","element":"span"},{"style":{"height":18.3},"width":762.95,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-33.png","element":"img","alt":"∥∥f(X)∥∞∥P,∞ ⩽ Kn and ∥V ∥P,∞ ⩽ C.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Comment 4.1. ","element":"span"},{"text":"Assumption ","element":"span"},{"href":"#id-85","text":"4.2 ","element":"a"},{"text":"imposes simple intermediate-level conditions that encode both the approximate sparsity of the models and some reasonable behavior of the sparse estimators of ","element":"span"},{"style":{"height":16.4},"width":209.02,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/16-34.png","element":"img","alt":" mZ and gV","inline":true,"padRight":true},{"text":". These conditions significantly extend and generalize the conditions employed in the literature on adaptive estimation using series methods. The boundedness conditions are imposed to simplify the arguments and could be removed at the cost of more complicated proofs and more stringent side conditions. Sufficient conditions for the equivalence between empirical and population norms and primitive examples of functions admitting sparse approximations are given in ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"et al. ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a)","element":"a"},{"text":". 
In Section 6, we provide primitive conditions under which Lasso estimators satisfy the bounds above, while addressing the problem of estimating continua of approximately sparse nuisance functions. We expect that other sparsity-based estimators, such as the Dantzig selector or adaptive Lasso, could be used in the present context as well. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-0.png","element":"img","alt":"■","inline":true}],[{"text":"Under the stated assumptions, the empirical reduced-form process ","element":"span"},{"style":{"height":18.47},"width":322.19,"height":46.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-1.png","element":"img","alt":"Ẑn,P = √n(ρ̂ − ρ","inline":true},{"text":") defined by ","element":"span"},{"href":"#id-86","text":"(3.16) ","element":"a"},{"text":"obeys the following relations. We recall definitions of convergence uniformly in ","element":"span"},{"style":{"height":14.62},"width":199.47,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-2.png","element":"img","alt":" P ∈ Pn in","inline":true,"padRight":true},{"text":"Appendix ","element":"span"},{"href":"#id-55","text":"A.","element":"a"}],[{"id":"id-165","style":{"fontWeight":"bold"},"text":"Theorem 4.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform Gaussianity of the Reduced-Form Parameter Process","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumptions ","element":"span"},{"href":"#id-84","style":{"fontStyle":"italic"},"text":"4.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-85","style":{"fontStyle":"italic"},"text":"4.2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"the reduced-form empirical process admits a linearization; namely,","element":"span"}],[{"style":{"width":"86%"},"width":1616,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"The process ","element":"span"},{"style":{"height":17.1},"width":85.75,"height":42.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-4.png","element":"img","alt":"Ẑn,P","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is asymptotically Gaussian, namely","element":"span"}],[{"style":{"width":"75%"},"width":1422,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-6.png","element":"img","alt":" ZP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is defined in ","element":"span"},{"href":"#id-87","style":{"fontStyle":"italic"},"text":"(4.2) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and its paths obey the property ","element":"span"},{"href":"#id-88","style":{"fontStyle":"italic"},"text":"(4.3)","element":"a"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"Another main result of this section shows that the bootstrap law of the 
process","element":"span"}],[{"style":{"width":"99%"},"width":1869,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-7.png","element":"img"}],[{"id":"id-166","style":{"fontWeight":"bold"},"text":"Theorem 4.2 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Validity of Multiplier Bootstrap for Inference on Reduced-Form Parameters","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumptions ","element":"span"},{"href":"#id-84","style":{"fontStyle":"italic"},"text":"4.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-85","style":{"fontStyle":"italic"},"text":"4.2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"the bootstrap law consistently approximates the large sample law ","element":"span"},{"style":{"height":17.5},"width":210.62,"height":43.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-8.png","element":"img","alt":" ZP of Zn,P","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":16.4},"width":317.29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-9.png","element":"img","alt":" P ∈ Pn, namely,","inline":true}],[{"style":{"width":"76%"},"width":1436,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-10.png","element":"img"}],[{"text":"Next we consider inference on the structural functionals ∆ defined in ","element":"span"},{"href":"#id-89","text":"(3.22)","element":"a"},{"text":". 
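For intuition on how bootstrap draws of this kind are commonly converted into a simultaneous confidence band for a function such as the LQTE on a grid of quantile indices q, the Python sketch below uses a sup-t construction. It is an illustration under stated assumptions rather than the procedure of the references cited above: it takes as given B bootstrap draws of the estimated structural function, uses the pointwise bootstrap standard deviation as the scale (a robust scale estimate could be substituted), and the function name is hypothetical.

```python
import numpy as np


def sup_t_band(delta_hat, delta_star, level=0.95):
    # delta_hat: (k,) plug-in estimates of the structural function on a grid of q.
    # delta_star: (B, k) bootstrap draws of the estimated function on the same grid.
    delta_hat = np.asarray(delta_hat, dtype=float)
    delta_star = np.asarray(delta_star, dtype=float)
    dev = delta_star - delta_hat                  # bootstrap deviations, shape (B, k)
    scale = dev.std(axis=0, ddof=1)               # pointwise bootstrap standard errors
    # Assumes every grid point has a strictly positive bootstrap spread.
    t_max = np.max(np.abs(dev) / scale, axis=1)   # sup-t statistic for each draw
    crit = np.quantile(t_max, level)              # simultaneous critical value
    return delta_hat - crit * scale, delta_hat + crit * scale, crit
```

With several grid points the critical value exceeds the pointwise normal quantile 1.96; this widening is the price of coverage that holds simultaneously over the grid.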
We derive the large sample distribution of the estimator ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-11.png","element":"img","alt":"ˆ","inline":true},{"text":"∆ in ","element":"span"},{"href":"#id-90","text":"(3.23)","element":"a"},{"text":", and show that the multiplier bootstrap law of ","element":"span"},{"style":{"height":12.8},"width":53.36,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-12.png","element":"img","alt":"∆̂∗","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-90","text":"(3.23) ","element":"a"},{"text":"provides a consistent approximation to that distribution. Our derivations rely on the functional delta method, which we modify to handle uniformity with respect to the underlying data-generating process ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". We impose the following assumption on the structural functionals.","element":"span"}],[{"id":"id-92","style":{"fontWeight":"bold"},"text":"Assumption 4.3 ","element":"span"},{"text":"(Uniform Hadamard Differentiability of Structural Functionals)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that for each ","element":"span"},{"style":{"height":17.42},"width":382.53,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-13.png","element":"img","alt":" P ∈ P, ρ = ρP ∈ Dρ","inline":true},{"style":{"fontStyle":"italic"},"text":", a compact metric space. 
Suppose ","element":"span"},{"style":{"height":17.6},"width":199.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-14.png","element":"img","alt":" ϱ ↦ φ(ϱ)","inline":true},{"style":{"fontStyle":"italic"},"text":", a functional of interest mapping ","element":"span"},{"style":{"height":20.55},"width":851.57,"height":51.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-15.png","element":"img","alt":" Dφ ⊂ D = ℓ∞(U)^{dρ} to ℓ∞(Q), where Dρ ⊂ Dφ","inline":true},{"style":{"fontStyle":"italic"},"text":", is Hadamard differentiable in ","element":"span"},{"style":{"height":11.6},"width":23,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-16.png","element":"img","alt":" ϱ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"tangentially to ","element":"span"},{"style":{"height":19.53},"width":276.82,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-17.png","element":"img","alt":" D0 = UC(U)^{dρ} ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":17.42},"width":125.74,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-18.png","element":"img","alt":" ϱ ∈ Dρ","inline":true},{"style":{"fontStyle":"italic"},"text":", with the linear derivative map ","element":"span"},{"style":{"height":19.11},"width":261.35,"height":47.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-19.png","element":"img","alt":" φ′ϱ : D0 → D","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"mapping ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.92},"width":748.1,"height":49.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/17-20.png","element":"img","alt":"ϱ, h) ↦ φ′ϱ(h) from Dρ × D0 to 
ℓ∞(Q)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is continuous.","element":"span"}],[{"text":"The definition of uniform Hadamard differentiability is given in Definition ","element":"span"},{"href":"#id-91","text":"B.1 ","element":"a"},{"text":"of Appendix ","element":"span"},{"text":"B. ","element":"span"},{"text":"Assumption ","element":"span"},{"href":"#id-92","text":"4.3 ","element":"a"},{"text":"holds for all examples of structural parameters listed in Section 2.","element":"span"}],[{"text":"The following corollary gives the large sample law of ","element":"span"},{"style":{"height":17.77},"width":164.43,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-0.png","element":"img","alt":"√n(∆̂ −","inline":true,"padRight":true},{"text":"∆), the properly normalized structural estimator. It also shows that the bootstrap law of ","element":"span"},{"style":{"height":17.77},"width":249.87,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-1.png","element":"img","alt":"√n(∆̂∗ − ∆̂),","inline":true,"padRight":true},{"text":"computed conditionally on the data, approaches the large sample law of ","element":"span"},{"style":{"height":17.77},"width":160.52,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-2.png","element":"img","alt":"√n(∆̂ −","inline":true,"padRight":true},{"text":"∆). It follows from the previous theorems as well as from a more general result contained in Theorem ","element":"span"},{"href":"#id-93","text":"5.3.","element":"a"}],[{"id":"id-175","style":{"fontWeight":"bold"},"text":"Corollary 4.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Limit Theory and Validity of Multiplier Bootstrap for Smooth Structural Functionals","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumptions ","element":"span"},{"href":"#id-84","style":{"fontStyle":"italic"},"text":"4.1, ","element":"a"},{"href":"#id-85","style":{"fontStyle":"italic"},"text":"4.2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-92","style":{"fontStyle":"italic"},"text":"4.3,","element":"a"}],[{"style":{"width":"83%"},"width":1556,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":14.7},"width":51.5,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-4.png","element":"img","alt":" TP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a zero mean tight Gaussian process, for each ","element":"span"},{"style":{"height":12.8},"width":120.43,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-5.png","element":"img","alt":" P ∈ P","inline":true},{"style":{"fontStyle":"italic"},"text":". Moreover,","element":"span"}],[{"style":{"width":"78%"},"width":1464,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-6.png","element":"img"}]]},{"heading":"5. General Theory: Honest Inference in General Moment Condition Problems with Nuisance Functions Estimated by Machine Learning Methods","paragraphs":[[{"text":"In this section, we consider a general moment condition framework, where possibly a continuum of target parameters is of interest and we use modern machine learning methods, with Lasso-type methods being a lead example, to estimate a continuum of high-dimensional nuisance functions. This setting covers a rich variety of modern moment-condition problems in econometrics including the treatment effects problem. 
We establish a functional central limit theorem for the estimators of the continuum of target parameters that holds uniformly in ","element":"span"},{"style":{"height":15.6},"width":323.65,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-7.png","element":"img","alt":" P ∈ P, where P","inline":true,"padRight":true},{"text":"includes a wide range of data-generating processes with well-approximable continuums of nuisance functions. We also derive a functional central limit theorem for the multiplier bootstrap that resamples the first order approximations to the standardized estimators of the continuum of target parameters and establish its uniform validity. Moreover, we establish the uniform validity of the functional delta method and the functional delta method for the multiplier bootstrap for smooth functionals of the continuum of target parameters using an appropriate strengthening of Hadamard differentiability.","element":"span"}],[{"text":"5.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Setting. 
","element":"span"},{"text":"We are interested in function-valued target parameters indexed by ","element":"span"},{"style":{"height":15.93},"width":274.27,"height":39.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-8.png","element":"img","alt":" u ∈ U ⊂ Rdu.","inline":true,"padRight":true},{"text":"We denote the true value of the target parameter by","element":"span"}],[{"style":{"width":"58%"},"width":1089,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-9.png","element":"img"}],[{"text":"We assume that for each ","element":"span"},{"style":{"height":15.2},"width":136.22,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-10.png","element":"img","alt":" u ∈ U,","inline":true,"padRight":true},{"text":"the true value ","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-11.png","element":"img","alt":" θu","inline":true,"padRight":true},{"text":"is identified as the solution to the following moment condition:","element":"span"}],[{"id":"id-95","style":{"width":"63%"},"width":1198,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.62},"width":61.21,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-13.png","element":"img","alt":" Wu","inline":true,"padRight":true},{"text":"is a random vector that takes values in a Borel set ","element":"span"},{"style":{"height":17.75},"width":216.73,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-14.png","element":"img","alt":" Wu ⊂ Rdw ","inline":true,"padRight":true},{"text":"and contains as a subcomponent the vector 
","element":"span"},{"style":{"height":14.62},"width":49.79,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-15.png","element":"img","alt":" Zu","inline":true,"padRight":true},{"text":"taking values in a Borel set ","element":"span"},{"style":{"height":14.62},"width":51.62,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-16.png","element":"img","alt":" Zu","inline":true},{"text":", the moment function","element":"span"}],[{"style":{"width":"86%"},"width":1612,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/18-17.png","element":"img"}],[{"text":"is a Borel measurable map, and the function","element":"span"}],[{"id":"id-97","style":{"width":"77%"},"width":1454,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-0.png","element":"img"}],[{"text":"is another Borel measurable map that denotes the possibly infinite-dimensional nuisance parameter. The sets ","element":"span"},{"style":{"height":17.6},"width":85.07,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-1.png","element":"img","alt":" Tu(z","inline":true},{"text":") are assumed to be convex for each ","element":"span"},{"style":{"height":15.02},"width":340.86,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-2.png","element":"img","alt":" u ∈ U and z ∈ Zu","inline":true},{"text":". 
Finite-dimensional nuisance parameters that do not depend on ","element":"span"},{"style":{"height":14.62},"width":49.79,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-3.png","element":"img","alt":" Zu","inline":true,"padRight":true},{"text":"are treated as part of ","element":"span"},{"style":{"height":15.02},"width":200.8,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-4.png","element":"img","alt":" hu as well.","inline":true}],[{"text":"We assume that the continuum of nuisance functions (","element":"span"},{"style":{"height":17.6},"width":130.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-5.png","element":"img","alt":"hu)u∈U","inline":true,"padRight":true},{"text":"is well-approximable and can be well estimated by the modern generation of statistical and machine learning methods. In particular, our regularity conditions allow for approximately sparse nuisance functions, which can be modeled and estimated using methods such as Lasso and Post-Lasso. We let ","element":"span"},{"style":{"height":21.4},"width":525.59,"height":53.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-6.png","element":"img","alt":"ĥu = (ĥum), m = 1, ..., dt, denote the","inline":true,"padRight":true},{"text":"estimator of ","element":"span"},{"style":{"height":15.02},"width":45.14,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-7.png","element":"img","alt":" hu","inline":true},{"text":", which we assume obeys the conditions in Assumption ","element":"span"},{"href":"#id-94","text":"5.3. 
","element":"a"},{"text":"The estimator ","element":"span"},{"style":{"height":15.02},"width":146.43,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-8.png","element":"img","alt":"θ̂u of θu","inline":true,"padRight":true},{"text":"is constructed as any approximate ","element":"span"},{"style":{"height":10.22},"width":38.71,"height":25.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-9.png","element":"img","alt":" ϵn","inline":true},{"text":"-solution in Θ","element":"span"},{"style":{"height":5.6},"width":20,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-10.png","element":"img","alt":"u","inline":true,"padRight":true},{"text":"to a sample analog of the moment condition ","element":"span"},{"href":"#id-95","text":"(5.1)","element":"a"},{"text":", i.e.,","element":"span"}],[{"id":"id-105","style":{"width":"94%"},"width":1778,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-11.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Comment 5.1 ","element":"span"},{"text":"(Handling Over-identified Cases)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"We do not analyze over-identified cases explicitly, but it is helpful to note that they can be handled within the current framework. ","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":17.6},"width":303.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-12.png","element":"img","alt":"ψou(Wu, θ, hou(Zu","inline":true},{"text":")) be the original over-identifying moment function. 
Let ","element":"span"},{"style":{"height":17.6},"width":121.09,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-13.png","element":"img","alt":" Au(Zu","inline":true},{"text":") denote the pointwise optimal matrix of linear combinations of the moments, so that the final moment function ","element":"span"},{"style":{"height":17.6},"width":847.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-14.png","element":"img","alt":"ψu(Wu, θ, h(Zu)) = Au(Zu)ψou(Wu, θ, hou(Zu","inline":true},{"text":")) has the same dimension as ","element":"span"},{"style":{"height":17.6},"width":402.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-15.png","element":"img","alt":" θu. Here hu(Zu) =","inline":true,"padRight":true},{"text":"(vec(","element":"span"},{"style":{"height":17.6},"width":381.91,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-16.png","element":"img","alt":"Au(Zu))′, ho′u(Zu))′","inline":true},{"text":"; that is, we simply treat ","element":"span"},{"style":{"height":15.42},"width":52.73,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-17.png","element":"img","alt":" Au ","inline":true,"padRight":true},{"text":"as part of the nuisance function ","element":"span"},{"style":{"height":16.4},"width":168.56,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-18.png","element":"img","alt":" hu being","inline":true,"padRight":true},{"text":"estimated. We do not analyze the preliminary estimation of ","element":"span"},{"style":{"height":15.42},"width":52.73,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-19.png","element":"img","alt":" Au","inline":true,"padRight":true},{"text":"in the present paper in order to maintain the focus on exactly identified cases as in Section 4. 
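","element":"span"}],[{"style":{"whiteSpace":"pre-wrap"},"text":"
As a hedged illustration of Comment 5.1 that does not come from the paper, the Python/NumPy sketch below uses a hypothetical linear instrumental-variable model. The model has one endogenous regressor D and two instruments Z, so the original moment function Z(Y − Dθ) is over-identified. The sketch assumes conditional homoskedasticity. Under that assumption the constant matrix A = E[DZ′]E[ZZ′]^(-1), which is a special case of Au(Zu), gives the optimal linear combination of the moments. The combined moment Aψo is then exactly identified, and its sample solution coincides with two-stage least squares. The simulated design and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical design: one endogenous regressor D, two instruments Z (assumption).
rng = np.random.default_rng(0)
n, theta_true = 50_000, 1.0
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
eps = 0.5 * v + rng.normal(size=n)  # correlated with v, so D is endogenous
D = Z @ np.array([1.0, 0.5]) + v
Y = theta_true * D + eps


def psi_o(theta):
    # Original over-identifying moment: two equations for one unknown.
    return Z * (Y - D * theta)[:, None]


# Constant combination matrix A = E_n[D Z'] E_n[Z Z']^(-1) (homoskedastic case).
A = (D @ Z / n) @ np.linalg.inv(Z.T @ Z / n)


def psi(theta):
    # Combined moment A psi_o: one equation for one unknown.
    return psi_o(theta) @ A


# psi is affine in theta, so E_n[psi(theta)] = a0 + a1 * theta is solved exactly.
a0 = psi(0.0).mean()
a1 = psi(1.0).mean() - a0
theta_hat = -a0 / a1

# Textbook two-stage least squares for comparison.
pi_hat = np.linalg.lstsq(Z, D, rcond=None)[0]
D_fit = Z @ pi_hat
theta_2sls = (D_fit @ Y) / (D_fit @ D)
print(theta_hat, theta_2sls)
```

If Au(Zu) depended on Zu, it would have to be estimated in a preliminary step. As the comment notes, the paper does not analyze that step.
","element":"span"}],[{"text":"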
","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-20.png","element":"img","alt":"■","inline":true}],[{"text":"5.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"The Neyman Orthogonality or Immunization Condition. ","element":"span"},{"text":"A key condition needed for regular estimation of ","element":"span"},{"style":{"height":15.02},"width":40.49,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-21.png","element":"img","alt":" θu","inline":true,"padRight":true},{"text":"is an orthogonality or immunization condition. The simplest to explain, yet strongest, form of this condition can be expressed as follows:","element":"span"}],[{"id":"id-99","style":{"width":"70%"},"width":1320,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-22.png","element":"img"}],[{"text":"subject to additional technical conditions such as continuity ","element":"span"},{"href":"#id-96","text":"(5.6) ","element":"a"},{"text":"and dominance ","element":"span"},{"href":"#id-96","text":"(5.7) ","element":"a"},{"text":"stated below, where we use the symbol ","element":"span"},{"style":{"height":15.42},"width":35.17,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-23.png","element":"img","alt":" ∂t","inline":true,"padRight":true},{"text":"to abbreviate ","element":"span"},{"style":{"height":22.09},"width":43.96,"height":55.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-24.png","element":"img","alt":"∂∂t′","inline":true,"padRight":true},{"text":". This condition holds in the previous setting ","element":"span"},{"text":"of inference on treatment effects after interchanging the order of the derivative and expectation. 
The formulation here also covers certain non-smooth cases such as structural and instrumental quantile regression problems.","element":"span"}],[{"text":"In the formal development, we use a more general form of the orthogonality condition.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 5.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Neyman Orthogonality for Moment Condition Models, General Form","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"For each ","element":"span"},{"style":{"height":12.8},"width":113.77,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-25.png","element":"img","alt":" u ∈ U","inline":true},{"style":{"fontStyle":"italic"},"text":", suppose that ","element":"span"},{"href":"#id-95","style":{"fontStyle":"italic"},"text":"(5.1)","element":"a"},{"style":{"fontStyle":"italic"},"text":"–","element":"span"},{"href":"#id-97","style":{"fontStyle":"italic"},"text":"(5.3) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. Consider ","element":"span"},{"style":{"height":14.62},"width":56.85,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-26.png","element":"img","alt":" Hu","inline":true},{"style":{"fontStyle":"italic"},"text":", a set of measurable functions ","element":"span"},{"style":{"height":9.6},"width":106.72,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-27.png","element":"img","alt":" z ↦","inline":true},{"style":{"height":20.24},"width":1585.68,"height":50.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-28.png","element":"img","alt":"h(z) ∈ Tu(z) from Zu to Rdt such that ∥h(Zu) − hu(Zu)∥P,2 < ∞ for all h ∈ Hu","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Suppose also that the set ","element":"span"},{"style":{"height":17.6},"width":103.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-29.png","element":"img","alt":" Tu(z)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a convex subset of ","element":"span"},{"style":{"height":19.13},"width":375.9,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-30.png","element":"img","alt":" Rdt for each z ∈ Zu","inline":true},{"style":{"fontStyle":"italic"},"text":". We say that ","element":"span"},{"style":{"height":16.4},"width":48.43,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/19-31.png","element":"img","alt":" ψu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obeys a general form of orthogonality with respect to ","element":"span"},{"style":{"height":14.62},"width":56.85,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-0.png","element":"img","alt":" Hu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":12.8},"width":116.31,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-1.png","element":"img","alt":" u ∈ U","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"if the following conditions hold: For each ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-2.png","element":"img","alt":"u ∈ U","inline":true},{"style":{"fontStyle":"italic"},"text":", the derivative","element":"span"}],[{"id":"id-96","style":{"width":"99%"},"width":1865,"height":287,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"and obeys the orthogonality 
condition:","element":"span"}],[{"id":"id-98","style":{"width":"86%"},"width":1627,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-4.png","element":"img"}],[{"text":"The orthogonality condition ","element":"span"},{"href":"#id-98","text":"(5.8) ","element":"a"},{"text":"reduces to ","element":"span"},{"href":"#id-99","text":"(5.5) ","element":"a"},{"text":"when ","element":"span"},{"style":{"height":14.62},"width":56.85,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-5.png","element":"img","alt":" Hu","inline":true,"padRight":true},{"text":"can span all measurable functions ","element":"span"},{"style":{"height":18.3},"width":685.61,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-6.png","element":"img","alt":"h : Zu → Tu such that ∥h∥P,2 < ∞","inline":true,"padRight":true},{"text":"but is more general otherwise.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Comment 5.2 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"An alternative formulation of the Neyman orthogonality condition","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"A slightly more general, though less primitive, definition of the orthogonality condition is as follows. For each ","element":"span"},{"style":{"height":12.8},"width":133.84,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-7.png","element":"img","alt":" u ∈ U","inline":true},{"text":", suppose that ","element":"span"},{"href":"#id-95","text":"(5.1)","element":"a"},{"text":"–","element":"span"},{"href":"#id-97","text":"(5.3) ","element":"a"},{"text":"hold. 
","element":"span"},{"text":"Consider ","element":"span"},{"style":{"height":14.62},"width":56.85,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-8.png","element":"img","alt":" Hu","inline":true},{"text":", a set of measurable functions ","element":"span"},{"style":{"height":20.24},"width":1872.6,"height":50.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-9.png","element":"img","alt":"z ↦ h(z) ∈ Tu(z) from Zu to Rdt such that ∥h(Zu) − hu(Zu)∥P,2 < ∞ for all h ∈ Hu, where","inline":true,"padRight":true},{"text":"the set ","element":"span"},{"style":{"height":17.6},"width":85.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-10.png","element":"img","alt":" Tu(z","inline":true},{"text":") is a convex subset of ","element":"span"},{"style":{"height":17.75},"width":381.96,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-11.png","element":"img","alt":" Rdt for each z ∈ Zu","inline":true},{"text":". 
We say that ","element":"span"},{"style":{"height":16.4},"width":48.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-12.png","element":"img","alt":" ψu","inline":true,"padRight":true},{"text":"obeys a general form of orthogonality with respect to ","element":"span"},{"style":{"height":14.62},"width":56.85,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-13.png","element":"img","alt":" Hu","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-14.png","element":"img","alt":" u ∈ U","inline":true},{"text":", if the following conditions hold: The Gateaux derivative map","element":"span"}],[{"style":{"width":"66%"},"width":1240,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-15.png","element":"img"}],[{"text":"exists for all ","element":"span"},{"style":{"height":17.6},"width":546.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-16.png","element":"img","alt":" t ∈ [0, 1), h ∈ Hu, and u ∈ U","inline":true,"padRight":true},{"text":"and vanishes at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= 0 – namely,","element":"span"}],[{"style":{"width":"66%"},"width":1246,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-17.png","element":"img"}],[{"text":"Definition 5.1 implies this definition by the mean-value expansion and the dominated convergence theorem. 
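","element":"span"}],[{"style":{"whiteSpace":"pre-wrap"},"text":"
Here is a minimal numerical sketch of the Gateaux-derivative condition above. It assumes the linear model of Example 1 with a hypothetical simulated design: p = 3 Gaussian controls, n = 200,000, and θ0 = 0.5. The sketch perturbs one nuisance parameter at a time along a fixed direction and approximates the derivative at t = 0 by a central difference. It then compares the partialling-out moment of Example 1 with the non-orthogonal regression-method moment. This is an illustration in Python/NumPy, not part of the paper's formal argument.

```python
import numpy as np

# Hypothetical linear design in the spirit of Example 1 (assumption).
rng = np.random.default_rng(1)
n, p, theta0 = 200_000, 3, 0.5
X = rng.normal(size=(n, p))
pi0 = np.array([1.0, 0.5, 0.0])
beta0 = np.ones(p)
nu = rng.normal(size=n)
eps = rng.normal(size=n)
D = X @ pi0 + nu
Y = theta0 * D + X @ beta0 + eps
delta0 = beta0 + theta0 * pi0  # coefficient from projecting Y on X


def orth_moment(delta, pi):
    # Sample mean of (Y - X'delta - theta0 (D - X'pi)) (D - X'pi).
    r = D - X @ pi
    return np.mean((Y - X @ delta - theta0 * r) * r)


def naive_moment(beta):
    # Sample mean of the regression-method score (Y - D theta0 - X'beta) D.
    return np.mean((Y - theta0 * D - X @ beta) * D)


def gateaux(f, h0, direction, t=1e-3):
    # Central difference of t -> f(h0 + t * direction) at t = 0; it is exact
    # here because each moment is at most quadratic in t.
    return (f(h0 + t * direction) - f(h0 - t * direction)) / (2 * t)


a = np.ones(p)  # perturbation direction (arbitrary choice)
d_delta = gateaux(lambda d: orth_moment(d, pi0), delta0, a)
d_pi = gateaux(lambda q: orth_moment(delta0, q), pi0, a)
d_naive = gateaux(naive_moment, beta0, a)
print(d_delta, d_pi, d_naive)
```

Both derivatives of the orthogonal moment should be about the size of the sampling noise, roughly n^(-1/2). The regression-method derivative should be close to its population value in this design, which is −a′π0 = −1.5.
","element":"span"}],[{"text":"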
","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-18.png","element":"img","alt":"■","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Comment 5.3 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Orthogonalization typically expands the number of nuisance parameters","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"It is important to use a moment function ","element":"span"},{"style":{"height":16.4},"width":48.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-19.png","element":"img","alt":" ψu","inline":true,"padRight":true},{"text":"that satisfies the orthogonality property given in ","element":"span"},{"href":"#id-98","text":"(5.8)","element":"a"},{"text":"; see examples given below. Generally, if we have a moment function ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":16.4},"width":48.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-20.png","element":"img","alt":"ψu","inline":true,"padRight":true},{"text":"which identifies ","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-21.png","element":"img","alt":" θu","inline":true,"padRight":true},{"text":"but does not have this property, we can construct a moment function ","element":"span"},{"style":{"height":16.4},"width":48.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-22.png","element":"img","alt":" ψu","inline":true,"padRight":true},{"text":"that identifies ","element":"span"},{"style":{"height":15.02},"width":202.74,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-23.png","element":"img","alt":" θu and has","inline":true,"padRight":true},{"text":"the required 
orthogonality property by projecting the original function ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":16.4},"width":48.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-24.png","element":"img","alt":"ψu","inline":true,"padRight":true},{"text":"onto the orthocomplement of the tangent space for the original set of nuisance functions ","element":"span"},{"style":{"height":16.72},"width":45.14,"height":41.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-25.png","element":"img","alt":" hou","inline":true},{"text":"; see, for example, ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":", ","element":"span"},{"href":"#id-100","referenceIndex":98,"text":"van der Vaart ","element":"a"},{"href":"#id-100","referenceIndex":98,"text":"(1998, ","element":"a"},{"text":"Chap. 25), ","element":"span"},{"href":"#id-101","referenceIndex":72,"text":"Kosorok ","element":"a"},{"href":"#id-101","referenceIndex":72,"text":"(2008)","element":"a"},{"text":", ","element":"span"},{"href":"#id-49","referenceIndex":16,"text":"Belloni et al. ","element":"a"},{"href":"#id-49","referenceIndex":16,"text":"(2013b)","element":"a"},{"text":", and ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni et al. ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a)","element":"a"},{"text":". This projection creates the semi-parametrically efficient score function. 
There are other ways to create orthogonal moment functions, as illustrated by the second example below.","element":"span"}],[{"text":"Note that the projection typically depends on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", which gives rise to additional nuisance parameters ","element":"span"},{"style":{"height":16.72},"width":46.14,"height":41.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/20-26.png","element":"img","alt":" hnu","inline":true},{"text":", which are then incorporated together with the original nuisance parameters into the new ","element":"span"},{"text":"parameter ","element":"span"},{"style":{"height":19.13},"width":242.34,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-0.png","element":"img","alt":" hu = (h0u, hnu","inline":true},{"text":"). This is a feature of all of the examples we consider. For example, the orthogonal moment functions in the exogenous case of the treatment effects framework depend on both the regression function and the propensity score function. The next example clarifies this point further using the classical linear model. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-1.png","element":"img","alt":"■","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Example 1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Neyman Orthogonal Equations for Linear Regression","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"text":"To illustrate the orthogonality condition in the simplest possible setting, let us consider the linear model:","element":"span"}],[{"style":{"width":"84%"},"width":1581,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"is the treatment and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"are the controls of high dimension ","element":"span"},{"style":{"height":13.6},"width":115.83,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-3.png","element":"img","alt":" p ≫ n","inline":true},{"text":". Call the first equation the regression equation, and the second equation the propensity score equation. The orthogonal moment condition that identifies the projection coefficient ","element":"span"},{"style":{"height":15.02},"width":37.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-4.png","element":"img","alt":" θ0","inline":true,"padRight":true},{"text":"is the Frisch-Waugh-Lovell partialling out interpretation of ","element":"span"},{"style":{"height":15.02},"width":51.41,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-5.png","element":"img","alt":" θ0:","inline":true}],[{"style":{"width":"60%"},"width":1126,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-6.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"is the population residual left after projecting out the controls ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"from the outcome, i.e. 
","element":"span"},{"style":{"height":15.6},"width":643.59,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-7.png","element":"img","alt":"Y = X′δ0 + U, EP UX = 0; and ν","inline":true,"padRight":true},{"text":"is the population residual left after projecting out controls from the treatment as defined in the propensity score equation. The high-dimensional nuisance function is ","element":"span"},{"style":{"height":17.6},"width":613.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-8.png","element":"img","alt":" h(Z) = (X′δ, X′π)′, for Z = X","inline":true},{"text":", with true value denoted by ","element":"span"},{"style":{"height":17.6},"width":633.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-9.png","element":"img","alt":" h0(Z) = (X′δ0, X′π0)′. Now the","inline":true,"padRight":true},{"text":"moment function","element":"span"}],[{"style":{"width":"77%"},"width":1453,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-10.png","element":"img"}],[{"text":"has the required orthogonality property ","element":"span"},{"href":"#id-98","text":"(5.8)","element":"a"},{"text":", since by the law of iterated expectations and some simple algebra:","element":"span"}],[{"style":{"width":"84%"},"width":1588,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-11.png","element":"img"}],[{"text":"for ","element":"span"},{"style":{"height":17.6},"width":878.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-12.png","element":"img","alt":" a = δ − δ0 and b = π − π0. 
In fact, ψ(W, θ0, h0","inline":true},{"text":") is the semi-parametrically efficient score for ","element":"span"},{"style":{"height":15.02},"width":51.42,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-13.png","element":"img","alt":" θ0.","inline":true,"padRight":true},{"text":"The resulting estimator of ","element":"span"},{"style":{"height":15.02},"width":221.61,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-14.png","element":"img","alt":" θ0 is root-n","inline":true,"padRight":true},{"text":"consistent and asymptotically normal, uniformly within a class of approximately sparse models, as follows from the general results of this section, and is also semi-parametrically efficient. See also ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni et al. ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a)","element":"a"},{"text":", which treats the partially linear model in detail and thus covers this linear example as a special case.","element":"span"}],[{"text":"Note that the orthogonal moment function contains two nuisance functions – the regression function and the propensity score – ","element":"span"},{"style":{"height":15.02},"width":281.3,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-15.png","element":"img","alt":" X′δ0 and X′π0","inline":true},{"text":". 
We could also identify ","element":"span"},{"style":{"height":15.02},"width":37.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-16.png","element":"img","alt":" θ0","inline":true,"padRight":true},{"text":"through non-orthogonal moment conditions, each containing a single nuisance function:","element":"span"}],[{"style":{"width":"69%"},"width":1309,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-17.png","element":"img"}],[{"text":"The first moment condition corresponds to the regression method, while the second corresponds to the so-called covariate balancing method. ","element":"span"},{"text":"Importantly, the use of these non-orthogonal moment conditions generally does not produce an estimator of ","element":"span"},{"style":{"height":17.6},"width":263.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-18.png","element":"img","alt":" θ0 that is √n","inline":true},{"text":"-consistent and asymptotically normal uniformly in the class of approximately sparse models. This failure occurs because we are forced to use highly non-regular estimators to estimate the nuisance functions ","element":"span"},{"style":{"height":16},"width":545.7,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/21-19.png","element":"img","alt":" X′δ0 and X′π0 in the p ≫ n","inline":true,"padRight":true},{"text":"setting. 
In fact, this failure would also occur with a small number of controls, including having only ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= 1, whenever selection procedures that exclude irrelevant variables with very high probability are used to estimate the regression parameter ","element":"span"},{"style":{"height":15.02},"width":36.4,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-0.png","element":"img","alt":" δ0","inline":true,"padRight":true},{"text":"or the propensity score parameter ","element":"span"},{"style":{"height":14.62},"width":254.28,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-1.png","element":"img","alt":" π0. For more","inline":true,"padRight":true},{"text":"discussion and documentation of this failure, see Leeb and Pötscher ","element":"span"},{"href":"#id-0","referenceIndex":73,"text":"(2008a; ","element":"a"},{"href":"#id-1","referenceIndex":74,"text":"2008b)","element":"a"},{"text":"; ","element":"span"},{"href":"#id-2","referenceIndex":87,"text":"Pötscher ","element":"a"},{"href":"#id-2","referenceIndex":87,"text":"(2009)","element":"a"},{"text":"; and Belloni, Chernozhukov, and Hansen ","element":"span"},{"href":"#id-3","referenceIndex":14,"text":"(2013a; ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"2014a)","element":"a"},{"text":". ","element":"span"},{"text":"By contrast, constructing orthogonal moment conditions – involving the projection of both the outcome and the treatment onto the controls and thereby combining the regression and covariate balancing methods – makes it possible to achieve ","element":"span"},{"style":{"height":17.6},"width":62.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-2.png","element":"img","alt":"√n","inline":true,"padRight":true},{"text":"consistency and asymptotic normality uniformly within a class of approximately sparse models. 
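A small simulation can make the orthogonality property above tangible. The sketch assumes a design that is not from the paper: a low-dimensional model Y = Dθ0 + X'β0 + ε and D = X'π0 + ν with standard normal X, ν and ε, and p = 5 so that ordinary least squares can stand in for the nuisance estimators. The orthogonal moment is taken to be (Y − X'δ − θ(D − X'π))(D − X'π), our reading of the displayed moment function, and the regression moment used for comparison, E[(Y − Dθ − X'β)D] = 0, is an assumed version of the first non-orthogonal condition. The helpers theta_orth, theta_reg and sensitivity are hypothetical. The code checks that OLS nuisance fits plugged into the orthogonal moment reproduce the long-regression coefficient on D (the Frisch-Waugh-Lovell identity), and how each estimate of θ0 moves when the nuisance parameters are shifted by t in a fixed direction.

```python
import numpy as np

# Hypothetical simulated design (not from the paper):
#   Y = D*theta0 + X'beta0 + eps,   D = X'pi0 + nu,   X ~ N(0, I_p)
rng = np.random.default_rng(0)
n, p, theta0 = 200_000, 5, 1.5
pi0 = np.linspace(1.0, 0.2, p)
beta0 = np.linspace(0.5, -0.5, p)
X = rng.normal(size=(n, p))
D = X @ pi0 + rng.normal(size=n)
Y = D * theta0 + X @ beta0 + rng.normal(size=n)
delta0 = theta0 * pi0 + beta0  # population projection coefficient of Y on X


def theta_orth(delta, pi):
    # Root in theta of the sample orthogonal moment
    # mean[(Y - X'delta - theta*(D - X'pi)) * (D - X'pi)] = 0
    u, v = Y - X @ delta, D - X @ pi
    return (u @ v) / (v @ v)


def theta_reg(beta):
    # Root in theta of the non-orthogonal regression moment
    # mean[(Y - D*theta - X'beta) * D] = 0
    return ((Y - X @ beta) @ D) / (D @ D)


# Frisch-Waugh-Lovell: OLS nuisance fits reproduce the coefficient on D
# from the long regression of Y on (D, X).
delta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
pi_hat = np.linalg.lstsq(X, D, rcond=None)[0]
theta_fwl = theta_orth(delta_hat, pi_hat)
theta_long = np.linalg.lstsq(np.column_stack([D, X]), Y, rcond=None)[0][0]

a = np.ones(p)  # hypothetical perturbation direction for all nuisance parameters


def sensitivity(t):
    # Change in each estimate when the nuisance parameters move by t * a
    reg_dev = theta_reg(beta0 + t * a) - theta_reg(beta0)
    orth_dev = theta_orth(delta0 + t * a, pi0 + t * a) - theta_orth(delta0, pi0)
    return reg_dev, orth_dev


print(theta_fwl, theta_long)
for t in (0.1, 0.05, 0.025):
    print(t, sensitivity(t))
```

Halving t should roughly halve the shift produced by the regression moment but roughly quarter the shift produced by the orthogonal moment: first-order versus second-order sensitivity to nuisance errors. The simulation only illustrates the mechanism; it does not involve the selection-based, high-dimensional estimators discussed in the text.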
","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-3.png","element":"img","alt":"■","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Example 2 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Neyman Orthogonal Equations for a Class of Conditional Moment Problems","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"Next, consider the conditional moment restrictions framework studied by ","element":"span"},{"href":"#id-102","referenceIndex":29,"text":"Chamberlain ","element":"a"},{"href":"#id-102","referenceIndex":29,"text":"(1992)","element":"a"},{"text":":","element":"span"}],[{"style":{"width":"28%"},"width":540,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"are random vectors with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"being a sub-vector of ","element":"span"},{"style":{"height":18.33},"width":322.73,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-5.png","element":"img","alt":" W, θ ∈ Θ ⊂ Rd ","inline":true,"padRight":true},{"text":"is a finite-dimensional parameter whose true value ","element":"span"},{"style":{"height":15.02},"width":37.49,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-6.png","element":"img","alt":" θ0","inline":true,"padRight":true},{"text":"is of interest, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is a functional nuisance parameter mapping the support of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X 
","element":"span"},{"text":"into a convex set ","element":"span"},{"style":{"height":15.93},"width":147.89,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-7.png","element":"img","alt":" V ⊂ Rl ","inline":true,"padRight":true},{"text":"whose true value is ","element":"span"},{"style":{"height":16.4},"width":189.06,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-8.png","element":"img","alt":" g0, and ϕ","inline":true,"padRight":true},{"text":"is a known function with values in ","element":"span"},{"style":{"height":17.53},"width":317.53,"height":43.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-9.png","element":"img","alt":" Rk for k ⩾ d + l.","inline":true}],[{"text":"Here we would like to build a score function (","element":"span"},{"style":{"height":17.6},"width":341.78,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-10.png","element":"img","alt":"θ, h) ↦ ψ(W, θ, h","inline":true},{"text":") for estimating ","element":"span"},{"style":{"height":15.6},"width":231.95,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-11.png","element":"img","alt":" θ0, the true","inline":true,"padRight":true},{"text":"value of parameter ","element":"span"},{"style":{"height":15.6},"width":211.59,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-12.png","element":"img","alt":" θ, where h","inline":true,"padRight":true},{"text":"is a new nuisance parameter with true value ","element":"span"},{"style":{"height":15.02},"width":42.14,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-13.png","element":"img","alt":" h0","inline":true,"padRight":true},{"text":"that obeys the strong form of the orthogonality condition ","element":"span"},{"href":"#id-99","text":"(5.5) ","element":"a"},{"text":"and thus also its weak form 
","element":"span"},{"href":"#id-98","text":"(5.8)","element":"a"},{"text":". To this end, let ","element":"span"},{"style":{"height":17.6},"width":424.67,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-14.png","element":"img","alt":"t ↦ EP [ϕ(W, θ0, t) | X","inline":true},{"text":"] be a function mapping ","element":"span"},{"style":{"height":19.53},"width":972.72,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-15.png","element":"img","alt":" Rl into Rk and let γ(X, θ0, g0) = ∂t′EP [ϕ(W, θ0, t) |","inline":true},{"style":{"height":19.95},"width":360.6,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-16.png","element":"img","alt":"X]|t=g0(X) be a k×l","inline":true,"padRight":true},{"text":"matrix of its derivatives. We will set ","element":"span"},{"style":{"height":17.6},"width":823.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-17.png","element":"img","alt":" Z = X and h(X) = vec(g(X), β(X), Σ(X)),","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":16.4},"width":26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-18.png","element":"img","alt":" β","inline":true,"padRight":true},{"text":"is a function mapping the support of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"into the space of ","element":"span"},{"style":{"height":18.33},"width":590.4,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-19.png","element":"img","alt":" d × k matrices, Rd×k, and Σ is","inline":true,"padRight":true},{"text":"the function mapping the support of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"into the space of 
","element":"span"},{"style":{"height":18.33},"width":407.8,"height":45.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-20.png","element":"img","alt":" k × k matrices, Rk×k","inline":true},{"text":". Define the true value ","element":"span"},{"style":{"height":16.4},"width":188.2,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-21.png","element":"img","alt":" β0 of β as","inline":true}],[{"style":{"width":"28%"},"width":525,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-22.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":301.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-23.png","element":"img","alt":" A(X) is a d × k","inline":true,"padRight":true},{"text":"matrix of measurable transformations of ","element":"span"},{"style":{"height":15.6},"width":324.81,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-24.png","element":"img","alt":" X, I is the k × k","inline":true,"padRight":true},{"text":"identity matrix, and Π","element":"span"},{"style":{"height":17.6},"width":430.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-25.png","element":"img","alt":"0(X) ≠ Ik×k is a k × k","inline":true,"padRight":true},{"text":"non-identity matrix with the property:","element":"span"}],[{"style":{"width":"73%"},"width":1372,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-26.png","element":"img"}],[{"text":"where Σ","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-27.png","element":"img","alt":"0","inline":true,"padRight":true},{"text":"is the true value of parameter Σ. 
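A quick numerical check can make the property above concrete. Assuming it states that Π0(X) annihilates the derivative matrix, Π0(X)γ(X, θ0, g0) = 0 (the display is an image and is not reproduced in this text), one non-identity idempotent matrix with that property is the oblique projection I − γ(γ'Σ0^{-1}γ)^{-1}γ'Σ0^{-1}. Whether this coincides with the choice displayed next is an assumption. The sketch draws a random k × l matrix as a stand-in for γ at a single value of X and a random positive definite k × k matrix as a stand-in for Σ0(X); the dimensions and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
k, ell = 4, 2  # hypothetical dimensions with k >= l

# Hypothetical stand-ins evaluated at one value of X
gamma = rng.normal(size=(k, ell))   # plays the role of gamma(X, theta0, g0)
B = rng.normal(size=(k, k))
Sigma = B @ B.T + k * np.eye(k)     # positive definite, plays the role of Sigma0(X)

# Oblique projection Pi = I - gamma (gamma' Sigma^{-1} gamma)^{-1} gamma' Sigma^{-1}
Sigma_inv_gamma = np.linalg.solve(Sigma, gamma)                 # Sigma^{-1} gamma
G = gamma.T @ Sigma_inv_gamma                                   # gamma' Sigma^{-1} gamma, l x l
Pi = np.eye(k) - gamma @ np.linalg.solve(G, Sigma_inv_gamma.T)  # uses symmetry of Sigma

print(np.allclose(Pi @ gamma, 0.0))  # True: Pi annihilates gamma
print(np.allclose(Pi @ Pi, Pi))      # True: Pi is idempotent
print(np.allclose(Pi, np.eye(k)))    # False: Pi is not the identity
```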
For example, Π","element":"span"},{"style":{"height":17.6},"width":73.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-28.png","element":"img","alt":"0(X","inline":true},{"text":") can be chosen to be the idempotent matrix:","element":"span"}],[{"style":{"width":"80%"},"width":1508,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-29.png","element":"img"}],[{"text":"Then an orthogonal score for the problem above can be constructed as","element":"span"}],[{"style":{"width":"79%"},"width":1496,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-30.png","element":"img"}],[{"text":"It is straightforward to check that, under mild regularity conditions, the score function ","element":"span"},{"style":{"height":16.4},"width":186.24,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-31.png","element":"img","alt":" ψ satisfies","inline":true,"padRight":true},{"text":"E","element":"span"},{"style":{"height":17.6},"width":1158.65,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-32.png","element":"img","alt":"P [ψ(W, θ0, h0(X))] = 0 for h0(X) = vec(g0(X), β0(X), Σ0(X","inline":true},{"text":")) and also obeys the orthogonality condition:","element":"span"}],[{"style":{"width":"68%"},"width":1278,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/22-33.png","element":"img"}],[{"text":"Furthermore, by setting","element":"span"}],[{"style":{"width":"95%"},"width":1790,"height":85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-0.png","element":"img"}],[{"text":"and using Π","element":"span"},{"style":{"height":17.6},"width":73.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-1.png","element":"img","alt":"0(X","inline":true},{"text":") suggested above, we obtain the efficient score 
","element":"span"},{"style":{"height":16.4},"width":29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-2.png","element":"img","alt":" ψ","inline":true,"padRight":true},{"text":"that yields an estimator of ","element":"span"},{"style":{"height":15.02},"width":37.49,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-3.png","element":"img","alt":" θ0","inline":true,"padRight":true},{"text":"achieving the semi-parametric efficiency bound provided in ","element":"span"},{"href":"#id-102","referenceIndex":29,"text":"Chamberlain ","element":"a"},{"href":"#id-102","referenceIndex":29,"text":"(1992)","element":"a"},{"text":".","element":"span"}],[{"text":"Here we would like to note that an analogous, though more involved, construction can be provided for the more general class of problems considered in ","element":"span"},{"href":"#id-32","referenceIndex":3,"text":"Ai and Chen ","element":"a"},{"href":"#id-32","referenceIndex":3,"text":"(2003) ","element":"a"},{"text":"where the nuisance functions depend on the endogenous variables. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-4.png","element":"img","alt":"■","inline":true}],[{"text":"5.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Regularity Conditions and Results. ","element":"span"},{"text":"In what follows, we shall denote by ","element":"span"},{"style":{"height":15.6},"width":283.6,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-5.png","element":"img","alt":" δ, c0, c, and C","inline":true,"padRight":true},{"text":"some positive constants. 
For a positive integer ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":", [","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"] denotes the set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , d","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"We shall impose the following regularity conditions.","element":"span"}],[{"id":"id-104","style":{"fontWeight":"bold"},"text":"Assumption 5.1 ","element":"span"},{"text":"(Moment condition problem)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider a random element ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W","element":"span"},{"style":{"fontStyle":"italic"},"text":", taking values in a measure space ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":156.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-6.png","element":"img","alt":"W, AW)","inline":true},{"style":{"fontStyle":"italic"},"text":", with law determined by a probability measure ","element":"span"},{"style":{"height":14.62},"width":149.49,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-7.png","element":"img","alt":" P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":". The observed data ","element":"span"},{"text":"((","element":"span"},{"style":{"height":18.09},"width":485.02,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-8.png","element":"img","alt":"Wui)u∈U)ni=1 consist of n","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"i.i.d. 
copies of a random element ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":146.97,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-9.png","element":"img","alt":"Wu)u∈U","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"which is generated ","element":"span"},{"style":{"fontStyle":"italic"},"text":"as a suitably measurable transformation with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":". Uniformly for all ","element":"span"},{"style":{"height":15.02},"width":227.86,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-10.png","element":"img","alt":" n ⩾ n0 and","inline":true},{"style":{"height":14.62},"width":139.3,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-11.png","element":"img","alt":"P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":", the following conditions hold: (i) The true parameter value ","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-12.png","element":"img","alt":" θu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obeys ","element":"span"},{"href":"#id-95","style":{"fontStyle":"italic"},"text":"(5.1) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and is interior relative to ","element":"span"},{"text":"Θ","element":"span"},{"style":{"height":17.75},"width":254.77,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-13.png","element":"img","alt":"u ⊂ Θ ⊂ Rdθ","inline":true},{"style":{"fontStyle":"italic"},"text":", namely there is a ball of radius 
","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-14.png","element":"img","alt":" δ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"centered at ","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-15.png","element":"img","alt":" θu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"contained in ","element":"span"},{"text":"Θ","element":"span"},{"style":{"height":16.4},"width":97.66,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-16.png","element":"img","alt":"u for","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"all ","element":"span"},{"style":{"height":15.6},"width":269.82,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-17.png","element":"img","alt":" u ∈ U, and Θ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is compact. (ii) For ","element":"span"},{"style":{"height":22.7},"width":1115.8,"height":56.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-18.png","element":"img","alt":" ν := (νk)_{k=1}^{dθ+dt} = (θ, t), each j ∈ [dθ] and u ∈ U, the map","inline":true,"padRight":true},{"text":"Θ","element":"span"},{"style":{"height":18.22},"width":749.5,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-19.png","element":"img","alt":"u × Tu(Zu) ∋ ν ⟼ EP [ψuj(Wu, ν)|Zu]","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is twice continuously differentiable a.s. with derivatives obeying the integrability conditions specified in Assumption ","element":"span"},{"href":"#id-103","style":{"fontStyle":"italic"},"text":"5.2. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"(iii) For all ","element":"span"},{"style":{"height":15.2},"width":130.76,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-20.png","element":"img","alt":" u ∈ U,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the moment function ","element":"span"},{"style":{"height":16.4},"width":48.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-21.png","element":"img","alt":" ψu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obeys the orthogonality condition given in Definition 5.1 for the set ","element":"span"},{"style":{"height":16.4},"width":365.86,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-22.png","element":"img","alt":" Hu = Hun specified","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in Assumption ","element":"span"},{"href":"#id-94","style":{"fontStyle":"italic"},"text":"5.3. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"(iv) The following identifiability condition holds: ","element":"span"},{"style":{"height":17.6},"width":515.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-23.png","element":"img","alt":" ∥EP [ψu(Wu, θ, hu(Zu))]∥ ⩾","inline":true,"padRight":true},{"text":"2","element":"span"},{"style":{"height":19.13},"width":714.86,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-24.png","element":"img","alt":"−1(∥Ju(θ − θu)∥ ∧ c0) for all θ ∈ Θu,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"where the singular values of ","element":"span"},{"style":{"height":17.6},"width":572.97,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-25.png","element":"img","alt":" Ju := ∂θE[ψu(Wu, θu, hu(Zu))]","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"lie between ","element":"span"},{"style":{"height":16.4},"width":424.26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-26.png","element":"img","alt":" c and C for all u ∈ U.","inline":true}],[{"text":"The conditions of Assumption ","element":"span"},{"href":"#id-104","text":"5.1 ","element":"a"},{"text":"are mild and standard in moment condition problems. Assumption ","element":"span"},{"href":"#id-104","text":"5.1(","element":"a"},{"text":"iv) encodes sufficient global and local identifiability to obtain a rate result. 
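A toy scalar example, which is hypothetical and not taken from the paper, shows how condition (iv) combines a local and a global requirement. Suppose the population moment is θ ↦ tanh(θ − θu), so that Ju = 1. Near θu the moment grows linearly, and far from θu it stays bounded away from zero, but it never exceeds one in absolute value, so the lower bound 2^{-1}(|Ju(θ − θu)| ∧ c0) can only hold if the cap c0 is small enough. The grid check below, with the hypothetical helper identifiable, is a sketch of this reasoning rather than a proof.

```python
import numpy as np


def identifiable(moment, J, theta_u, grid, c0):
    # Check |m(theta)| >= (1/2) * min(|J (theta - theta_u)|, c0) at every grid point
    lhs = np.abs(moment(grid))
    rhs = 0.5 * np.minimum(np.abs(J * (grid - theta_u)), c0)
    return bool(np.all(lhs >= rhs))


theta_u = 0.3


def toy_moment(theta):
    # Hypothetical population moment E[psi(W, theta)] with derivative J = 1 at theta_u
    return np.tanh(theta - theta_u)


grid = np.linspace(theta_u - 5.0, theta_u + 5.0, 10_001)
print(identifiable(toy_moment, 1.0, theta_u, grid, c0=1.5))  # True
print(identifiable(toy_moment, 1.0, theta_u, grid, c0=4.0))  # False
```

With c0 = 1.5 the bound holds on the whole grid; with c0 = 4 it fails away from θu, because a bounded moment cannot dominate a lower bound that keeps growing.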
The suitably measurable condition, defined in Appendix ","element":"span"},{"href":"#id-55","text":"A, ","element":"a"},{"text":"is a mild condition satisfied in most practical cases.","element":"span"}],[{"id":"id-103","style":{"fontWeight":"bold"},"text":"Assumption 5.2 ","element":"span"},{"text":"(Entropy and smoothness)","element":"span"},{"style":{"height":17.6},"width":335.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-27.png","element":"img","alt":". The set (U, dU)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a semi-metric space such that ","element":"span"},{"text":"log ","element":"span"},{"style":{"height":17.6},"width":1260,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-28.png","element":"img","alt":" N(ϵ, U, dU) ⩽ C log(e/ϵ) ∨ 0. Let α ∈ [1, 2], and let α1 and α2","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be some positive constants. Uniformly for all ","element":"span"},{"style":{"height":15.02},"width":412.93,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-29.png","element":"img","alt":" n ⩾ n0 and P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":", the following conditions hold: (i) The set of functions ","element":"span"},{"style":{"height":18.22},"width":837.6,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-30.png","element":"img","alt":"F0 = {ψuj(Wu, θu, hu(Zu)) : j ∈ [dθ], u ∈ U}","inline":true},{"style":{"fontStyle":"italic"},"text":", viewed as functions of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W","element":"span"},{"style":{"fontStyle":"italic"},"text":", is suitably measurable; has an envelope function ","element":"span"},{"style":{"height":20.59},"width":890,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-31.png","element":"img","alt":" F0(W) = 
supj∈[dθ],u∈U,ν∈Θu×Tu(Zu) |ψuj(Wu, ν)|","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"that is measurable with respect to ","element":"span"},{"style":{"height":18.3},"width":749.97,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-32.png","element":"img","alt":" W and obeys ∥F0∥P,q ⩽ C, where q ⩾ 4","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a fixed constant; and has a uniform covering entropy obeying ","element":"span"},{"text":"sup","element":"span"},{"style":{"height":19.79},"width":886.94,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-33.png","element":"img","alt":"Q log N(ϵ∥F0∥Q,2, F0, ∥ · ∥Q,2) ⩽ C log(e/ϵ) ∨ 0.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(ii) For all ","element":"span"},{"style":{"height":17.6},"width":526.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-34.png","element":"img","alt":" j ∈ [dθ] and k, r ∈ [dθ + dt],","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":18.22},"width":605.35,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-35.png","element":"img","alt":" ψuj(W) := ψuj(Wu, θu, hu(Zu)),","inline":true}],[{"style":{"width":"87%"},"width":1646,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/23-36.png","element":"img"}],[{"style":{"width":"62%"},"width":1173,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-0.png","element":"img"}],[{"text":"Assumption ","element":"span"},{"href":"#id-103","text":"5.2 ","element":"a"},{"text":"imposes smoothness and integrability conditions on various quantities derived from ","element":"span"},{"style":{"height":16.4},"width":48.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-1.png","element":"img","alt":" 
ψu","inline":true},{"text":". It also imposes conditions on the complexity of the relevant function classes.","element":"span"}],[{"text":"In what follows, let ∆","element":"span"},{"style":{"height":16.8},"width":506.5,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-2.png","element":"img","alt":"n ↘ 0, δn ↘ 0, and τn ↘","inline":true,"padRight":true},{"text":"0 be sequences of constants approaching zero from above at a speed at most polynomial in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"(for example, ","element":"span"},{"style":{"height":17.6},"width":505.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-3.png","element":"img","alt":" δn ⩾ 1/n^c for some c > 0).","inline":true}],[{"id":"id-94","style":{"fontWeight":"bold"},"text":"Assumption 5.3 ","element":"span"},{"text":"(Estimation of nuisance functions)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The following conditions hold for each ","element":"span"},{"style":{"height":13.82},"width":127.57,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-4.png","element":"img","alt":" n ⩾ n0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and all ","element":"span"},{"style":{"height":14.62},"width":143.82,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-5.png","element":"img","alt":" P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":". 
The estimated functions ","element":"span"},{"style":{"height":21.4},"width":432.49,"height":53.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-6.png","element":"img","alt":"ĥu = (ĥum)_{m=1}^{dt} ∈ Hun","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with probability at least ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":16},"width":116.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-7.png","element":"img","alt":" − ∆n,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":14.62},"width":77.46,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-8.png","element":"img","alt":" Hun","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the set of measurable maps ","element":"span"},{"style":{"height":21.4},"width":876.63,"height":53.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-9.png","element":"img","alt":" Zu ∋ z ⟼ h = (hm)_{m=1}^{dt}(z) ∈ Tu(z) such that","inline":true}],[{"style":{"width":"34%"},"width":655,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-10.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"and whose complexity does not grow too quickly in the sense that ","element":"span"},{"style":{"height":18.22},"width":587.15,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-11.png","element":"img","alt":" F1 = {ψuj(Wu, θ, h(Zu)) : j ∈","inline":true,"padRight":true},{"text":"[","element":"span"},{"style":{"height":17.6},"width":531.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-12.png","element":"img","alt":"dθ], u ∈ U, θ ∈ Θu, h ∈ Hun}","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is suitably measurable and its uniform covering entropy 
obeys","element":"span"}],[{"style":{"width":"52%"},"width":982,"height":77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":17.6},"width":128.23,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-14.png","element":"img","alt":" F1(W)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an envelope for ","element":"span"},{"style":{"height":15.02},"width":48.36,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-15.png","element":"img","alt":" F1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"which is measurable with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and satisfies ","element":"span"},{"style":{"height":17.6},"width":176.25,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-16.png","element":"img","alt":" F1(W) ⩽","inline":true},{"style":{"height":17.6},"width":252.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-17.png","element":"img","alt":"F0(W) for F0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"defined in Assumption ","element":"span"},{"href":"#id-103","style":{"fontStyle":"italic"},"text":"5.2. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"The complexity characteristics ","element":"span"},{"style":{"height":17.6},"width":501.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-18.png","element":"img","alt":" an ⩾ max(n, e) and sn ⩾ 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obey the growth conditions:","element":"span"}],[{"style":{"height":31.78},"width":1855.02,"height":79.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-19.png","element":"img","alt":"n^(−1/2) (√(sn log(an)) + n^(−1/2) sn n^(1/q) log(an)) ⩽ τn and τn^(α/2) √(sn log(an)) + sn n^(1/q − 1/2) log(an) log n ⩽ δn,","inline":true}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":16},"width":149.41,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-20.png","element":"img","alt":" q and α","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are defined in Assumption ","element":"span"},{"href":"#id-103","style":{"fontStyle":"italic"},"text":"5.2.","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Comment 5.4 ","element":"span"},{"text":"(On Rate and Entropy Rate Conditions)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"Assumption ","element":"span"},{"href":"#id-94","text":"5.3 ","element":"a"},{"text":"imposes conditions on the estimation rate of the nuisance functions ","element":"span"},{"style":{"height":15.02},"width":74.75,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-21.png","element":"img","alt":" hum","inline":true,"padRight":true},{"text":"and on the complexity of the function sets that contain the estimators ","element":"span"},{"style":{"height":15.02},"width":75.51,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-22.png","element":"img","alt":"ĥum","inline":true},{"text":". 
This condition allows for a wide variety of modern modeling assumptions and regularization methods for function fitting, including both traditional methods and more recent statistical and machine learning methods. Within the approximately sparse framework, the index ","element":"span"},{"style":{"height":10.62},"width":41.46,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-23.png","element":"img","alt":" sn","inline":true,"padRight":true},{"text":"corresponds to the maximum of the dimension of the approximating models and of the size of the selected models; and ","element":"span"},{"style":{"height":14},"width":205.01,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-24.png","element":"img","alt":" an = p ∨ n","inline":true},{"text":". Under other frameworks, these parameters could be different; yet if they are well-behaved, then our results still apply. Thus, these results cover other frameworks, where structured assumptions other than approximate sparsity are used to make the estimation and modeling problem manageable. ","element":"span"},{"text":"It is important to point out that the class ","element":"span"},{"style":{"height":15.02},"width":48.36,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-25.png","element":"img","alt":"F1","inline":true,"padRight":true},{"text":"generally will not be Donsker because its entropy is allowed to increase with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Allowing for non-Donsker classes is crucial for accommodating modern, high-dimensional estimation methods for the nuisance functions. 
This feature makes the conditions imposed here very different from the conditions imposed in various classical references on handling nonparametrically estimated nuisance functions; see, for example, ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":", ","element":"span"},{"href":"#id-100","referenceIndex":98,"text":"van der Vaart ","element":"a"},{"href":"#id-100","referenceIndex":98,"text":"(1998)","element":"a"},{"text":", ","element":"span"},{"href":"#id-101","referenceIndex":72,"text":"Kosorok ","element":"a"},{"href":"#id-101","referenceIndex":72,"text":"(2008)","element":"a"},{"text":", and other references listed in the introduction.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Comment 5.5 ","element":"span"},{"text":"(Removing Entropy Rate Conditions by Sample-Splitting)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"We can set ","element":"span"},{"style":{"height":14.22},"width":123.2,"height":35.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-26.png","element":"img","alt":" sn = 1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":10.62},"width":123.8,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/24-27.png","element":"img","alt":" an = e","inline":true,"padRight":true},{"text":"in Assumption 5.3 if we employ data-splitting. That is, under data-splitting, the entropy condition becomes very weak, akin to that in parametric problems, which facilitates the application of modern statistical and machine learning methods (e.g., random forests, boosted trees, deep neural nets, and their aggregated and hybrid versions) to estimate the nuisance functions. 
Thus, with data-splitting, Assumption 5.3 only requires that the estimators of the nuisance parameters attain sufficiently rapid rates of convergence ","element":"span"},{"style":{"height":10.22},"width":40.08,"height":25.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-0.png","element":"img","alt":" τn","inline":true},{"text":", in particular ","element":"span"},{"style":{"height":20.33},"width":241.34,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-1.png","element":"img","alt":" τn = o(n−1/4","inline":true},{"text":") in smooth problems. Of course, in practice we cannot verify that these rates hold in a given problem, but the regularity conditions become ","element":"span"},{"style":{"fontStyle":"italic"},"text":"more plausible ","element":"span"},{"text":"with data-splitting than without it. ","element":"span"},{"href":"#id-14","referenceIndex":22,"text":"Bickel ","element":"a"},{"href":"#id-14","referenceIndex":22,"text":"(1982) ","element":"a"},{"text":"employs the idea of data-splitting, namely setting aside a vanishing fraction of the sample to estimate the nuisance parameter, to set up adaptive estimators of the main parameter; see also ","element":"span"},{"href":"#id-100","referenceIndex":98,"text":"van der Vaart ","element":"a"},{"href":"#id-100","referenceIndex":98,"text":"(1998)","element":"a"},{"text":". Because only a vanishing fraction of the sample is set aside, there is no asymptotic efficiency loss from data-splitting. 
","element":"span"},{"text":"Another method, which seems more practical, is the following cross-fitting approach: (1) split the sample into two equal parts, the auxiliary and main parts; (2) use the auxiliary part to estimate the nuisance parameter and the main part to estimate the target parameter, obtaining one estimator of the target parameter; (3) by reversing the roles of the main and auxiliary parts, obtain another estimator of the target parameter; and (4) average the two estimators of the target parameter to obtain the final estimator. Theorem 5.1 below gives the properties of the final estimator. We refer to ","element":"span"},{"href":"#id-15","referenceIndex":33,"text":"Chernozhukov ","element":"a"},{"href":"#id-15","referenceIndex":33,"text":"et al. ","element":"a"},{"href":"#id-15","referenceIndex":33,"text":"(2016) ","element":"a"},{"text":"for further details, including the result that there is no asymptotic efficiency loss from data-splitting under cross-fitting.","element":"span"}],[{"text":"The following theorem is one of the main results of the paper:","element":"span"}],[{"id":"id-141","style":{"fontWeight":"bold"},"text":"Theorem 5.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform Functional Central Limit Theorem for a Continuum of Target Parameters in Moment Condition Problems","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumptions ","element":"span"},{"href":"#id-104","style":{"fontStyle":"italic"},"text":"5.1, ","element":"a"},{"href":"#id-103","style":{"fontStyle":"italic"},"text":"5.2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-94","style":{"fontStyle":"italic"},"text":"5.3, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"for an ","element":"span"},{"href":"#id-105","style":{"height":17.6},"width":839.45,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-2.png","element":"img","alt":"estimator (θ̂u)u∈U that obeys equation (5.4),","inline":true}],[{"style":{"width":"38%"},"width":721,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"in ","element":"span"},{"style":{"height":19.53},"width":167.93,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-4.png","element":"img","alt":" ℓ∞(U)dθ,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":19.41},"width":1062.26,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-5.png","element":"img","alt":" P ∈ Pn, where ψ̄u(W) := −J−1u ψu(Wu, θu, hu(Zu)), and","inline":true}],[{"style":{"width":"76%"},"width":1439,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where the paths of ","element":"span"},{"style":{"height":19.01},"width":230.06,"height":47.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-7.png","element":"img","alt":" u ↦ GP ψ̄u","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are a.s. 
uniformly continuous on ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":204.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-8.png","element":"img","alt":"U, dU) and","inline":true}],[{"style":{"width":"75%"},"width":1416,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-9.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Comment 5.6. ","element":"span"},{"text":"We emphasize that this result on a continuum of parameters solving a continuum of moment conditions is completely new. Prior approaches to continua of moment conditions with infinite-dimensional nuisance parameters, for example, those in ","element":"span"},{"href":"#id-17","referenceIndex":39,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-17","referenceIndex":39,"text":"(2006) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-20","referenceIndex":46,"text":"Escanciano and Zhu ","element":"a"},{"href":"#id-20","referenceIndex":46,"text":"(2013)","element":"a"},{"text":", impose Donsker conditions, following ","element":"span"},{"href":"#id-106","referenceIndex":5,"text":"Andrews ","element":"a"},{"href":"#id-106","referenceIndex":5,"text":"(1994a)","element":"a"},{"text":", on the class of functions that contains the values of the estimators of these nuisance functions. ","element":"span"},{"text":"This approach is precluded in our setting because the resulting class of functions has entropy that grows with the sample size and is therefore not Donsker. Hence, we develop a new approach to establishing the results, which exploits the interplay among the rate of growth of entropy, the biases, and the size of the estimation error. 
In addition, the new approach yields results that are uniform in ","element":"span"},{"style":{"height":12},"width":412.48,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/25-10.png","element":"img","alt":" P. ■","inline":true}],[{"text":"We can estimate the law of ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-0.png","element":"img","alt":" ZP","inline":true,"padRight":true},{"text":"with the bootstrap law of","element":"span"}],[{"id":"id-107","style":{"width":"99%"},"width":1868,"height":301,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-1.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":14.62},"width":44.2,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-2.png","element":"img","alt":"Ĵu","inline":true,"padRight":true},{"text":"is a suitable estimator of ","element":"span"},{"style":{"height":17.78},"width":91.85,"height":44.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-3.png","element":"img","alt":" Ju.¹⁸ ","inline":true,"padRight":true},{"text":"The bootstrap law is computed by drawing (","element":"span"},{"style":{"height":18.09},"width":104.45,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-4.png","element":"img","alt":"ξi)ni=1 ","inline":true,"padRight":true},{"text":"conditional ","element":"span"},{"text":"on the data.","element":"span"}],[{"text":"The following theorem shows that the multiplier bootstrap provides a valid approximation to the large-sample law of ","element":"span"},{"style":{"height":17.77},"width":314.52,"height":44.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-5.png","element":"img","alt":"√n(θ̂u − θu)u∈U.","inline":true}],[{"id":"id-145","style":{"fontWeight":"bold"},"text":"Theorem 5.2 
","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform Validity of Multiplier Bootstrap","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that Assumptions ","element":"span"},{"href":"#id-104","style":{"fontStyle":"italic"},"text":"5.1, ","element":"a"},{"href":"#id-103","style":{"fontStyle":"italic"},"text":"5.2, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-94","style":{"fontStyle":"italic"},"text":"5.3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold, that the estimator ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":126.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-6.png","element":"img","alt":"θ̂u)u∈U","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obeys equation ","element":"span"},{"href":"#id-105","style":{"fontStyle":"italic"},"text":"(5.4)","element":"a"},{"style":{"fontStyle":"italic"},"text":", and that the estimator ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":318.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-7.png","element":"img","alt":"Ĵu)u∈U obeys the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"following condition: uniformly in ","element":"span"},{"style":{"height":14.62},"width":138.76,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-8.png","element":"img","alt":" P ∈ Pn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with probability ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":18.19},"width":700.59,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-9.png","element":"img","alt":" − δn, supu∈U ∥ Ĵu − Ju∥ ⩽ ∆n. 
Then,","inline":true}],[{"style":{"width":"48%"},"width":903,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-10.png","element":"img"}],[{"text":"We next derive the large-sample distribution and validity of the multiplier bootstrap for the estimator ","element":"span"},{"style":{"height":19.14},"width":1264.03,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-11.png","element":"img","alt":"∆̂ := φ(θ̂) := φ((θ̂u)u∈U) of the functional ∆ := φ(θ0) = φ((θu)u∈U","inline":true},{"text":") using the functional delta method. ","element":"span"},{"text":"The functional ","element":"span"},{"style":{"height":19.13},"width":236.68,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-12.png","element":"img","alt":" θ0 ↦ φ(θ0","inline":true},{"text":") is assumed to be a uniformly Hadamard differentiable transform of ","element":"span"},{"style":{"height":19.13},"width":255.17,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-13.png","element":"img","alt":" θ0 = (θu)u∈U","inline":true},{"text":". The following result gives the large-sample law of ","element":"span"},{"style":{"height":17.77},"width":320.63,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-14.png","element":"img","alt":"√n(∆̂ − ∆), the","inline":true,"padRight":true},{"text":"properly normalized estimator. It also shows that the bootstrap law of ","element":"span"},{"style":{"height":17.77},"width":462.25,"height":44.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-15.png","element":"img","alt":"√n(∆̂∗ − ∆̂), computed","inline":true,"padRight":true},{"text":"conditionally on the data, is consistent for the large-sample law of ","element":"span"},{"style":{"height":17.77},"width":618.62,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-16.png","element":"img","alt":"√n(∆̂ − ∆). 
Here ∆̂∗ := φ(θ̂∗) =","inline":true},{"style":{"height":17.6},"width":184.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-17.png","element":"img","alt":"φ((θ̂∗u)u∈U","inline":true},{"text":") is the bootstrap version of ","element":"span"},{"style":{"height":19.9},"width":735.43,"height":49.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-18.png","element":"img","alt":"∆̂, and θ̂∗u = θ̂u + n−1 ∑ni=1 ξi ψ̂u(Wi","inline":true},{"text":") is the multiplier ","element":"span"},{"text":"bootstrap version of ","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-19.png","element":"img","alt":"θ̂u","inline":true,"padRight":true},{"text":"defined via equation ","element":"span"},{"href":"#id-107","text":"(5.16)","element":"a"},{"text":".","element":"span"}],[{"id":"id-93","style":{"fontWeight":"bold"},"text":"Theorem 5.3 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform Limit Theory and Validity of Multiplier Bootstrap for Smooth ","element":"span"},{"style":{"height":17.6},"width":374.65,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-20.png","element":"img","alt":"Functionals of θ).","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Suppose that for each ","element":"span"},{"style":{"height":19.91},"width":531.82,"height":49.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-21.png","element":"img","alt":" P ∈ P := ∪n⩾n0Pn, θ0 = θ0P ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an element of a compact ","element":"span"},{"style":{"fontStyle":"italic"},"text":"set ","element":"span"},{"style":{"height":17.6},"width":452.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-22.png","element":"img","alt":" Dθ. 
Suppose θ ↦ φ(θ)","inline":true},{"style":{"fontStyle":"italic"},"text":", a functional of interest mapping ","element":"span"},{"style":{"height":20.38},"width":692.55,"height":50.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-23.png","element":"img","alt":" Dφ ⊂ D = ℓ∞(U)dθ to ℓ∞(Q), where","inline":true},{"style":{"height":17.24},"width":160.02,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-24.png","element":"img","alt":"Dθ ⊂ Dφ","inline":true},{"style":{"fontStyle":"italic"},"text":", is Hadamard differentiable in ","element":"span"},{"style":{"height":12.8},"width":21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-25.png","element":"img","alt":" θ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"tangentially to ","element":"span"},{"style":{"height":19.53},"width":274.58,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-26.png","element":"img","alt":" D0 = UC(U)dθ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":15.6},"width":235.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-27.png","element":"img","alt":" θ ∈ Dθ, with","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the linear derivative map ","element":"span"},{"style":{"height":17.71},"width":271.84,"height":44.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-28.png","element":"img","alt":" φ′θ : D0 → D","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that the mapping ","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.51},"width":622.76,"height":46.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-29.png","element":"img","alt":"θ, h) ↦ φ′θ(h) from Dθ × D0 
to","inline":true},{"style":{"height":17.6},"width":123.65,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-30.png","element":"img","alt":"ℓ∞(Q)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is continuous. Then,","element":"span"}],[{"style":{"width":"82%"},"width":1546,"height":71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-31.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":14.7},"width":51.5,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-32.png","element":"img","alt":" TP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a zero-mean tight Gaussian process, for each ","element":"span"},{"style":{"height":12.8},"width":120.43,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-33.png","element":"img","alt":" P ∈ P","inline":true},{"style":{"fontStyle":"italic"},"text":". Moreover,","element":"span"}],[{"style":{"width":"77%"},"width":1454,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/26-34.png","element":"img"}],[{"text":"To derive Theorem ","element":"span"},{"href":"#id-93","text":"5.3, ","element":"a"},{"text":"we strengthen the usual notion of Hadamard differentiability to a uniform notion introduced in Definition ","element":"span"},{"href":"#id-91","text":"B.1. ","element":"a"},{"text":"Theorems ","element":"span"},{"href":"#id-108","text":"B.3 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-109","text":"B.4 ","element":"a"},{"text":"show that this uniform Hadamard differentiability is sufficient to guarantee the validity of the functional delta method uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". These new uniform functional delta method theorems may be of independent interest.","element":"span"}]]},{"heading":"6. 
Theory: Lasso and Post-Lasso for Functional Response Data","paragraphs":[[{"id":"id-53","text":"In this section, we provide results for Lasso and Post-Lasso estimators with function-valued ","element":"span"},{"text":"outcomes and linear or logistic links. As these results are of interest beyond the estimation of nuisance functions in moment condition problems or of treatment effects, we present this section so that it can be read independently of the rest of the paper.","element":"span"}],[{"text":"6.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"The generic setting with function-valued outcomes. ","element":"span"},{"text":"Consider a data-generating process with a functional response variable (","element":"span"},{"style":{"height":17.6},"width":131.09,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-0.png","element":"img","alt":"Yu)u∈U","inline":true,"padRight":true},{"text":"and observable covariates ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"satisfying, for each ","element":"span"},{"style":{"height":15.2},"width":121.96,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-1.png","element":"img","alt":" u ∈ U,","inline":true}],[{"id":"id-110","style":{"width":"67%"},"width":1262,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16.4},"width":239.88,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-3.png","element":"img","alt":" f : X → Rp ","inline":true,"padRight":true},{"text":"is a set of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"measurable transformations of the initial controls 
","element":"span"},{"style":{"height":16},"width":254.45,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-4.png","element":"img","alt":" X, θu is a p-","inline":true,"padRight":true},{"text":"dimensional vector, ","element":"span"},{"style":{"height":10.62},"width":39.69,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-5.png","element":"img","alt":" ru","inline":true,"padRight":true},{"text":"is an approximation error, and Λ is a fixed known link function. The notation in this section differs from that in the rest of the paper: ","element":"span"},{"style":{"height":15.02},"width":184.62,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-6.png","element":"img","alt":" Yu and X","inline":true,"padRight":true},{"text":"denote a generic response and a generic vector of covariates, which facilitates the application of these results to other contexts. We consider in detail only the linear link function, Λ(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") = ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", and the logistic link function, Λ(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") = exp(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"1+exp(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":".","element":"span"}],[{"text":"The logistic link is useful when the functional response is binary, although the linear link can also be used in that case under some conditions. 
For example, it is useful for estimating a high-dimensional generalization of the distributional regression models considered in ","element":"span"},{"href":"#id-83","referenceIndex":36,"text":"Chernozhukov ","element":"a"},{"href":"#id-83","referenceIndex":36,"text":"et al. ","element":"a"},{"href":"#id-83","referenceIndex":36,"text":"(2013)","element":"a"},{"text":", where the response variable is the continuum (","element":"span"},{"style":{"height":17.6},"width":393.03,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-7.png","element":"img","alt":"Yu = 1(Y ⩽ u))u∈U","inline":true},{"text":". Although we focus on these two cases, the principles discussed here apply to many other ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"text":"-estimators with convex (or approximately convex) criterion functions. ","element":"span"},{"text":"In the remainder of the section, we discuss and establish results for ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-8.png","element":"img","alt":" ℓ1","inline":true},{"text":"-penalized and post-model-selection estimators of (","element":"span"},{"style":{"height":17.6},"width":126.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-9.png","element":"img","alt":"θu)u∈U","inline":true,"padRight":true},{"text":"that hold uniformly over ","element":"span"},{"style":{"height":12.8},"width":121.96,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-10.png","element":"img","alt":" u ∈ U.","inline":true}],[{"text":"Throughout the section, we assume that ","element":"span"},{"style":{"height":19.53},"width":292.76,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-11.png","element":"img","alt":" u ∈ U ⊂ [0, 1]du ","inline":true,"padRight":true},{"text":"and that we 
have ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"i.i.d. observations from d.g.p.’s where ","element":"span"},{"href":"#id-110","text":"(6.1) ","element":"a"},{"text":"holds, ","element":"span"},{"style":{"height":18.09},"width":345.46,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-12.png","element":"img","alt":" {((Yui)u∈U, Xi)}ni=1","inline":true},{"text":", available for estimating (","element":"span"},{"style":{"height":17.6},"width":447.53,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-13.png","element":"img","alt":"θu)u∈U. For each u ∈ U,","inline":true,"padRight":true},{"text":"penalty level ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-14.png","element":"img","alt":" λ","inline":true},{"text":", and diagonal matrix of penalty loadings ","element":"span"},{"style":{"height":15.2},"width":67.54,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-15.png","element":"img","alt":"Ψ̂u,","inline":true,"padRight":true},{"text":"we define the Lasso estimator as","element":"span"}],[{"id":"id-116","style":{"width":"72%"},"width":1349,"height":89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-16.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":21.29},"width":431.84,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-17.png","element":"img","alt":" M(y, t) = ½(y − Λ(t))²","inline":true,"padRight":true},{"text":"in the case of linear regression, and ","element":"span"},{"style":{"height":17.6},"width":607.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-18.png","element":"img","alt":" M(y, t) = −{1(y = 1) log Λ(t) 
+","inline":true,"padRight":true},{"text":"1(","element":"span"},{"style":{"height":17.6},"width":398.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-19.png","element":"img","alt":"y = 0) log(1 − Λ(t))}","inline":true,"padRight":true},{"text":"in the case of the logistic link function for binary response data. For each ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-20.png","element":"img","alt":"u ∈ U","inline":true},{"text":", the Post-Lasso estimator based on a set of covariates ","element":"span"},{"style":{"height":14.62},"width":45.5,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-21.png","element":"img","alt":"�Tu","inline":true,"padRight":true},{"text":"is then defined as","element":"span"}],[{"id":"id-117","style":{"width":"75%"},"width":1407,"height":75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-22.png","element":"img"}],[{"text":"where the set ","element":"span"},{"style":{"height":14.62},"width":45.5,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-23.png","element":"img","alt":"�Tu","inline":true,"padRight":true},{"text":"contains supp(","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-24.png","element":"img","alt":"�θu","inline":true},{"text":") and may also contain additional variables deemed important.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-25.png","element":"img","alt":"19","inline":true,"padRight":true},{"text":"We will set ","element":"span"},{"style":{"height":17.6},"width":252.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/27-26.png","element":"img","alt":"�Tu = 
supp(�θu","inline":true},{"text":") unless otherwise noted.","element":"span"}],[{"text":"The chief difference between the analysis when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"is a singleton and the functional response case is that the penalty level needs to be set to control selection errors uniformly over ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-0.png","element":"img","alt":" u ∈ U","inline":true},{"text":". To do so, we will set ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-1.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"so that with high probability","element":"span"}],[{"id":"id-112","style":{"width":"71%"},"width":1335,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c > ","element":"span"},{"text":"1 is a fixed constant. When ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"is a singleton, the strategy above is similar to ","element":"span"},{"href":"#id-43","referenceIndex":24,"text":"Bickel et al. ","element":"a"},{"href":"#id-43","referenceIndex":24,"text":"(2009)","element":"a"},{"text":", ","element":"span"},{"href":"#id-48","referenceIndex":11,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-48","referenceIndex":11,"text":"(2013)","element":"a"},{"text":", and ","element":"span"},{"href":"#id-111","referenceIndex":17,"text":"Belloni et al. ","element":"a"},{"href":"#id-111","referenceIndex":17,"text":"(2011)","element":"a"},{"text":", who use an analog of ","element":"span"},{"href":"#id-112","text":"(6.4) ","element":"a"},{"text":"to derive the properties of Lasso and Post-Lasso. 
When ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"is not a singleton, this strategy was first employed in the context of ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-3.png","element":"img","alt":" ℓ1","inline":true},{"text":"-penalized quantile regression processes by ","element":"span"},{"href":"#id-35","referenceIndex":10,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-35","referenceIndex":10,"text":"(2011)","element":"a"},{"text":".","element":"span"}],[{"text":"To implement ","element":"span"},{"href":"#id-112","text":"(6.4)","element":"a"},{"text":", we propose setting the penalty level as","element":"span"}],[{"id":"id-113","style":{"width":"64%"},"width":1212,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.02},"width":42.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-5.png","element":"img","alt":" du","inline":true,"padRight":true},{"text":"is the dimension of ","element":"span"},{"style":{"height":16},"width":416.94,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-6.png","element":"img","alt":" U, 1 − γ with γ = o","inline":true},{"text":"(1) is a confidence level associated with the probability of event ","element":"span"},{"href":"#id-112","text":"(6.4)","element":"a"},{"text":", and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c > ","element":"span"},{"text":"1 is a slack constant.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-7.png","element":"img","alt":"20 ","inline":true,"padRight":true},{"text":"When implementing the estimators, we set 
","element":"span"},{"style":{"height":17.6},"width":543.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-8.png","element":"img","alt":" c = 1.1. and γ = .1/ log(n","inline":true},{"text":"), which is theoretically motivated and practically tested in an extensive set of simulation experiments in ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni et al. ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a)","element":"a"},{"text":". In addition to the penalty parameter ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-9.png","element":"img","alt":" λ","inline":true},{"text":", we also need to construct a penalty loading matrix ","element":"span"},{"style":{"height":18.22},"width":592.2,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-10.png","element":"img","alt":"�Ψu = diag({�luj, j = 1, . . . , p}).","inline":true,"padRight":true},{"text":"This loading matrix can be formed according to the following iterative algorithm.","element":"span"}],[{"id":"id-118","style":{"fontWeight":"bold"},"text":"Algorithm 6.1 ","element":"span"},{"text":"(Estimation of Penalty Loadings)","element":"span"},{"style":{"height":19.53},"width":892.48,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-11.png","element":"img","alt":". 
Choose γ ∈ [1/n, min{1/ log n, pndu−1}] and","inline":true},{"style":{"height":16.4},"width":330.86,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-12.png","element":"img","alt":"c > 1 to form λ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"as defined in ","element":"span"},{"href":"#id-113","style":{"fontStyle":"italic"},"text":"(6.5)","element":"a"},{"style":{"fontStyle":"italic"},"text":", and choose a constant ","element":"span"},{"style":{"height":14.4},"width":138.18,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-13.png","element":"img","alt":" K ⩾ 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"as an upper bound on the number of iterations. (0) Set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"= 0","element":"span"},{"style":{"fontStyle":"italic"},"text":", and initialize ","element":"span"},{"style":{"height":17.42},"width":493.53,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-14.png","element":"img","alt":"�luj,0 for each j = 1, . . . , p","inline":true},{"style":{"fontStyle":"italic"},"text":". For the linear link function, set ","element":"span"},{"style":{"height":23.22},"width":1000.52,"height":58.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-15.png","element":"img","alt":"�luj,0 = {En[f2j (X)(Yu − ¯Yu)2]}1/2 with ¯Yu = En[Yu]","inline":true},{"style":{"fontStyle":"italic"},"text":". For the logistic link function, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"set ","element":"span"},{"style":{"height":23.22},"width":454.74,"height":58.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-16.png","element":"img","alt":"�luj,0 = 12{En[f2j (X)]}1/2","inline":true},{"style":{"fontStyle":"italic"},"text":". 
(1) Compute the Lasso and Post-Lasso estimators, ","element":"span"},{"style":{"height":15.6},"width":318.87,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-17.png","element":"img","alt":" �θu and �θu, based","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"on ","element":"span"},{"style":{"height":23.22},"width":1815.32,"height":58.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-18.png","element":"img","alt":"�Ψu = diag({�luj,k, j = 1, . . . , p}). (2) Set �luj,k+1 := {En[f2j (X)(Yu − Λ(f(X)′�θu))2]}1/2. (3) If","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"k > K","element":"span"},{"style":{"fontStyle":"italic"},"text":", stop; otherwise set ","element":"span"},{"style":{"height":14},"width":191.4,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-19.png","element":"img","alt":" k ← k + 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and go to step (1).","element":"span"}],[{"text":"6.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Properties of a Continuum of Lasso and Post-Lasso: Linear Link. ","element":"span"},{"text":"We provide sufficient conditions for establishing good performance of the estimators discussed above when the linear link function is used. 
In the statement of the following assumption, ","element":"span"},{"style":{"height":16.8},"width":405.99,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-20.png","element":"img","alt":" δn ↘ 0 and ∆n ↘ 0","inline":true,"padRight":true},{"text":"are fixed sequences approaching zero from above at a speed at most polynomial in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"(for example, ","element":"span"},{"style":{"height":17.6},"width":1250.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-21.png","element":"img","alt":"δn ⩾ 1/nc for some c > 0), ℓn := log n, and c, C, κ′, κ′′ and ν ∈ (0,","inline":true,"padRight":true},{"text":"1] are positive finite constants.","element":"span"}],[{"id":"id-114","style":{"fontWeight":"bold"},"text":"Assumption 6.1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider a random element ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"taking values in a measure space ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":169.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-22.png","element":"img","alt":"W, AW),","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with law determined by a probability measure ","element":"span"},{"style":{"height":14.62},"width":139.05,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-23.png","element":"img","alt":" P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":". 
The observed data ","element":"span"},{"text":"((","element":"span"},{"style":{"height":18.09},"width":433.25,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-24.png","element":"img","alt":"Yui)u∈U, Xi)ni=1 consist","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i.i.d. copies of random element ","element":"span"},{"text":"((","element":"span"},{"style":{"height":17.6},"width":208.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-25.png","element":"img","alt":"Yu)u∈U, X)","inline":true},{"style":{"fontStyle":"italic"},"text":", which is generated as a suitably measurable transformation of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
The model ","element":"span"},{"href":"#id-110","style":{"fontStyle":"italic"},"text":"(6.1) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"holds with linear link ","element":"span"},{"style":{"height":17.6},"width":574.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-26.png","element":"img","alt":" t �−→ Λ(t) = t for all u ∈ U ⊂","inline":true,"padRight":true},{"text":"[0","element":"span"},{"style":{"height":19.53},"width":294.99,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-27.png","element":"img","alt":", 1]du, where du","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is fixed and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is equipped with the semi-metric ","element":"span"},{"style":{"height":15.5},"width":47.71,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-28.png","element":"img","alt":" dU","inline":true},{"style":{"fontStyle":"italic"},"text":". Uniformly for all ","element":"span"},{"style":{"height":13.82},"width":139.94,"height":34.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-29.png","element":"img","alt":" n ⩾ n0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":14.62},"width":160.76,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/28-30.png","element":"img","alt":" P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":", the following conditions hold. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"(i) The model ","element":"span"},{"href":"#id-110","style":{"fontStyle":"italic"},"text":"(6.1) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"is approximately sparse with sparsity index obeying ","element":"span"},{"text":"sup","element":"span"},{"style":{"height":18.19},"width":266.66,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-0.png","element":"img","alt":"u∈U ∥θu∥0 ⩽ s","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and the growth restriction ","element":"span"},{"text":"log(","element":"span"},{"style":{"height":20.33},"width":496.46,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-1.png","element":"img","alt":"p ∨ n) ⩽ δnn1/3. (ii) The","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has uniform covering entropy obeying ","element":"span"},{"text":"log ","element":"span"},{"style":{"height":17.6},"width":562.09,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-2.png","element":"img","alt":" N(ϵ, U, dU) ⩽ du log(1/ϵ) ∨ 0","inline":true},{"style":{"fontStyle":"italic"},"text":", and the collection ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":541.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-3.png","element":"img","alt":"ζu = Yu −EP [Yu | X], ru)u∈U","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are suitably measurable transformations of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
(iii) Uniformly over ","element":"span"},{"style":{"height":12.8},"width":114.02,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-4.png","element":"img","alt":" u ∈ U","inline":true},{"style":{"fontStyle":"italic"},"text":", the moments of the model are boundedly heteroscedastic, namely ","element":"span"},{"style":{"height":19.13},"width":379.66,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-5.png","element":"img","alt":" c ⩽ EP [ζ2u | X] ⩽ C","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"a.s., and ","element":"span"},{"text":"max","element":"span"},{"style":{"height":19.75},"width":721.42,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-6.png","element":"img","alt":"j⩽pEP [|fj(X)ζu|3 + |fj(X)Yu|3] ⩽ C.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(iv) For a fixed ","element":"span"},{"style":{"height":12.4},"width":122.89,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-7.png","element":"img","alt":" ν > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and a sequence ","element":"span"},{"style":{"height":15.2},"width":72.61,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-8.png","element":"img","alt":" Kn,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the dictionary functions, approximation errors, and empirical errors obey the following regularity conditions: (a) ","element":"span"},{"style":{"height":22.02},"width":1561.66,"height":55.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-9.png","element":"img","alt":" c ⩽ EP [f2j (X)] ⩽ C, j = 1, . . . 
, p; maxj⩽p |fj(X)| ⩽ Kn a.s.; K2ns log(p ∨ n) ⩽","inline":true},{"style":{"height":15.02},"width":81.13,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-10.png","element":"img","alt":"δnn.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(b) With probability ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":19.75},"width":1311.69,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-11.png","element":"img","alt":" − ∆n, supu∈U En[r2u(X)] ⩽ Cs log(p ∨ n)/n; supu∈U maxj⩽p |(En −","inline":true,"padRight":true},{"text":"E","element":"span"},{"style":{"height":24.06},"width":1842.36,"height":60.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-12.png","element":"img","alt":"P )[f2j (X)ζ2u]| ∨ |(En − EP )[f2j (X)Y 2u ]| ⩽ δn; log1/2(p ∨ n) supdU(u,u′)⩽1/n maxj⩽p{En[fj(X)2(ζu −","inline":true},{"style":{"height":23.32},"width":1298.56,"height":58.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-13.png","element":"img","alt":"ζu′)2]}1/2 ⩽ δn, and supdU(u,u′)⩽1/n∥En[f(X)(ζu − ζu′)]∥∞ ⩽ δnn−1/2","inline":true},{"style":{"fontStyle":"italic"},"text":". 
(c) With probability ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":16},"width":115.45,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-14.png","element":"img","alt":" − ∆n,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the empirical minimum and maximum sparse eigenvalues are bounded away from zero and from above, namely ","element":"span"},{"style":{"height":20.59},"width":1382.21,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-15.png","element":"img","alt":"κ′ ⩽ inf∥δ∥0⩽sℓn,∥δ∥=1 ∥f(X)′δ∥Pn,2 ⩽ sup∥δ∥0⩽sℓn,∥δ∥=1 ∥f(X)′δ∥Pn,2 ⩽ κ′′.","inline":true}],[{"text":"Assumption ","element":"span"},{"href":"#id-114","text":"6.1 ","element":"a"},{"text":"is only a set of sufficient conditions. The finite-sample results in the Supplementary Appendix allow for more general conditions (for example, ","element":"span"},{"style":{"height":15.02},"width":42.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-16.png","element":"img","alt":" du","inline":true,"padRight":true},{"text":"can grow with the sample size). We verify that the more technical conditions in Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"iv)(b) hold in a variety of cases; see Lemma ","element":"span"},{"href":"#id-115","text":"I.2 ","element":"a"},{"text":"in Appendix ","element":"span"},{"text":"I ","element":"span"},{"text":"in the Supplementary Appendix. 
Under Assumption ","element":"span"},{"href":"#id-114","text":"6.1, ","element":"a"},{"text":"we establish results on the performance of the estimators ","element":"span"},{"href":"#id-116","text":"(6.2) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-117","text":"(6.3) ","element":"a"},{"text":"for the linear link function case that hold uniformly over ","element":"span"},{"style":{"height":15.02},"width":361.65,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-17.png","element":"img","alt":" u ∈ U and P ∈ Pn.","inline":true}],[{"id":"id-120","style":{"fontWeight":"bold"},"text":"Theorem 6.1 ","element":"span"},{"text":"(Rates and Sparsity for Functional Responses under Linear Link)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumption ","element":"span"},{"href":"#id-114","style":{"fontStyle":"italic"},"text":"6.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and setting the penalty and loadings as in Algorithm ","element":"span"},{"href":"#id-118","style":{"fontStyle":"italic"},"text":"6.1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"large enough, uniformly for all ","element":"span"},{"style":{"height":15.1},"width":299.37,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-18.png","element":"img","alt":" P ∈ Pn with PP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"probability ","element":"span"},{"text":"1","element":"span"},{"style":{"height":17.6},"width":112.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-19.png","element":"img","alt":"−o(1)","inline":true},{"style":{"fontStyle":"italic"},"text":", for some constant 
","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"style":{"fontStyle":"italic"},"text":", the Lasso estimator ","element":"span"},{"style":{"height":15.02},"width":40.49,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-20.png","element":"img","alt":"�θu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is uniformly sparse, ","element":"span"},{"text":"sup","element":"span"},{"style":{"height":19.8},"width":293.96,"height":49.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-21.png","element":"img","alt":"u∈U ∥�θu∥0 ⩽ ¯Cs","inline":true},{"style":{"fontStyle":"italic"},"text":", and the following performance bounds hold:","element":"span"}],[{"style":{"width":"86%"},"width":1613,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-22.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"For all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"large enough, uniformly for all ","element":"span"},{"style":{"height":15.6},"width":343.91,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-23.png","element":"img","alt":" P ∈ Pn, with PP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"probability ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":17.6},"width":123.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-24.png","element":"img","alt":" − o(1)","inline":true},{"style":{"fontStyle":"italic"},"text":", the Post-Lasso estimator corresponding to ","element":"span"},{"style":{"height":16.4},"width":158.12,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-25.png","element":"img","alt":"�θu 
obeys","inline":true}],[{"style":{"width":"88%"},"width":1648,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-26.png","element":"img"}],[{"text":"We note that the performance bounds are exactly of the type used in Assumption ","element":"span"},{"href":"#id-85","text":"4.2 ","element":"a"},{"text":"(see also Assumption ","element":"span"},{"href":"#id-119","text":"H.1 ","element":"a"},{"text":"in the Supplementary Appendix). Indeed, under the condition ","element":"span"},{"style":{"height":19.88},"width":202.29,"height":49.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-27.png","element":"img","alt":" s2 log2(p ∨","inline":true},{"style":{"height":19.88},"width":307.1,"height":49.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-28.png","element":"img","alt":"n) log2 n ⩽ δnn","inline":true},{"text":", the rate of convergence established in Theorem ","element":"span"},{"href":"#id-120","text":"6.1 ","element":"a"},{"text":"yields ","element":"span"},{"style":{"height":20.8},"width":363.33,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-29.png","element":"img","alt":"�s log(p ∨ n)/n ⩽","inline":true},{"style":{"height":20.33},"width":172.44,"height":50.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-30.png","element":"img","alt":"o(n−1/4).","inline":true}],[{"text":"6.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Properties of Lasso and Post-Lasso Estimators: Logistic Link. ","element":"span"},{"text":"We provide sufficient conditions to state results on the performance of the estimators discussed above for the logistic link function. 
Consider the fixed sequences ","element":"span"},{"style":{"height":16.8},"width":341.33,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-31.png","element":"img","alt":" δn ↘ 0 and ∆n ↘","inline":true,"padRight":true},{"text":"0 approaching zero from above at a speed at most polynomial in ","element":"span"},{"style":{"height":16.4},"width":253.52,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-32.png","element":"img","alt":" n, ℓn := log n","inline":true},{"text":", and the positive finite constants ","element":"span"},{"style":{"height":17.6},"width":481.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/29-33.png","element":"img","alt":" c, C, κ′, κ′′, and c ⩽ 1/2.","inline":true}],[{"id":"id-121","style":{"width":"110%"},"width":2065,"height":1145,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-0.png","element":"img"}],[{"text":"The following result characterizes the performance of the estimators ","element":"span"},{"href":"#id-116","text":"(6.2) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-117","text":"(6.3) ","element":"a"},{"text":"for the logistic link function case under Assumption ","element":"span"},{"href":"#id-121","text":"6.2.","element":"a"}],[{"id":"id-122","style":{"fontWeight":"bold"},"text":"Theorem 6.2 ","element":"span"},{"text":"(Rates and Sparsity for Functional Responses under Logistic Link)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumption ","element":"span"},{"href":"#id-121","style":{"fontStyle":"italic"},"text":"6.2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and setting the penalty and loadings as in Algorithm ","element":"span"},{"href":"#id-118","style":{"fontStyle":"italic"},"text":"6.1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"large enough, uniformly for all ","element":"span"},{"style":{"height":15.1},"width":299.9,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-1.png","element":"img","alt":" P ∈ Pn with PP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"probability ","element":"span"},{"text":"1","element":"span"},{"style":{"height":17.6},"width":113.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-2.png","element":"img","alt":"−o(1)","inline":true},{"style":{"fontStyle":"italic"},"text":", the following performance bounds hold for some constant ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"style":{"fontStyle":"italic"},"text":":","element":"span"}],[{"style":{"width":"86%"},"width":1613,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"and the estimator is uniformly sparse: ","element":"span"},{"text":"sup","element":"span"},{"style":{"height":19.8},"width":507.96,"height":49.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-4.png","element":"img","alt":"u∈U ∥�θu∥0 ⩽ ¯Cs. 
For all n","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"large enough, uniformly for all ","element":"span"},{"style":{"height":15.6},"width":319.78,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-5.png","element":"img","alt":" P ∈ Pn, with PP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"probability ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":17.6},"width":120.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-6.png","element":"img","alt":" − o(1)","inline":true},{"style":{"fontStyle":"italic"},"text":", the Post-Lasso estimator corresponding to ","element":"span"},{"style":{"height":16.4},"width":158.13,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-7.png","element":"img","alt":"�θu obeys","inline":true}],[{"style":{"width":"88%"},"width":1648,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-8.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Comment 6.1. ","element":"span"},{"text":"The performance bounds derived in Theorem ","element":"span"},{"href":"#id-122","text":"6.2 ","element":"a"},{"text":"satisfy the conditions of Assumption ","element":"span"},{"href":"#id-85","text":"4.2 ","element":"a"},{"text":"(see also Assumption ","element":"span"},{"href":"#id-119","text":"H.1 ","element":"a"},{"text":"in the Supplementary Material). Moreover, since the link function is 1-Lipschitz in the logistic case and the approximation errors are assumed to be small, the results above establish the same rates of convergence for estimators of the conditional probabilities; for example,","element":"span"}],[{"style":{"width":"55%"},"width":1030,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/30-9.png","element":"img"}]]},{"heading":"7. 
Application: the Effect of 401(k) Participation on Asset Holdings","paragraphs":[[{"id":"id-54","text":"As a practical illustration of the methods developed in this paper, we consider estimation of ","element":"span"},{"text":"the effect of 401(k) eligibility and participation on accumulated assets as in ","element":"span"},{"href":"#id-60","referenceIndex":2,"text":"Abadie ","element":"a"},{"href":"#id-60","referenceIndex":2,"text":"(2003) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004)","element":"a"},{"text":". Our goal here is to illustrate the estimation results and inference statements and to make the following points that underscore our theoretical findings: 1) In a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"low-dimensional setting","element":"span"},{"text":", where the number of controls is low and therefore there is no need for selection, our robust post-selection inference methods perform well. That is, the results of our methods agree with the results of standard methods that do not employ any selection. 2) In a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"high-dimensional ","element":"span"},{"text":"setting, where there are (moderately) many controls, our post-selection inference methods perform well, producing well-behaved estimates and confidence intervals compared to the erratic estimates and confidence intervals produced by standard methods that do not employ selection as a means of regularization. 3) Finally, in a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"very high-dimensional ","element":"span"},{"text":"setting, where the number of controls is comparable to the sample size, the standard methods break down completely, while our methods still produce well-behaved estimates and confidence intervals. 
These findings are in line with our theoretical results about uniform validity of our inference methods.","element":"span"}],[{"text":"The key problem in determining the effect of participation in 401(k) plans on accumulated assets is saver heterogeneity coupled with the fact that the decision to enroll in a 401(k) is non-random. It is generally recognized that some people have a higher preference for saving than others. It also seems likely that those individuals with high unobserved preference for saving would be most likely to choose to participate in tax-advantaged retirement savings plans and would tend to have otherwise high amounts of accumulated assets. The presence of unobserved savings preferences with these properties then implies that conventional estimates that do not account for saver heterogeneity and endogeneity of participation will be biased upward, tending to overstate the savings effects of 401(k) participation.","element":"span"}],[{"text":"To overcome the endogeneity of 401(k) participation, ","element":"span"},{"href":"#id-60","referenceIndex":2,"text":"Abadie ","element":"a"},{"href":"#id-60","referenceIndex":2,"text":"(2003) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004) ","element":"a"},{"text":"adopt the strategy detailed in Poterba, Venti, and Wise ","element":"span"},{"href":"#id-123","referenceIndex":83,"text":"(1994; ","element":"a"},{"href":"#id-124","referenceIndex":84,"text":"1995; ","element":"a"},{"href":"#id-125","referenceIndex":85,"text":"1996; ","element":"a"},{"href":"#id-126","referenceIndex":86,"text":"2001) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-127","referenceIndex":20,"text":"Benjamin ","element":"a"},{"href":"#id-127","referenceIndex":20,"text":"(2003)","element":"a"},{"text":", who used data 
from the 1991 Survey of Income and Program Participation and argue that eligibility for enrolling in a 401(k) plan in this data can be taken as exogenous after conditioning on a few observables of which the most important for their argument is income. The basic idea of their argument is that, at least around the time 401(k)’s initially became available, people were unlikely to be basing their employment decisions on whether an employer offered a 401(k) but would instead focus on income. Thus, eligibility for a 401(k) could be taken as exogenous conditional on income, and the causal effect of 401(k) eligibility could be directly estimated by appropriate comparison across eligible and ineligible individuals.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/31-0.png","element":"img","alt":"21 ","inline":true,"padRight":true},{"href":"#id-60","referenceIndex":2,"text":"Abadie ","element":"a"},{"href":"#id-60","referenceIndex":2,"text":"(2003)","element":"a"},{"text":", ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004)","element":"a"},{"text":", and ","element":"span"},{"href":"#id-128","referenceIndex":82,"text":"Ogburn et al. ","element":"a"},{"href":"#id-128","referenceIndex":82,"text":"(2015) ","element":"a"},{"text":"use this argument for the exogeneity of eligibility conditional on controls to argue that 401(k) eligibility provides a valid instrument for 401(k) participation and employ IV methods to estimate the effect of 401(k) participation on accumulated assets.","element":"span"}],[{"text":"As a complement to the work cited above, we estimate various treatment effects of 401(k) participation on financial wealth using high-dimensional methods. 
A key component of the argument underlying the exogeneity of 401(k) eligibility is that eligibility may only be taken as exogenous after conditioning on income. Both ","element":"span"},{"href":"#id-60","referenceIndex":2,"text":"Abadie ","element":"a"},{"href":"#id-60","referenceIndex":2,"text":"(2003) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004) ","element":"a"},{"text":"adopt this argument but control only for a small number of terms. One might wonder whether the small number of terms considered is sufficient to adequately control for income and other related confounds. At the same time, the power to learn anything about the effect of 401(k) participation decreases as one controls more flexibly for confounds. The methods developed in this paper offer one resolution to this tension by allowing us to consider a very broad set of controls and functional forms under the assumption that, among the variables we consider, there is a relatively low-dimensional set that adequately captures the effect of confounds. This approach is more general than that pursued in previous research, which implicitly assumes that confounding effects can adequately be controlled for by a small number of variables chosen ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ex ante ","element":"span"},{"text":"by the researcher.","element":"span"}],[{"text":"We use the same data as ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004)","element":"a"},{"text":". The data consist of 9,915 observations at the household level drawn from the 1991 SIPP. We use net financial assets as the outcome variable, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":", in our analysis.
Our treatment variable, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":", is an indicator for having positive 401(k) balances; and our instrument, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z","element":"span"},{"text":", is an indicator for being eligible to enroll in a 401(k) plan. The vector of raw covariates, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":", consists of age, income, family size, years of education, a married indicator, a two-earner status indicator, a defined benefit pension status indicator, an IRA participation indicator, and a home ownership indicator. Further details can be found in ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004)","element":"a"},{"text":".","element":"span"}],[{"text":"We present detailed results for three different sets of controls ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":"). The first specification uses indicators of marital status, two-earner status, defined benefit pension status, IRA participation status, and home ownership status, second order polynomials in family size and education, a third order polynomial in age, and a quadratic spline in income with six break points","element":"span"},{"style":{"height":19.56},"width":264.2,"height":48.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/32-0.png","element":"img","alt":"22 (Quadratic","inline":true,"padRight":true},{"text":"Spline specification). 
","element":"span"},{"text":"The second specification augments the Quadratic Spline specification by interacting all the non-income variables with each term in the income spline (Quadratic Spline Plus Interactions specification). The final specification forms a larger set of potential controls by starting with all of the variables from the Quadratic Spline specification and forming all two-way interactions between all of the non-income variables. The set of main effects and interactions of all non-income variables is then fully interacted with all of the income terms (Quadratic Spline Plus Many Interactions specification).","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/32-1.png","element":"img","alt":"23 ","inline":true,"padRight":true},{"text":"The dimensions of the set of controls are thus 35, 311, and 1756 for the Quadratic Spline, Quadratic Spline Plus Interactions, and Quadratic Spline Plus Many Interactions specification, respectively. For methods that do not use variable selection, we use 32, 272, and 1526 variables resulting from removing terms that are perfectly collinear. We refer to the specification without interactions as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"low-","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":", to the specification with only income interactions as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"high-","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":", and to the specification with all two-way interactions further interacted with income as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"very-high-","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":".","element":"span"}],[{"text":"We report a variety of results for each specification. 
Under the maintained assumption that 401(k) eligibility may be taken as exogenous after controlling for the variables defined in the preceding paragraph, we can use the methods of this paper to estimate intention to treat effects of 401(k) eligibility by setting 401(k) eligibility as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z","element":"span"},{"text":". We report the estimated average intention to treat and average intention to treat on the treated as the ATE and ATE-T, and we report estimates of quantile intention to treat and quantile intention to treat on the treated effects as QTE and QTE-T. We also directly apply the results of this paper to estimate effects of 401(k) participation, reporting estimates of the LATE, LATE-T, LQTE, and LQTE-T for each specification.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/33-0.png","element":"img","alt":"24 ","inline":true,"padRight":true},{"text":"For comparison, we also report estimates of the eligibility effect from the linear model without selection and with selection using the approach of ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni et al. ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a) ","element":"a"},{"text":"and estimates of the participation effect from linear instrumental variables estimation without selection and with selection as in ","element":"span"},{"href":"#id-11","referenceIndex":40,"text":"Chernozhukov et al. ","element":"a"},{"href":"#id-11","referenceIndex":40,"text":"(2015a)","element":"a"},{"text":".","element":"span"}],[{"text":"Estimation of all these treatment effects depends on first-stage estimates of reduced form functions as detailed in Section ","element":"span"},{"text":"3.
","element":"span"},{"text":"We estimate reduced form functions where the outcome is continuous using ordinary least squares when no model selection is used or Post-Lasso when selection is used. We estimate reduced form functions where the outcome is binary by logistic regression when no model selection is used or Post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/33-1.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized logistic regression when selection is used. We only report selection-based estimates in the very-high-","element":"span"},{"style":{"height":19.16},"width":211.94,"height":47.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/33-2.png","element":"img","alt":"p setting.25 ","inline":true,"padRight":true},{"text":"We refer to Appendix ","element":"span"},{"href":"#id-68","text":"F ","element":"a"},{"text":"for detailed discussion of implementing our approach in this example.","element":"span"}],[{"text":"Estimates of the ATE, ATE-T, LATE and LATE-T as well as the coefficient on 401(k) eligibility from the linear model and coefficient on 401(k) participation in the linear IV model are given in Table 1. In this table, we provide point estimates for each of the three sets of controls with and without variable selection. We report conventional heteroscedasticity consistent standard error estimates for the linear model and linear IV coefficient. For the ATE, ATE-T, LATE, and LATE-T, we report both analytic and multiplier bootstrap standard errors. 
The bootstrap standard errors are based on 500 bootstrap replications with ","element":"span"},{"href":"#id-77","referenceIndex":76,"text":"Mammen ","element":"a"},{"href":"#id-77","referenceIndex":76,"text":"(1993) ","element":"a"},{"text":"weights as multipliers.","element":"span"}],[{"text":"$40","element":"span"},{"style":{"height":6.4},"width":29.61,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/34-0.png","element":"img","alt":"26","inline":true}],[{"text":"We observe somewhat different results in the Quadratic Spline Plus Interactions specification. For both the ATE and the LATE in the Quadratic Spline Plus Interactions case, we see a substantially larger point estimate without selection than with selection, with the selection results being similar to those obtained in the low-p case. Along with the larger point estimate, we also see that the estimated standard errors in the no selection case for the ATE and LATE are roughly three times larger than the standard errors in the selection case. For the ATE-T and LATE-T in the Quadratic Spline Plus Interactions case, point estimates following selection are notably smaller than without selection but estimated standard errors after selection are somewhat larger. We note that one might suspect estimated standard errors for all of the estimators without selection to be substantially downward biased in this case due to the use of many control variables without regularization as in ","element":"span"},{"href":"#id-129","referenceIndex":27,"text":"Cattaneo et al. ","element":"a"},{"href":"#id-129","referenceIndex":27,"text":"(2010)","element":"a"},{"text":". 
Finally, we see the largest difference in the Quadratic Spline Plus Many Interactions specification, where estimates cannot even be computed reliably without selection because of severe overfitting: the estimated propensity score is either 0 or 1 for every observation.","element":"span"}],[{"text":"We provide estimates of the QTE and QTE-T in Figure 1 and estimates of the LQTE and LQTE-T in Figure 2. The left column of Figure 1 gives results for the QTE, and the right column displays the results for the QTE-T. Similarly, the left and right columns of Figure 2 provide the LQTE and LQTE-T, respectively. We give the results for the Quadratic Spline, Quadratic Spline Plus Interactions, and Quadratic Spline Plus Many Interactions specifications in the top, middle, and bottom rows, respectively. In each graphic, solid lines show point estimates and dashed lines show uniform 95% confidence intervals.","element":"span"}],[{"text":"Looking across the figures, we see a pattern similar to that seen for the estimates of the average effects: the selection-based estimates are stable across all specifications and are very similar to the estimates obtained without selection from the baseline low-","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"Quadratic Spline specification. In the more flexible Quadratic Spline Plus Interactions specification, the estimates that do not make use of selection behave somewhat erratically. This erratic behavior is especially apparent in the estimated LQTE of 401(k) participation, where small changes in the quantile index may result in large swings in the point estimate and the estimated standard errors are quite large. Again, this erratic behavior is likely the result of overfitting given the large set of variables considered.
As with the average effects, estimated quantile effects without selection in the Quadratic Spline Plus Many Interactions specification are not reported, as the estimated propensity score is always 0 or 1.","element":"span"}],[{"text":"If we focus on the LQTE and LQTE-T estimated from variable selection methods, we find that 401(k) participation has a small impact on accumulated net total financial assets at low quantiles while appearing to have a larger impact at high quantiles. ","element":"span"},{"text":"Looking at the uniform confidence intervals, we can see that this pattern is statistically significant at the 5% level and that we would reject both the hypothesis that 401(k) participation has no effect and, more generally, the hypothesis of a constant treatment effect.","element":"span"}],[{"text":"It is also worth briefly discussing the results of the variable selection. Given the number of models and variable selection steps involved, especially in computing quantile effects, it is not practical to give a complete accounting of the selected variables here. Rather, we note that for the linear model, linear IV, ATE, and LATE results, we select between two and 22 variables, depending on the specification of controls and the left-hand-side variable. The median number of variables selected for the QTE and LQTE results, where the median is taken across index values ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":", varies between one and 11 across the different specifications of controls and left-hand-side variables.
There is, however, considerable variability in the number of variables selected across ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":", ranging from no variables selected to a maximum of 237 selected variables.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/35-0.png","element":"img","alt":"27 ","inline":true,"padRight":true},{"text":"The selected variables themselves mostly capture the effect of income. For example, the union of the variables selected in forming each of the reduced form quantities used for estimating the LATE in the Quadratic Spline Plus Many Interactions specification consists of 36 variables, only four of which do not include income.","element":"span"},{"style":{"height":6.4},"width":29.61,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/35-1.png","element":"img","alt":"28 ","inline":true,"padRight":true},{"text":"This pattern of largely selecting terms that are direct income effects or interactions of income with other variables holds up across the specifications considered.","element":"span"}],[{"text":"It is interesting that our results are similar to those in ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004) ","element":"a"},{"text":"despite allowing for a much richer set of controls.
The fact that we allow for a rich set of controls but produce similar results to those previously available lends further credibility to the claim that previous work controlled adequately for the available observables.","element":"span"},{"style":{"height":6.4},"width":29.61,"height":16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-0.png","element":"img","alt":"29 ","inline":true,"padRight":true},{"text":"Finally, it is worth noting that this similarity is not mechanical or otherwise built into the procedure. For example, applications in ","element":"span"},{"href":"#id-6","referenceIndex":9,"text":"Belloni et al. ","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"(2012) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-4","referenceIndex":15,"text":"Belloni et al. ","element":"a"},{"href":"#id-4","referenceIndex":15,"text":"(2014a) ","element":"a"},{"text":"use high-dimensional variable selection methods and produce sets of variables that differ substantially from intuitive baselines.","element":"span"}]]},{"heading":"Appendix A. Notation","paragraphs":[[{"id":"id-55","text":"A.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Overall Notation. ","element":"span"},{"text":"We consider a random element ","element":"span"},{"style":{"height":14.7},"width":178.91,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-1.png","element":"img","alt":" W = WP","inline":true,"padRight":true},{"text":"taking values in the measure space (","element":"span"},{"style":{"height":16.3},"width":136.93,"height":40.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-2.png","element":"img","alt":"W, AW","inline":true},{"text":"), with probability law ","element":"span"},{"style":{"height":12.8},"width":130.75,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-3.png","element":"img","alt":" P ∈ P","inline":true},{"text":".
Note that it is most convenient to think about ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"as a parameter in a parameter set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". We shall also work with a bootstrap multiplier variable ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-4.png","element":"img","alt":"ξ","inline":true,"padRight":true},{"text":"taking values in (","element":"span"},{"style":{"height":16},"width":108.75,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-5.png","element":"img","alt":"R, AR","inline":true},{"text":") that is independent of ","element":"span"},{"style":{"height":14.7},"width":67.21,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-6.png","element":"img","alt":" WP","inline":true,"padRight":true},{"text":", having probability law ","element":"span"},{"style":{"height":17.24},"width":43.02,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-7.png","element":"img","alt":" Pξ","inline":true},{"text":", which is fixed throughout. We consider (","element":"span"},{"style":{"height":18.3},"width":586.62,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-8.png","element":"img","alt":"Wi)∞i=1 = (Wi,P )∞i=1 and (ξi)∞i=1 ","inline":true,"padRight":true},{"text":"to be i.i.d. copies of ","element":"span"},{"style":{"height":16.4},"width":164.7,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-9.png","element":"img","alt":" W and ξ","inline":true},{"text":", which are ","element":"span"},{"text":"also independent of each other. 
The data will be defined as some measurable function of ","element":"span"},{"style":{"height":15.02},"width":123.88,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-10.png","element":"img","alt":" Wi for","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., n","element":"span"},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"denotes the sample size.","element":"span"}],[{"text":"We require the sequences (","element":"span"},{"style":{"height":18.09},"width":352.7,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-11.png","element":"img","alt":"Wi)∞i=1 and (ξi)∞i=1 ","inline":true,"padRight":true},{"text":"to live on a probability space (Ω","element":"span"},{"style":{"height":17.6},"width":305.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-12.png","element":"img","alt":", AΩ, PP ) for all","inline":true},{"style":{"height":12.8},"width":121.84,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-13.png","element":"img","alt":"P ∈ P","inline":true},{"text":"; note that other variables arising in the proofs do not need to live on the same space. 
It is important to keep track of the dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"in the analysis since we want the results to hold uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"in some set ","element":"span"},{"style":{"height":14.62},"width":51.35,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-14.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"which may be dependent on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Typically, this set will increase with ","element":"span"},{"style":{"height":15.82},"width":347.27,"height":39.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-15.png","element":"img","alt":"n; i.e. Pn ⊆ Pn+1.","inline":true}],[{"text":"Throughout the paper we signify the dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"by mostly using ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"as a subscript in P","element":"span"},{"style":{"height":9.3},"width":40.17,"height":23.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-16.png","element":"img","alt":"P ,","inline":true,"padRight":true},{"text":"but in the proofs we sometimes use it as a subscript for variables as in ","element":"span"},{"style":{"height":14.7},"width":67.21,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-17.png","element":"img","alt":" WP","inline":true,"padRight":true},{"text":". 
The operator E denotes a generic expectation operator with respect to a generic probability measure P, while E","element":"span"},{"style":{"height":15.1},"width":186.54,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-18.png","element":"img","alt":"P denotes","inline":true,"padRight":true},{"text":"the expectation with respect to P","element":"span"},{"style":{"height":8.8},"width":26,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-19.png","element":"img","alt":"P","inline":true,"padRight":true},{"text":". Note also that we use capital letters such as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"to denote random elements and use the corresponding lower case letters such as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"w ","element":"span"},{"text":"to denote fixed values that these random elements can take.","element":"span"}],[{"text":"We denote by ","element":"span"},{"style":{"height":14.62},"width":47.66,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-20.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"the (random) empirical probability measure that assigns probability ","element":"span"},{"style":{"height":15.13},"width":128.35,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-21.png","element":"img","alt":" n−1 to","inline":true,"padRight":true},{"text":"each ","element":"span"},{"style":{"height":18.09},"width":344.97,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-22.png","element":"img","alt":" Wi ∈ (Wi)ni=1. 
En","inline":true,"padRight":true},{"text":"denotes the expectation with respect to the empirical measure, and ","element":"span"},{"style":{"height":17.5},"width":89.9,"height":43.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-23.png","element":"img","alt":" Gn,P","inline":true,"padRight":true},{"text":"denotes the empirical process ","element":"span"},{"style":{"height":17.77},"width":317.84,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-24.png","element":"img","alt":"√n(En − P), i.e.","inline":true}],[{"style":{"width":"79%"},"width":1483,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-25.png","element":"img"}],[{"text":"indexed by a measurable class of functions ","element":"span"},{"style":{"height":13.2},"width":267.83,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-26.png","element":"img","alt":" F : W ↦ R","inline":true},{"text":"; see ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996, ","element":"a"},{"text":"chap. 2.3). We shall often omit the index ","element":"span"},{"style":{"height":17.5},"width":242.87,"height":43.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-27.png","element":"img","alt":" P from Gn,P","inline":true,"padRight":true},{"text":"and simply write ","element":"span"},{"style":{"height":15.02},"width":54.94,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-28.png","element":"img","alt":" Gn","inline":true},{"text":".
In what follows, we use ","element":"span"},{"style":{"height":18.45},"width":124.04,"height":46.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-29.png","element":"img","alt":" ∥ · ∥P,q","inline":true,"padRight":true},{"text":"to denote the ","element":"span"},{"style":{"height":17.6},"width":92.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-30.png","element":"img","alt":" Lq(P","inline":true},{"text":") norm; for example, we use ","element":"span"},{"style":{"height":21.48},"width":730.28,"height":53.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-31.png","element":"img","alt":" ∥f(W)∥P,q = (∫|f(w)|q dP(w))1/q and","inline":true},{"style":{"height":21.11},"width":709.76,"height":52.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-32.png","element":"img","alt":"∥f(W)∥Pn,q = (n−1 ∑ni=1 |f(Wi)|q)1/q","inline":true},{"text":". For a vector ","element":"span"},{"style":{"height":18.22},"width":876.87,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-33.png","element":"img","alt":" v = (v1, . . . 
, vp)′ ∈ Rp, ∥v∥1 = |v1| + · · · + |vp|","inline":true,"padRight":true},{"text":"denotes the ","element":"span"},{"style":{"height":20.08},"width":512.46,"height":50.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-34.png","element":"img","alt":" ℓ1-norm of v, ∥v∥ = √v′v","inline":true,"padRight":true},{"text":"denotes the Euclidean norm of ","element":"span"},{"style":{"height":17.6},"width":228.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-35.png","element":"img","alt":" v, and ∥v∥0","inline":true,"padRight":true},{"text":"denotes the ","element":"span"},{"style":{"height":15.02},"width":279.29,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-36.png","element":"img","alt":"ℓ0-“norm” of v","inline":true,"padRight":true},{"text":"which equals the number of non-zero components of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":". For a positive integer ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":", [","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"] denotes the set ","element":"span"},{"style":{"height":17.6},"width":402.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-37.png","element":"img","alt":" {1, . . . , k}. 
For xn, yn","inline":true,"padRight":true},{"text":"denoting sequences in ","element":"span"},{"text":"R","element":"span"},{"text":", the statement ","element":"span"},{"style":{"height":16.8},"width":150.33,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-38.png","element":"img","alt":" xn ≲ yn","inline":true,"padRight":true},{"text":"means that ","element":"span"},{"style":{"height":16.8},"width":180.79,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/36-39.png","element":"img","alt":"xn ⩽ Ayn","inline":true,"padRight":true},{"text":"for some constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"that does not depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":".","element":"span"}],[{"text":"We say that a collection of random variables ","element":"span"},{"style":{"height":17.6},"width":923.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-0.png","element":"img","alt":" F = {f(W, t), t ∈ T}, where f : W × T → R,","inline":true,"padRight":true},{"text":"indexed by a set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"and viewed as functions of ","element":"span"},{"style":{"height":16.4},"width":581.29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-1.png","element":"img","alt":" W ∈ W, is suitably measurable","inline":true,"padRight":true},{"text":"with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"if it is an image admissible Suslin class, as defined in ","element":"span"},{"href":"#id-130","referenceIndex":43,"text":"Dudley ","element":"a"},{"href":"#id-130","referenceIndex":43,"text":"(1999, ","element":"a"},{"text":"p. 186). 
In particular, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"is suitably measurable if ","element":"span"},{"style":{"height":16.4},"width":292.46,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-2.png","element":"img","alt":" f : W × T → R","inline":true,"padRight":true},{"text":"is measurable and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is a Polish space equipped with its Borel sigma algebra; see ","element":"span"},{"href":"#id-130","referenceIndex":43,"text":"Dudley ","element":"a"},{"href":"#id-130","referenceIndex":43,"text":"(1999, ","element":"a"},{"text":"p. 186). This is a mild condition that holds in practical cases.","element":"span"}],[{"text":"A.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Notation for Stochastic Convergence Uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"All parameters, such as the law of the data, are indexed by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". This dependence is sometimes left implicit. We shall allow for the possibility that the probability measure ","element":"span"},{"style":{"height":14.62},"width":141.27,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-3.png","element":"img","alt":" P = Pn","inline":true,"padRight":true},{"text":"can depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". 
We shall conduct our stochastic convergence analysis uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"can vary within some set ","element":"span"},{"style":{"height":14.62},"width":51.35,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-4.png","element":"img","alt":" Pn","inline":true},{"text":", which itself may vary with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":".","element":"span"}],[{"text":"The convergence analysis (namely, the stochastic order relations and convergence in distribution) uniformly in ","element":"span"},{"style":{"height":14.62},"width":150.14,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-5.png","element":"img","alt":" P ∈ Pn","inline":true,"padRight":true},{"text":"and the analysis under all sequences ","element":"span"},{"style":{"height":14.62},"width":166.62,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-6.png","element":"img","alt":" Pn ∈ Pn","inline":true,"padRight":true},{"text":"are equivalent. 
Specifically, consider a sequence of stochastic processes ","element":"span"},{"style":{"height":17.1},"width":92.12,"height":42.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-7.png","element":"img","alt":" Xn,P","inline":true,"padRight":true},{"text":"and a random element ","element":"span"},{"style":{"height":14.7},"width":51.33,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-8.png","element":"img","alt":" YP","inline":true,"padRight":true},{"text":", taking values in the normed space ","element":"span"},{"text":"D","element":"span"},{"text":", defined on the probability space (Ω","element":"span"},{"style":{"height":16},"width":155.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-9.png","element":"img","alt":", AΩ, PP","inline":true,"padRight":true},{"text":"). Throughout most of the Appendix, ","element":"span"},{"style":{"height":17.6},"width":192.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-10.png","element":"img","alt":"D = ℓ∞(U","inline":true},{"text":"), the space of uniformly bounded functions mapping an arbitrary index set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"to the real line, or ","element":"span"},{"text":"D ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"UC","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":"), the space of uniformly continuous functions mapping an arbitrary index set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"to the real line. 
Consider also a sequence of deterministic positive constants ","element":"span"},{"style":{"height":10.62},"width":44.07,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-11.png","element":"img","alt":" an","inline":true},{"text":". We shall say that","element":"span"}],[{"style":{"width":"95%"},"width":1789,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-12.png","element":"img"}],[{"text":"Here the symbol ","element":"span"},{"style":{"height":0},"width":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-13.png","element":"img","alt":" ⇝","inline":true,"padRight":true},{"text":"denotes weak convergence, i.e., convergence in distribution or law; BL","element":"span"},{"style":{"height":17.6},"width":84.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-14.png","element":"img","alt":"1(D)","inline":true,"padRight":true},{"text":"denotes the space of functions mapping ","element":"span"},{"text":"D ","element":"span"},{"text":"to [0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1] with Lipschitz norm at most 1; and the outer probability and expectation, P","element":"span"},{"style":{"height":17.57},"width":180.26,"height":43.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-15.png","element":"img","alt":"∗P and E∗P","inline":true},{"text":", are invoked whenever non-measurability arises.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma A.1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The above notions (i), (ii) and (iii) are equivalent to the following notions (a), (b), and (c), each holding for every sequence ","element":"span"},{"style":{"height":14.62},"width":170.79,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-16.png","element":"img","alt":" Pn ∈ Pn:","inline":true}],[{"style":{"width":"70%"},"width":1315,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-17.png","element":"img"}],[{"text":"The claims follow straightforwardly from the definitions, so the proof is omitted. We shall use this equivalence extensively in the proofs of the main results without explicit reference.","element":"span"}]]},{"heading":"Appendix B. Key Tools I: Uniform in P Donsker Theorem, Multiplier Bootstrap,","paragraphs":[[{"style":{"width":"36%"},"width":685,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-18.png","element":"img"}],[{"text":"B.1. ","element":"span"},{"style":{"height":18.09},"width":947.48,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-19.png","element":"img","alt":" Uniform in P Donsker Property. Let (Wi)∞i=1 ","inline":true,"padRight":true},{"text":"be a sequence of i.i.d. copies of the random ","element":"span"},{"text":"element ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"taking values in the measure space (","element":"span"},{"style":{"height":16.3},"width":136.93,"height":40.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-20.png","element":"img","alt":"W, AW","inline":true},{"text":") according to the probability law ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"on that space. 
Let ","element":"span"},{"style":{"height":18.3},"width":383.41,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-21.png","element":"img","alt":" FP = {ft,P : t ∈ T}","inline":true,"padRight":true},{"text":"be a set of suitably measurable functions ","element":"span"},{"style":{"height":18.3},"width":450.07,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-22.png","element":"img","alt":" w ⟼ ft,P (w) mapping","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"to ","element":"span"},{"text":"R","element":"span"},{"text":", equipped with a measurable envelope ","element":"span"},{"style":{"height":14.7},"width":290.14,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-23.png","element":"img","alt":" FP : W ⟼ R","inline":true},{"text":". The class is indexed by ","element":"span"},{"style":{"height":12.8},"width":132.57,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-24.png","element":"img","alt":" P ∈ P","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":15.6},"width":284.75,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-25.png","element":"img","alt":" t ∈ T, where T","inline":true,"padRight":true},{"text":"is a fixed, totally bounded semi-metric space equipped with a semi-metric ","element":"span"},{"style":{"height":15.1},"width":61.13,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/37-26.png","element":"img","alt":" dT .","inline":true,"padRight":true},{"text":"Let ","element":"span"},{"style":{"height":18.3},"width":296.72,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-0.png","element":"img","alt":" N(ϵ, FP , ∥ · ∥Q,2","inline":true},{"text":") denote the 
","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-1.png","element":"img","alt":" ϵ","inline":true},{"text":"-covering number of the class of functions ","element":"span"},{"style":{"height":15.1},"width":57.36,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-2.png","element":"img","alt":" FP","inline":true,"padRight":true},{"text":"with respect to the ","element":"span"},{"style":{"height":19.84},"width":578.63,"height":49.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-3.png","element":"img","alt":"L2(Q) seminorm ∥ · ∥Q,2 for Q","inline":true,"padRight":true},{"text":"a finitely-discrete measure on (","element":"span"},{"style":{"height":16.3},"width":136.93,"height":40.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-4.png","element":"img","alt":"W, AW","inline":true},{"text":"). We shall use the following result.","element":"span"}],[{"id":"id-131","style":{"fontWeight":"bold"},"text":"Theorem B.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Donsker Property","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Work with the set-up above. 
Suppose that for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q > ","element":"span"},{"text":"2","element":"span"}],[{"id":"id-133","style":{"width":"77%"},"width":1459,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Furthermore, suppose that","element":"span"}],[{"id":"id-173","style":{"width":"78%"},"width":1478,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":15.1},"width":59.94,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-7.png","element":"img","alt":" GP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the P-Brownian Bridge, and consider","element":"span"}],[{"style":{"width":"76%"},"width":1439,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(a) Then, ","element":"span"},{"style":{"height":18.3},"width":402.48,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-9.png","element":"img","alt":" Zn,P ⇝ ZP in ℓ∞(T)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":16.4},"width":287.13,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-10.png","element":"img","alt":" P ∈ P, namely","inline":true}],[{"style":{"width":"46%"},"width":870,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(b) The process ","element":"span"},{"style":{"height":17.1},"width":85.76,"height":42.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-12.png","element":"img","alt":" 
Zn,P","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is stochastically equicontinuous uniformly in ","element":"span"},{"style":{"height":12.8},"width":120.44,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-13.png","element":"img","alt":" P ∈ P","inline":true},{"style":{"fontStyle":"italic"},"text":", i.e., for every ","element":"span"},{"style":{"height":14.8},"width":113.35,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-14.png","element":"img","alt":" ε > 0,","inline":true}],[{"style":{"width":"58%"},"width":1100,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-15.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(c) The limit process ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-16.png","element":"img","alt":" ZP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has the following continuity properties:","element":"span"}],[{"style":{"width":"67%"},"width":1256,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-17.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(d) The paths ","element":"span"},{"style":{"height":17.6},"width":217.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-18.png","element":"img","alt":" t ⟼ ZP (t)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are a.s. 
uniformly continuous on ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":114.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-19.png","element":"img","alt":"T, dT )","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"under each ","element":"span"},{"style":{"height":12.8},"width":134.35,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-20.png","element":"img","alt":" P ∈ P.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Comment B.1 (Important Feature of the Theorem). ","element":"span"},{"text":"This theorem extends the uniform Donsker theorem stated in Theorem 2.8.2 of ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":" by allowing the function classes ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"to ","element":"span"},{"style":{"fontWeight":"bold"},"text":"depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". This generalization is crucial: it is required in all of our problems.","element":"span"}],[{"text":"B.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Validity of Multiplier Bootstrap. ","element":"span"},{"text":"Consider the setting of the preceding subsection. 
Let (","element":"span"},{"style":{"height":18.09},"width":104.46,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-21.png","element":"img","alt":"ξi)ni=1 ","inline":true,"padRight":true},{"text":"be i.i.d. multipliers whose distribution does not depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", such that ","element":"span"},{"text":"E","element":"span"},{"style":{"height":19.13},"width":723.29,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-22.png","element":"img","alt":"ξ = 0, Eξ2 = 1, and E|ξ|q ⩽ C for q >","inline":true,"padRight":true},{"text":"2. Consider the multiplier empirical process:","element":"span"}],[{"style":{"width":"69%"},"width":1295,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-23.png","element":"img"}],[{"text":"Here ","element":"span"},{"style":{"height":15.02},"width":54.94,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-24.png","element":"img","alt":" Gn","inline":true,"padRight":true},{"text":"is taken to be an extended empirical process defined by the empirical measure that assigns mass 1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/n ","element":"span"},{"text":"to each point (","element":"span"},{"style":{"height":18.3},"width":1239.23,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/38-25.png","element":"img","alt":"Wi, ξi) for i = 1, ..., n. 
Let ZP = (ZP (t))t∈T = (GP (ft,P ))t∈T as","inline":true,"padRight":true},{"text":"defined in Theorem ","element":"span"},{"href":"#id-131","text":"B.1.","element":"a"}],[{"id":"id-134","style":{"fontWeight":"bold"},"text":"Theorem B.2 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Validity of Multiplier Bootstrap","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Assume the conditions of Theorem ","element":"span"},{"href":"#id-131","style":{"fontStyle":"italic"},"text":"B.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. Then (a) the following unconditional convergence takes place, ","element":"span"},{"style":{"height":19.91},"width":279.29,"height":49.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-0.png","element":"img","alt":" Z∗n,P ⇝ ZP in","inline":true},{"style":{"height":17.6},"width":119.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-1.png","element":"img","alt":"ℓ∞(T)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":16.4},"width":287.13,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-2.png","element":"img","alt":" P ∈ P, namely","inline":true}],[{"style":{"width":"46%"},"width":871,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"and (b) the following conditional convergence takes place, ","element":"span"},{"style":{"height":20.77},"width":459.13,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-4.png","element":"img","alt":" Z∗n,P ⇝B ZP in 
ℓ∞(T)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":12.8},"width":120.44,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-5.png","element":"img","alt":"P ∈ P","inline":true},{"style":{"fontStyle":"italic"},"text":", namely uniformly in ","element":"span"},{"style":{"height":12.8},"width":120.44,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-6.png","element":"img","alt":" P ∈ P","inline":true}],[{"style":{"width":"47%"},"width":882,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-7.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"text":"E","element":"span"},{"style":{"height":10.4},"width":43.55,"height":25.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-8.png","element":"img","alt":"Bn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denotes the expectation over the multiplier weights ","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.09},"width":104.46,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-9.png","element":"img","alt":"ξi)ni=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"holding the data ","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.09},"width":239.19,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-10.png","element":"img","alt":"Wi)ni=1 fixed.","inline":true}],[{"text":"B.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Functional Delta Method and Bootstrap. 
","element":"span"},{"text":"We shall use the functional delta method, as formulated in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996, ","element":"a"},{"text":"chap. 3.9). Let ","element":"span"},{"style":{"height":15.6},"width":306.7,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-11.png","element":"img","alt":" D0, D, and E be","inline":true,"padRight":true},{"text":"normed spaces, with ","element":"span"},{"style":{"height":18.04},"width":1471.04,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-12.png","element":"img","alt":" D0 ⊂ D. A map φ : Dφ ⊂ D ⟼ E is called Hadamard-differentiable at ρ ∈ Dφ","inline":true,"padRight":true},{"text":"tangentially to ","element":"span"},{"style":{"height":14.62},"width":48.52,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-13.png","element":"img","alt":" D0","inline":true,"padRight":true},{"text":"if there is a continuous linear map ","element":"span"},{"style":{"height":19.52},"width":448.78,"height":48.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-14.png","element":"img","alt":" φ′ρ : D0 ⟼ E such that","inline":true}],[{"style":{"width":"39%"},"width":745,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-15.png","element":"img"}],[{"text":"for all sequences ","element":"span"},{"style":{"height":17.64},"width":1376.98,"height":44.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-16.png","element":"img","alt":" tn → 0 in R and hn → h ∈ D0 in D such that ρ + tnhn ∈ Dφ for every n.","inline":true}],[{"text":"We now define the following notion of uniform Hadamard differentiability:","element":"span"}],[{"id":"id-91","style":{"fontWeight":"bold"},"text":"Definition B.1 
","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform Hadamard Tangential Differentiability","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider a map ","element":"span"},{"style":{"height":16.4},"width":60.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-17.png","element":"img","alt":" φ :","inline":true},{"style":{"height":17.24},"width":186.14,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-18.png","element":"img","alt":"Dφ ⟼ E","inline":true},{"style":{"fontStyle":"italic"},"text":", where the domain of the map ","element":"span"},{"style":{"height":17.24},"width":51.52,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-19.png","element":"img","alt":" Dφ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a subset of a normed space ","element":"span"},{"text":"D ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and the range is a subset of the normed space ","element":"span"},{"style":{"height":14.62},"width":184.31,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-20.png","element":"img","alt":" E. Let D0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a normed space, with ","element":"span"},{"style":{"height":17.82},"width":299.61,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-21.png","element":"img","alt":" D0 ⊂ D, and Dρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a compact metric space that is a subset of ","element":"span"},{"style":{"height":17.64},"width":508,"height":44.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-22.png","element":"img","alt":" Dφ. 
The map φ : Dφ ⟼ E","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is called Hadamard-differentiable uniformly in ","element":"span"},{"style":{"height":17.42},"width":124.41,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-23.png","element":"img","alt":" ρ ∈ Dρ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"tangentially to ","element":"span"},{"style":{"height":14.62},"width":48.52,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-24.png","element":"img","alt":" D0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with derivative map ","element":"span"},{"style":{"height":20.32},"width":288.52,"height":50.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-25.png","element":"img","alt":" h ⟼ φ′ρ(h), if","inline":true}],[{"style":{"width":"73%"},"width":1371,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-26.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all convergent sequences ","element":"span"},{"style":{"height":17.82},"width":1308.95,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-27.png","element":"img","alt":" ρn → ρ in Dρ, tn → 0 in R, and hn → h ∈ D0 in D such that","inline":true},{"style":{"height":17.64},"width":583.27,"height":44.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-28.png","element":"img","alt":"ρn + tnhn ∈ Dφ for every n.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"As part of the definition, we require that the derivative map ","element":"span"},{"style":{"height":20.32},"width":490.34,"height":50.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-29.png","element":"img","alt":"h ⟼ φ′ρ(h) from D0 to E","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is linear for each 
","element":"span"},{"style":{"height":17.42},"width":1017.16,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-30.png","element":"img","alt":" ρ ∈ Dρ. ■","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Comment B.2. ","element":"span"},{"text":"Note that the definition requires that the derivative map (","element":"span"},{"style":{"height":20.31},"width":409.85,"height":50.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-31.png","element":"img","alt":"ρ, h) ⟼ φ′ρ(h),","inline":true,"padRight":true},{"text":"mapping ","element":"span"},{"style":{"height":17.42},"width":251.51,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-32.png","element":"img","alt":" Dρ × D0 to E","inline":true},{"text":", is continuous at each (","element":"span"},{"style":{"height":18.62},"width":1055.07,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-33.png","element":"img","alt":"ρ, h) ∈ Dρ × D0. ■","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Comment B.3 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Important Details of the Definition","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"text":"Definition ","element":"span"},{"href":"#id-91","text":"B.1 ","element":"a"},{"text":"differs from the definition of uniform differentiability given in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996, ","element":"a"},{"text":"p. 379, eq. 
(3.9.12)), since our definition allows ","element":"span"},{"style":{"height":17.42},"width":48.52,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-34.png","element":"img","alt":" Dρ","inline":true,"padRight":true},{"text":"to be much smaller than ","element":"span"},{"style":{"height":17.24},"width":51.52,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-35.png","element":"img","alt":" Dφ","inline":true,"padRight":true},{"text":"and allows ","element":"span"},{"style":{"height":17.42},"width":48.52,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-36.png","element":"img","alt":" Dρ","inline":true,"padRight":true},{"text":"to be endowed with a much stronger metric than the metric induced by the norm of ","element":"span"},{"text":"D","element":"span"},{"text":". These differences are essential for infinite-dimensional applications. For example, the quantile/inverse map is uniformly Hadamard differentiable in the sense of Definition ","element":"span"},{"href":"#id-91","text":"B.1 ","element":"a"},{"text":"for a suitable choice of ","element":"span"},{"style":{"height":18.62},"width":632.5,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-37.png","element":"img","alt":" Dρ: Let T = [ϵ, 1−ϵ], D = ℓ∞(T),","inline":true},{"style":{"height":18.62},"width":1078.63,"height":46.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-38.png","element":"img","alt":"Dφ= set of cadlag functions on T, D0 = UC(T), and Dρ","inline":true,"padRight":true},{"text":"be a compact subset of ","element":"span"},{"style":{"height":19.13},"width":101.21,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-39.png","element":"img","alt":" C1(T","inline":true},{"text":") such that each 
","element":"span"},{"style":{"height":18.62},"width":1028.75,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/39-40.png","element":"img","alt":" ρ ∈ Dρ obeys ∂ρ(t)/∂t ⩾ c > 0 on t ∈ T, where c","inline":true,"padRight":true},{"text":"is a positive constant. However, the quantile/inverse map is not Hadamard differentiable uniformly on ","element":"span"},{"style":{"height":17.82},"width":603.64,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-0.png","element":"img","alt":" Dρ if we set Dρ = Dφ and hence","inline":true,"padRight":true},{"text":"is not uniformly differentiable in the sense of the definition given in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996) ","element":"a"},{"text":"which requires ","element":"span"},{"style":{"height":17.42},"width":165.41,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-1.png","element":"img","alt":" Dρ = Dφ","inline":true},{"text":". 
It is important in practice to keep the distinction between ","element":"span"},{"style":{"height":17.42},"width":48.51,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-2.png","element":"img","alt":" Dρ","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":17.24},"width":51.52,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-3.png","element":"img","alt":" Dφ","inline":true,"padRight":true},{"text":"because the estimated values ","element":"span"},{"style":{"height":12},"width":24.8,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-4.png","element":"img","alt":" ρ̂","inline":true,"padRight":true},{"text":"may well lie outside ","element":"span"},{"style":{"height":17.42},"width":48.51,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-5.png","element":"img","alt":" Dρ","inline":true,"padRight":true},{"text":"unless this is explicitly imposed in estimation, even though the population values of ","element":"span"},{"style":{"height":17.42},"width":207.4,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-6.png","element":"img","alt":" ρ are in Dρ","inline":true,"padRight":true},{"text":"by assumption. For example, the empirical cdf is in ","element":"span"},{"style":{"height":17.24},"width":51.52,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-7.png","element":"img","alt":" Dφ","inline":true},{"text":", but is outside ","element":"span"},{"style":{"height":17.42},"width":1455.45,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-8.png","element":"img","alt":" Dρ. 
■","inline":true}],[{"id":"id-108","style":{"height":18.44},"width":1871.91,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-9.png","element":"img","alt":"Theorem B.3 (Functional delta-method uniformly in P ∈ P). Let φ : Dφ ⊂ D ↦ E","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be Hadamard-differentiable uniformly in ","element":"span"},{"style":{"height":17.42},"width":282.1,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-10.png","element":"img","alt":" ρ ∈ Dρ ⊂ Dφ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"tangentially to ","element":"span"},{"style":{"height":14.62},"width":48.51,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-11.png","element":"img","alt":" D0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with derivative map ","element":"span"},{"style":{"height":19.52},"width":255.28,"height":48.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-12.png","element":"img","alt":"φ′ρ. 
Let ρ̂n,P","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a sequence of stochastic processes taking values in ","element":"span"},{"style":{"height":17.24},"width":51.52,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-13.png","element":"img","alt":" Dφ","inline":true},{"style":{"fontStyle":"italic"},"text":", where each ","element":"span"},{"style":{"height":17.1},"width":202.76,"height":42.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-14.png","element":"img","alt":" ρ̂n,P is an","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"estimator of the parameter ","element":"span"},{"style":{"height":17.42},"width":161.44,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-15.png","element":"img","alt":" ρP ∈ Dρ","inline":true},{"style":{"fontStyle":"italic"},"text":". Suppose there exists a sequence of constants ","element":"span"},{"style":{"height":15.02},"width":266.02,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-16.png","element":"img","alt":" rn → ∞ such","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"that ","element":"span"},{"style":{"height":18.3},"width":664.97,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-17.png","element":"img","alt":" Zn,P = rn(ρ̂n,P − ρP ) ⇝ ZP in D","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":14.62},"width":149.24,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-18.png","element":"img","alt":" P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":". 
The limit process ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-19.png","element":"img","alt":" ZP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is separable and takes its values in ","element":"span"},{"style":{"height":17.02},"width":973.8,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-20.png","element":"img","alt":" D0 for all P ∈ P = ∪n⩾n0Pn, where n0 is fixed.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Moreover, the set of stochastic processes ","element":"span"},{"style":{"height":17.6},"width":291.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-21.png","element":"img","alt":" {ZP : P ∈ P}","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is relatively compact in the topology of weak convergence in ","element":"span"},{"style":{"height":14.62},"width":48.52,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-22.png","element":"img","alt":"D0","inline":true},{"style":{"fontStyle":"italic"},"text":", that is, every sequence in this set can be split into weakly convergent subsequences. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Then, ","element":"span"},{"style":{"height":20.32},"width":737.86,"height":50.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-23.png","element":"img","alt":"rn (φ(ρ̂n,P ) − φ(ρP )) ⇝ φ′ρP (ZP ) in E","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":20.32},"width":563.62,"height":50.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-24.png","element":"img","alt":" P ∈ Pn. 
If (ρ, h) ↦ φ′ρ(h)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is defined and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"continuous on the whole of ","element":"span"},{"style":{"height":17.42},"width":138.66,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-25.png","element":"img","alt":" Dρ × D","inline":true},{"style":{"fontStyle":"italic"},"text":", then the sequence ","element":"span"},{"style":{"height":20.31},"width":815.93,"height":50.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-26.png","element":"img","alt":" rn (φ(ρ̂n,P ) − φ(ρP )) − φ′ρP (rn(ρ̂n,P − ρP ))","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converges to zero in outer probability uniformly in ","element":"span"},{"style":{"height":14.62},"width":138.76,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-27.png","element":"img","alt":" P ∈ Pn","inline":true},{"style":{"fontStyle":"italic"},"text":". Moreover, the set of stochastic processes ","element":"span"},{"style":{"height":20.32},"width":363.32,"height":50.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-28.png","element":"img","alt":"{φ′ρP (ZP ) : P ∈ P}","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is relatively compact in the topology of weak convergence in ","element":"span"},{"text":"E","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"The following result on the functional delta method applies to any bootstrap or other simulation method obeying certain conditions. Such methods include the multiplier bootstrap as a special case. 
","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":18.3},"width":351.82,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-29.png","element":"img","alt":" Dn,P = (Wi,P )ni=1 ","inline":true,"padRight":true},{"text":"denote the data vector and ","element":"span"},{"style":{"height":18.09},"width":255.5,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-30.png","element":"img","alt":" Bn = (ξi)ni=1 ","inline":true,"padRight":true},{"text":"be a vector of random ","element":"span"},{"text":"variables used to generate bootstrap or simulation draws (the specifics may vary depending on the particular method employed). Consider sequences of stochastic processes ","element":"span"},{"style":{"height":18.3},"width":366.68,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-31.png","element":"img","alt":" ρ̂n,P = ρ̂n,P (Dn,P ),","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":18.3},"width":531.1,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-32.png","element":"img","alt":" Zn,P = rn(ρ̂n,P − ρP ) ⇝ ZP","inline":true,"padRight":true},{"text":"in the normed space ","element":"span"},{"text":"D ","element":"span"},{"text":"uniformly in ","element":"span"},{"style":{"height":14.62},"width":138.93,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-33.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":". 
Also consider the bootstrap stochastic process ","element":"span"},{"style":{"height":20.77},"width":780.11,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-34.png","element":"img","alt":" Z∗n,P = Zn,P (Dn,P , Bn) in D, where Zn,P","inline":true,"padRight":true},{"text":"is a measurable function of ","element":"span"},{"style":{"height":14.62},"width":54.1,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-35.png","element":"img","alt":"Bn","inline":true,"padRight":true},{"text":"for each value of ","element":"span"},{"style":{"height":14.62},"width":57.13,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-36.png","element":"img","alt":" Dn","inline":true},{"text":". Suppose that ","element":"span"},{"style":{"height":19.91},"width":85.76,"height":49.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-37.png","element":"img","alt":" Z∗n,P ","inline":true,"padRight":true},{"text":"converges conditionally given ","element":"span"},{"style":{"height":14.62},"width":57.13,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-38.png","element":"img","alt":" Dn","inline":true,"padRight":true},{"text":"in distribution to ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-39.png","element":"img","alt":" ZP","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":14.62},"width":138.76,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-40.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":", namely that","element":"span"}],[{"style":{"width":"44%"},"width":833,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-41.png","element":"img"}],[{"text":"uniformly in 
","element":"span"},{"style":{"height":16.7},"width":368.22,"height":41.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-42.png","element":"img","alt":" P ∈ Pn, where EBn","inline":true,"padRight":true},{"text":"denotes the expectation computed with respect to the law of ","element":"span"},{"style":{"height":14.62},"width":54.1,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-43.png","element":"img","alt":" Bn","inline":true,"padRight":true},{"text":"holding the data ","element":"span"},{"style":{"height":17.1},"width":92.09,"height":42.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-44.png","element":"img","alt":" Dn,P","inline":true,"padRight":true},{"text":"fixed. This is denoted as “","element":"span"},{"style":{"height":19.91},"width":244.46,"height":49.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-45.png","element":"img","alt":"Z∗n,P ⇝B ZP","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":14.62},"width":142.52,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-46.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":".” Finally, let ","element":"span"},{"style":{"height":20.77},"width":423.33,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-47.png","element":"img","alt":"ρ̂∗n,P = ρ̂n,P + Z∗n,P /rn","inline":true,"padRight":true},{"text":"denote the bootstrap or simulation draw of ","element":"span"},{"style":{"height":13.1},"width":92.7,"height":32.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-48.png","element":"img","alt":" ρ̂n,P .","inline":true}],[{"id":"id-109","style":{"fontWeight":"bold"},"text":"Theorem B.4 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Uniform in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"functional delta-method for bootstrap and other simulation methods","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Assume the conditions of Theorem ","element":"span"},{"href":"#id-108","style":{"fontStyle":"italic"},"text":"B.3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. Let ","element":"span"},{"style":{"height":19.97},"width":504.73,"height":49.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-49.png","element":"img","alt":" ρ̂n,P and ρ̂∗n,P be maps as","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"indicated previously, taking values in ","element":"span"},{"style":{"height":20.77},"width":1173.38,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-50.png","element":"img","alt":" Dφ such that rn(ρ̂n,P − ρP ) ⇝ ZP and rn(ρ̂∗n,P − ρ̂n,P ) ⇝B ZP","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in ","element":"span"},{"text":"D ","element":"span"},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":20.77},"width":1261.45,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-51.png","element":"img","alt":" P ∈ Pn. Then, X∗n,P = rn(φ(ρ̂∗n,P ) − φ(ρ̂n,P )) ⇝B XP = φ′ρP (ZP )","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly in ","element":"span"},{"style":{"height":14.62},"width":153.3,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-52.png","element":"img","alt":"P ∈ Pn.","inline":true}],[{"text":"B.4. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-131","style":{"fontWeight":"bold"},"text":"B.1. 
","element":"a"},{"text":"Parts (a) and (b) are direct consequences of Lemma ","element":"span"},{"href":"#id-132","text":"M.1. ","element":"a"},{"text":"In particular, Lemma ","element":"span"},{"href":"#id-132","text":"M.1(","element":"a"},{"text":"a) implies stochastic equicontinuity under arbitrary subsequences ","element":"span"},{"style":{"height":15.2},"width":149.96,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/40-53.png","element":"img","alt":" Pn ∈ P,","inline":true,"padRight":true},{"text":"which implies part (b). Part (a) follows from Lemma ","element":"span"},{"href":"#id-132","text":"M.1(","element":"a"},{"text":"b) by splitting an arbitrary sequence ","element":"span"},{"style":{"height":12.8},"width":113.98,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-0.png","element":"img","alt":"n ∈ N","inline":true,"padRight":true},{"text":"into subsequences ","element":"span"},{"style":{"height":12.8},"width":129.48,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-1.png","element":"img","alt":" n ∈ N′ ","inline":true,"padRight":true},{"text":"along each of which the covariance function (","element":"span"},{"style":{"height":17.6},"width":381.35,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-2.png","element":"img","alt":"t, s) ↦ cPn(t, s) :=","inline":true},{"style":{"height":17.5},"width":551.1,"height":43.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-3.png","element":"img","alt":"Pnfs,Pnft,Pn − Pnfs,PnPnft,Pn","inline":true,"padRight":true},{"text":"converges uniformly and therefore also pointwise to a uniformly continuous function on (","element":"span"},{"style":{"height":15.6},"width":95.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-4.png","element":"img","alt":"T, dT","inline":true,"padRight":true},{"text":"). 
This convergence is possible because ","element":"span"},{"style":{"height":17.6},"width":589.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-5.png","element":"img","alt":" {(t, s) ↦ cP (t, s) : P ∈ P} is","inline":true,"padRight":true},{"text":"a relatively compact set in ","element":"span"},{"style":{"height":17.6},"width":189.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-6.png","element":"img","alt":" ℓ∞(T × T","inline":true},{"text":") in view of the Arzelà-Ascoli theorem, the assumptions in equation ","element":"span"},{"href":"#id-133","text":"(B.1)","element":"a"},{"text":", and total boundedness of (","element":"span"},{"style":{"height":15.6},"width":95.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-7.png","element":"img","alt":"T, dT","inline":true,"padRight":true},{"text":"). By Lemma ","element":"span"},{"href":"#id-132","text":"M.1(","element":"a"},{"text":"b), pointwise convergence of the covariance function implies weak convergence to a tight Gaussian process, which may depend on the identity ","element":"span"},{"style":{"height":12.4},"width":47.52,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-8.png","element":"img","alt":" N′ ","inline":true,"padRight":true},{"text":"of the subsequence. Since this argument applies to each of the subsequences that together make up the overall sequence, part (a) follows.","element":"span"}],[{"text":"Part (c) is immediate from the imposed uniform covering entropy condition and Dudley’s metric entropy inequality for expectations of suprema of Gaussian processes (Corollary 2.2.8 in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":"). 
","element":"span"},{"text":"Claim (d) follows from claim (c) and a standard argument, based on the application of the Borel-Cantelli lemma. ","element":"span"},{"text":"Indeed, for each ","element":"span"},{"style":{"height":12.8},"width":142.97,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-9.png","element":"img","alt":" m ∈ N","inline":true,"padRight":true},{"text":"let ","element":"span"},{"style":{"height":15.02},"width":119.26,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-10.png","element":"img","alt":" δm :=","inline":true,"padRight":true},{"text":"2","element":"span"},{"style":{"height":32},"width":1271.51,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-11.png","element":"img","alt":"−m ∧ sup{δ > 0 : supP∈P EP supdT (t,¯t)⩽δ |ZP (t) − ZP (¯t)| < 2−2m},","inline":true,"padRight":true},{"text":"then by the Markov inequality P","element":"span"},{"style":{"height":31.6},"width":1151.27,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-12.png","element":"img","alt":"P(supdT (t,¯t)⩽δm |ZP (t) − ZP (¯t)| > 2−m) ⩽ 2−2m+m = 2−m.","inline":true,"padRight":true},{"text":"This sums to a finite number over ","element":"span"},{"style":{"height":12.8},"width":125.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-13.png","element":"img","alt":"m ∈ N","inline":true},{"text":". 
Hence, by the Borel-Cantelli lemma, for almost all states ","element":"span"},{"style":{"height":18.02},"width":608.02,"height":45.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-14.png","element":"img","alt":" ω ∈ Ω, |ZP (t)(ω) − ZP (¯t)(ω)| ⩽","inline":true,"padRight":true},{"text":"2","element":"span"},{"style":{"height":18.02},"width":781.05,"height":45.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-15.png","element":"img","alt":"−m for all dT (t, ¯t) ⩽ δm ⩽ 2−m and all m","inline":true,"padRight":true},{"text":"sufficiently large, and claim (d) follows. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-16.png","element":"img","alt":"■","inline":true}],[{"text":"B.5. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-134","style":{"fontWeight":"bold"},"text":"B.2. ","element":"a"},{"text":"Claim (a) is verified by invoking Theorem ","element":"span"},{"href":"#id-131","text":"B.1. 
","element":"a"},{"text":"We begin by showing that ","element":"span"},{"style":{"height":18.37},"width":369.89,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-17.png","element":"img","alt":" Z∗P = (GP ξft,P )t∈T","inline":true,"padRight":true},{"text":"is equal in distribution to ","element":"span"},{"style":{"height":18.3},"width":348.79,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-18.png","element":"img","alt":" ZP = (GP ft,P )t∈T","inline":true,"padRight":true},{"text":"; in particular, ","element":"span"},{"style":{"height":17.51},"width":55.79,"height":43.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-19.png","element":"img","alt":" Z∗P","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-20.png","element":"img","alt":" ZP","inline":true,"padRight":true},{"text":"share the same mean and covariance function, and thus the same continuity properties established in Theorem ","element":"span"},{"href":"#id-131","text":"B.1. 
","element":"a"},{"text":"This claim is immediate from the fact that multiplication by ","element":"span"},{"style":{"height":16.4},"width":165.46,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-21.png","element":"img","alt":" ξ of each","inline":true},{"style":{"height":18.3},"width":462.88,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-22.png","element":"img","alt":"f ∈ FP = {ft,P : t ∈ T}","inline":true,"padRight":true},{"text":"yields a set ","element":"span"},{"style":{"height":16.4},"width":78.46,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-23.png","element":"img","alt":" ξFP","inline":true,"padRight":true},{"text":"of measurable functions ","element":"span"},{"style":{"height":17.6},"width":598.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-24.png","element":"img","alt":" ξf : (w, ξ) ↦ ξf(w), mapping","inline":true},{"style":{"height":12.8},"width":233.35,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-25.png","element":"img","alt":"W × R to R","inline":true},{"text":". Each such function has mean zero under ","element":"span"},{"style":{"height":19.6},"width":813.65,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-26.png","element":"img","alt":" P × Pξ, i.e. ∫sf(w)dPξ(s)dP(w) = 0, and","inline":true,"padRight":true},{"text":"covariance function (","element":"span"},{"style":{"height":20.6},"width":481.08,"height":51.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-27.png","element":"img","alt":"ξf, ξ ˜f) ↦ Pf ˜f −PfP ˜f","inline":true},{"text":". 
Hence the Gaussian process (","element":"span"},{"style":{"height":18.44},"width":415.08,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-28.png","element":"img","alt":"GP (ξf))ξf∈ξFP shares","inline":true,"padRight":true},{"text":"the zero mean and the covariance function of (","element":"span"},{"style":{"height":18.44},"width":244.14,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-29.png","element":"img","alt":"GP (f))f∈FP .","inline":true}],[{"text":"We claim that ","element":"span"},{"style":{"height":20.77},"width":395.16,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-30.png","element":"img","alt":" Z∗n,P ⇝ Z∗P in ℓ∞(T","inline":true},{"text":") uniformly in ","element":"span"},{"style":{"height":20.77},"width":711.64,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-31.png","element":"img","alt":" P ∈ P, where Z∗n,P := (Gnξft,P )t∈T .","inline":true,"padRight":true},{"text":"We note that the function class ","element":"span"},{"style":{"height":15.1},"width":57.36,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-32.png","element":"img","alt":" FP","inline":true,"padRight":true},{"text":"and the corresponding envelope ","element":"span"},{"style":{"height":14.7},"width":54.06,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-33.png","element":"img","alt":" FP","inline":true,"padRight":true},{"text":"satisfy the conditions of Theorem ","element":"span"},{"href":"#id-131","text":"B.1. 
","element":"a"},{"text":"The same is also true for the function class ","element":"span"},{"style":{"height":16.4},"width":78.46,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-34.png","element":"img","alt":" ξFP","inline":true,"padRight":true},{"text":"defined by (","element":"span"},{"style":{"height":17.6},"width":456.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-35.png","element":"img","alt":"w, ξ) ↦ ξfP (w), which","inline":true,"padRight":true},{"text":"maps ","element":"span"},{"style":{"height":12.8},"width":223.83,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-36.png","element":"img","alt":" W × R to R","inline":true,"padRight":true},{"text":"and its envelope ","element":"span"},{"style":{"height":17.6},"width":252.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-37.png","element":"img","alt":" |ξ|FP , since ξ","inline":true,"padRight":true},{"text":"is independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W","element":"span"},{"text":". Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"now denote a finitely discrete measure over ","element":"span"},{"style":{"height":12.8},"width":124.91,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-38.png","element":"img","alt":" W ×R","inline":true},{"text":". 
By Lemma ","element":"span"},{"href":"#id-135","text":"K.1","element":"a"},{"text":", multiplication by ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-39.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"does not qualitatively change the uniform covering entropy bound:","element":"span"}],[{"style":{"width":"76%"},"width":1423,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-40.png","element":"img"}],[{"text":"Moreover, multiplication by ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-41.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"does not affect the norms, ","element":"span"},{"style":{"height":20.15},"width":761.26,"height":50.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-42.png","element":"img","alt":" ∥ξfP (W)∥P×Pξ,2 = ∥fP (W)∥P,2, since ξ","inline":true,"padRight":true},{"text":"is independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"by construction and E","element":"span"},{"style":{"height":18.73},"width":597.73,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/41-43.png","element":"img","alt":"ξ2 = 1. The claim then follows.","inline":true}],[{"text":"Claim (b). 
For each ","element":"span"},{"style":{"height":17.6},"width":480.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-0.png","element":"img","alt":" δ > 0 and t ∈ T, let πδ(t","inline":true},{"text":") denote a closest element in a given, finite ","element":"span"},{"style":{"height":12.8},"width":96.21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-1.png","element":"img","alt":" δ-net","inline":true,"padRight":true},{"text":"over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":". We begin by noting that","element":"span"}],[{"style":{"width":"102%"},"width":1916,"height":613,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-2.png","element":"img"}],[{"text":"and the second assertion holds by Theorem ","element":"span"},{"href":"#id-131","text":"B.1 ","element":"a"},{"text":"(c).","element":"span"}],[{"text":"Second, E","element":"span"},{"style":{"height":31.6},"width":1669.5,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-3.png","element":"img","alt":"∗P IIIP ⩽ E∗P�supdT (t,¯t)⩽δ |Z∗n,P (t) − Z∗n,P (¯t)| ∧ 2�=: µ∗P (δ) and limn→∞ supP∈P |µ∗P (δ)−","inline":true},{"style":{"height":17.6},"width":213.57,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-4.png","element":"img","alt":"µP (δ)| = 0.","inline":true,"padRight":true},{"text":"The first assertion follows because E","element":"span"},{"style":{"height":17.51},"width":118.57,"height":43.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-5.png","element":"img","alt":"∗P IIIP","inline":true,"padRight":true},{"text":"is bounded by","element":"span"}],[{"style":{"width":"92%"},"width":1731,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-6.png","element":"img"}],[{"text":"The second assertion holds by part (a) of the present 
theorem.","element":"span"}],[{"text":"Define ","element":"span"},{"style":{"height":18.19},"width":484.05,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-7.png","element":"img","alt":" ϵ(δ) := δ ∨ supP∈P µP (δ).","inline":true,"padRight":true},{"text":"Then, by Markov’s inequality, followed by ","element":"span"},{"style":{"height":12.4},"width":149.7,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-8.png","element":"img","alt":" n → ∞,","inline":true}],[{"style":{"width":"84%"},"width":1585,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-9.png","element":"img"}],[{"text":"Finally, by Lemma ","element":"span"},{"href":"#id-136","text":"B.1, ","element":"a"},{"text":"for each ","element":"span"},{"style":{"height":10.4},"width":66.46,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-10.png","element":"img","alt":" ε >","inline":true,"padRight":true},{"text":"0, lim sup","element":"span"},{"style":{"height":18.37},"width":587.21,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-11.png","element":"img","alt":"n→∞ supP∈P P∗P (IIP > ε) = 0.","inline":true}],[{"text":"We can now conclude. 
Note that ","element":"span"},{"style":{"height":17.6},"width":291.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-12.png","element":"img","alt":" ϵ(δ) ↘ 0 if δ ↘","inline":true,"padRight":true},{"text":"0, which holds by the definition of ","element":"span"},{"style":{"height":17.6},"width":230.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-13.png","element":"img","alt":" ϵ(δ) and the","inline":true,"padRight":true},{"text":"property sup","element":"span"},{"style":{"height":18.19},"width":426.95,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-14.png","element":"img","alt":"P∈P µP (δ) ↘ 0 if δ ↘","inline":true,"padRight":true},{"text":"0 noted above. Hence for each ","element":"span"},{"style":{"height":10.4},"width":69.26,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-15.png","element":"img","alt":" ε >","inline":true,"padRight":true},{"text":"0 and all 0 ","element":"span"},{"style":{"height":15.02},"width":270.51,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-16.png","element":"img","alt":" < δ < δε such","inline":true,"padRight":true},{"text":"that 3","element":"span"},{"style":{"height":20.8},"width":206.87,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-17.png","element":"img","alt":"�ϵ(δ) < ε,","inline":true}],[{"style":{"width":"83%"},"width":1560,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-18.png","element":"img"}],[{"text":"Sending ","element":"span"},{"style":{"height":16.8},"width":77.17,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-19.png","element":"img","alt":" δ ↘","inline":true,"padRight":true},{"text":"0 gives the result. 
","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-20.png","element":"img","alt":"■","inline":true}],[{"text":"B.6. ","element":"span"},{"style":{"height":19.13},"width":1773.18,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-21.png","element":"img","alt":" Auxiliary Result: Conditional Multiplier CLT in Rd uniformly in P ∈ P. We rely","inline":true,"padRight":true},{"text":"on the following lemma, which is apparently new. An analogous result can be derived for almost sure convergence from well-known non-uniform multiplier central limit theorems, but this strategy requires us to put all the variables indexed by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"on a single underlying probability space, which is much less convenient in applications.","element":"span"}],[{"id":"id-136","style":{"fontWeight":"bold"},"text":"Lemma B.1 ","element":"span"},{"text":"(Conditional Multiplier Central Limit Theorem in ","element":"span"},{"style":{"height":15.13},"width":49.51,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-22.png","element":"img","alt":" Rd ","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":17.6},"width":256.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-23.png","element":"img","alt":" P ∈ P). 
Let","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":18.3},"width":323.2,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-24.png","element":"img","alt":"Zi,P )∞i=1 be i.i.d.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"random vectors on ","element":"span"},{"style":{"height":15.13},"width":49.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-25.png","element":"img","alt":" Rd","inline":true},{"style":{"fontStyle":"italic"},"text":", indexed by a parameter ","element":"span"},{"style":{"height":12.8},"width":156.95,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-26.png","element":"img","alt":" P ∈ P.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"The parameter ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"style":{"fontStyle":"italic"},"text":"represents probability laws on ","element":"span"},{"style":{"height":15.93},"width":405.04,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-27.png","element":"img","alt":" Rd. For each P ∈ P","inline":true},{"style":{"fontStyle":"italic"},"text":", these vectors are assumed to be independent of the i.i.d. sequence ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.62},"width":636.09,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-28.png","element":"img","alt":"ξi)∞i=1 with Eξ = 0 and Eξ2 = 1","inline":true},{"style":{"fontStyle":"italic"},"text":". 
There exist constants ","element":"span"},{"text":"2 ","element":"span"},{"style":{"height":16},"width":278.22,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/42-29.png","element":"img","alt":" < q < ∞ and","inline":true,"padRight":true},{"text":"0 ","element":"span"},{"style":{"height":12.8},"width":199.61,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-0.png","element":"img","alt":" < M < ∞","inline":true},{"style":{"fontStyle":"italic"},"text":", such that ","element":"span"},{"text":"E","element":"span"},{"style":{"height":21.04},"width":694.44,"height":52.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-1.png","element":"img","alt":"P Z1,P = 0 and (EP ∥Z1,P ∥q)1/q ⩽ M","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"uniformly for all ","element":"span"},{"style":{"height":12.8},"width":123.25,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-2.png","element":"img","alt":" P ∈ P","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Then, for every ","element":"span"},{"style":{"height":12.4},"width":100.53,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-3.png","element":"img","alt":" ε > 0","inline":true}],[{"style":{"width":"88%"},"width":1666,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-4.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"text":"E","element":"span"},{"style":{"height":10.39},"width":43.55,"height":25.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-5.png","element":"img","alt":"Bn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denotes the expectation over ","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.3},"width":556.84,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-6.png","element":"img","alt":"ξi)ni=1 holding (Zi,P )ni=1 fixed.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-136","style":{"fontWeight":"bold"},"text":"B.1. 
","element":"a"},{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"be random variables in ","element":"span"},{"style":{"height":15.13},"width":49.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-7.png","element":"img","alt":" Rd","inline":true},{"text":" and define ","element":"span"},{"style":{"height":17.6},"width":264.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-8.png","element":"img","alt":" dBL(X, Y ) :=","inline":true,"padRight":true},{"text":"sup","element":"span"},{"style":{"height":20.66},"width":532.82,"height":51.65,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-9.png","element":"img","alt":"h∈BL1(Rd) |Eh(X) − Eh(Y )|.","inline":true,"padRight":true},{"text":"It suffices to show that for any sequence ","element":"span"},{"style":{"height":15.02},"width":401.19,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-10.png","element":"img","alt":" Pn ∈ P and N∗ ∼","inline":true},{"style":{"height":31.6},"width":1277.2,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-11.png","element":"img","alt":"n−1/2 �ni=1 ξiZi,Pn | (Zi,Pn)ni=1, dBL�N∗, N(0, EPnZ1,PnZ′1,Pn)�→","inline":true,"padRight":true},{"text":"0 in probability (under P","element":"span"},{"style":{"height":17.6},"width":72.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-12.png","element":"img","alt":"Pn).","inline":true}],[{"text":"Following ","element":"span"},{"href":"#id-137","referenceIndex":23,"text":"Bickel and Freedman ","element":"a"},{"href":"#id-137","referenceIndex":23,"text":"(1981)","element":"a"},{"text":", we shall rely on the Mallows metric, written 
","element":"span"},{"style":{"height":15.6},"width":231.71,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-13.png","element":"img","alt":" mr, which is","inline":true,"padRight":true},{"text":"a metric on the space of distribution functions on ","element":"span"},{"style":{"height":15.13},"width":49.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-14.png","element":"img","alt":" Rd","inline":true},{"text":". For our purposes it suffices to recall that given a sequence of distribution functions ","element":"span"},{"style":{"height":17.6},"width":92.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-15.png","element":"img","alt":" {Fk}","inline":true,"padRight":true},{"text":"and a distribution function ","element":"span"},{"style":{"height":17.6},"width":308.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-16.png","element":"img","alt":" F, mr(Fk, F) →","inline":true,"padRight":true},{"text":"0 if and only if","element":"span"},{"style":{"height":19.6},"width":313.36,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-17.png","element":"img","alt":"�gdFk →�gdF","inline":true,"padRight":true},{"text":"for each continuous and bounded ","element":"span"},{"style":{"height":20.67},"width":876.76,"height":51.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-18.png","element":"img","alt":" g : Rd → R, and�∥z∥rdFk(z) →�∥z∥rdF(z).","inline":true,"padRight":true},{"text":"See ","element":"span"},{"href":"#id-137","referenceIndex":23,"text":"Bickel and Freedman ","element":"a"},{"href":"#id-137","referenceIndex":23,"text":"(1981) ","element":"a"},{"text":"for the definition of ","element":"span"},{"style":{"height":10.62},"width":68.53,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-19.png","element":"img","alt":" 
mr.","inline":true}],[{"text":"Under the assumptions of the lemma, we can split the sequence ","element":"span"},{"style":{"height":12.8},"width":137.75,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-20.png","element":"img","alt":" n ∈ N","inline":true,"padRight":true},{"text":"into subsequences ","element":"span"},{"style":{"height":12.8},"width":147.98,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-21.png","element":"img","alt":"n ∈ N′","inline":true},{"text":", along each of which the distribution function of ","element":"span"},{"style":{"height":17.1},"width":95.61,"height":42.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-22.png","element":"img","alt":" Z1,Pn","inline":true,"padRight":true},{"text":"converges to some distribution function ","element":"span"},{"style":{"height":12},"width":50.12,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-23.png","element":"img","alt":" F ′ ","inline":true,"padRight":true},{"text":"with respect to the Mallows metric ","element":"span"},{"style":{"height":10.62},"width":53.31,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-24.png","element":"img","alt":" mr","inline":true},{"text":", for some 2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"< r < q","element":"span"},{"text":". 
This also implies that ","element":"span"},{"style":{"height":20.77},"width":367.17,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-25.png","element":"img","alt":"N(0, EPnZ1,PnZ′1,Pn","inline":true},{"text":") converges weakly to a normal limit ","element":"span"},{"style":{"height":19.6},"width":576.34,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-26.png","element":"img","alt":" N(0, Q′) with Q′ =�zz′dF ′(z","inline":true},{"text":") such that ","element":"span"},{"style":{"height":17.6},"width":529.78,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-27.png","element":"img","alt":"∥Q′∥ ⩽ M. Both Q′ and F ′ ","inline":true,"padRight":true},{"text":"can depend on the subsequence ","element":"span"},{"style":{"height":12.4},"width":54.7,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-28.png","element":"img","alt":" N′.","inline":true}],[{"text":"Let ","element":"span"},{"style":{"height":14.84},"width":46.06,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-29.png","element":"img","alt":" Fk","inline":true,"padRight":true},{"text":"be the empirical distribution function of a sequence (","element":"span"},{"style":{"height":20.02},"width":105.65,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-30.png","element":"img","alt":"zi)ki=1 ","inline":true,"padRight":true},{"text":"of constant vectors in ","element":"span"},{"style":{"height":18.33},"width":62.94,"height":45.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-31.png","element":"img","alt":" Rd,","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":13.2},"width":113.09,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-32.png","element":"img","alt":" k ∈ N","inline":true},{"text":". 
The law of ","element":"span"},{"style":{"height":24.26},"width":424.19,"height":60.66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-33.png","element":"img","alt":" N∗Fk = k−1/2 �ki=1 ξizi","inline":true,"padRight":true},{"text":"is completely determined by ","element":"span"},{"style":{"height":14.84},"width":46.06,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-34.png","element":"img","alt":" Fk","inline":true,"padRight":true},{"text":"and the law of ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-35.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"(the latter is fixed, so it does not enter as a subscript in the definition of ","element":"span"},{"style":{"height":20.62},"width":459.99,"height":51.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-36.png","element":"img","alt":" N∗Fk). If mr(Fk, F ′) → 0","inline":true,"padRight":true},{"text":"as ","element":"span"},{"style":{"height":20.62},"width":683.41,"height":51.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-37.png","element":"img","alt":" k → ∞, then dBL(N∗Fk, N(0, Q′)) →","inline":true,"padRight":true},{"text":"0 by Lindeberg’s central limit theorem.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":14.62},"width":47.67,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-38.png","element":"img","alt":" Fn","inline":true,"padRight":true},{"text":"denote the empirical distribution function of (","element":"span"},{"style":{"height":43.46},"width":1872.06,"height":108.66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-39.png","element":"img","alt":"Zi,Pn)ni=1. Note that N∗ = N∗Fn ∼n−1/2 �ni=1 ξiZi,Pn | (Zi,Pn)ni=1","inline":true},{"text":". 
By the law of large numbers for arrays, ","element":"span"},{"style":{"height":19.6},"width":431.31,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-40.png","element":"img","alt":"�gdFn → �gdF ′ and","inline":true},{"style":{"height":19.6},"width":543.94,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-41.png","element":"img","alt":"�∥z∥rdFn(z) →�∥z∥rdF ′(z","inline":true},{"text":") in probability along the subsequence ","element":"span"},{"style":{"height":17.6},"width":581.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-42.png","element":"img","alt":" n ∈ N′. Hence mr(Fn, F ′) → 0","inline":true,"padRight":true},{"text":"in probability along the same subsequence. We can conclude that ","element":"span"},{"style":{"height":20.04},"width":421.9,"height":50.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-43.png","element":"img","alt":" dBL(N∗Fn, N(0, Q′)) →","inline":true,"padRight":true},{"text":"0 in probability along the same subsequence by the extended continuous mapping theorem ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"(van der Vaart ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"and Wellner, ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"1996, ","element":"a"},{"text":"Theorem 1.11.1).","element":"span"}],[{"text":"The argument applies to every subsequence ","element":"span"},{"style":{"height":12.8},"width":47.52,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-44.png","element":"img","alt":" N′ ","inline":true,"padRight":true},{"text":"of the stated form. The claim in the first paragraph of the proof thus follows. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-45.png","element":"img","alt":"■","inline":true}],[{"text":"B.7. 
","element":"span"},{"style":{"height":18.09},"width":1501.96,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-46.png","element":"img","alt":" Donsker Theorems for Function Classes that depend on n. Let (Wi)∞i=1 ","inline":true,"padRight":true},{"text":"be a sequence ","element":"span"},{"text":"of i.i.d. ","element":"span"},{"text":"copies of the random element ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"taking values in the measure space (","element":"span"},{"style":{"height":17.6},"width":304.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-47.png","element":"img","alt":"W, AW), whose","inline":true,"padRight":true},{"text":"law is determined by the probability measure ","element":"span"},{"style":{"height":18.22},"width":468.6,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-48.png","element":"img","alt":" P, and let w �−→ fn,t(w","inline":true},{"text":") be measurable functions ","element":"span"},{"style":{"height":17.42},"width":254.76,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-49.png","element":"img","alt":"fn,t : W → R","inline":true,"padRight":true},{"text":"indexed by ","element":"span"},{"style":{"height":12.8},"width":114.66,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-50.png","element":"img","alt":" n ∈ N","inline":true,"padRight":true},{"text":"and a fixed, totally bounded semi-metric space (","element":"span"},{"style":{"height":15.6},"width":95.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-51.png","element":"img","alt":"T, dT","inline":true,"padRight":true},{"text":"). 
Consider the stochastic process","element":"span"}],[{"style":{"width":"50%"},"width":948,"height":139,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/43-52.png","element":"img"}],[{"text":"This empirical process is indexed by a class of functions ","element":"span"},{"style":{"height":18.22},"width":387.41,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-0.png","element":"img","alt":" Fn = {fn,t : t ∈ T}","inline":true,"padRight":true},{"text":"with a measurable envelope function ","element":"span"},{"style":{"height":14.62},"width":49.06,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-1.png","element":"img","alt":" Fn","inline":true},{"text":". Importantly, the dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"allows ","element":"span"},{"style":{"fontStyle":"italic"},"text":"the class itself ","element":"span"},{"text":"to depend on the law ","element":"span"},{"style":{"height":14.62},"width":62.56,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-2.png","element":"img","alt":" Pn.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Lemma B.2 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Donsker Theorem for Classes Changing with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Work with the set-up above. 
Suppose that for some fixed constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q > ","element":"span"},{"text":"2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and every sequence ","element":"span"},{"style":{"height":16.8},"width":145.63,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-3.png","element":"img","alt":" δn ↘ 0:","inline":true}],[{"style":{"width":"99%"},"width":1869,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-4.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(a) Then the empirical process ","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.22},"width":197.82,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-5.png","element":"img","alt":"Gnfn,t)t∈T","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is asymptotically tight in ","element":"span"},{"style":{"height":17.6},"width":201.53,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-6.png","element":"img","alt":" ℓ∞(T) i.e.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"stochastically equicontinuous. 
(b) For any subsequence such that the covariance function ","element":"span"},{"style":{"height":17.42},"width":460.18,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-7.png","element":"img","alt":" Pnfn,sfn,t−Pnfn,sPnfn,t","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converges pointwise on ","element":"span"},{"style":{"height":18.22},"width":342.17,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-8.png","element":"img","alt":" T ×T, (Gnfn,t)t∈T","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converges in ","element":"span"},{"style":{"height":17.6},"width":119.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-9.png","element":"img","alt":" ℓ∞(T)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"to a Gaussian process with covariance function given by the limit of the covariance function along that subsequence.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proof. ","element":"span"},{"text":"Claim (a) follows by applying Theorem 2.11.1 in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":", which allows the probability space to depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", and repeating verbatim the proof of Theorem 2.11.22 in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996, ","element":"a"},{"text":"pp. 220-221), except that the probability law is allowed to depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". 
","element":"span"},{"text":"(For completeness, the Supplementary Appendix provides the complete proof.) The proof of claim (b) follows by a standard argument from the stochastic equicontinuity established in claim (a) and finite-dimensional convergence along the indicated subsequences. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-10.png","element":"img","alt":"■","inline":true}],[{"text":"B.8. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorems ","element":"span"},{"href":"#id-108","style":{"fontWeight":"bold"},"text":"B.3 ","element":"a"},{"style":{"fontWeight":"bold"},"text":"and ","element":"span"},{"href":"#id-109","style":{"fontWeight":"bold"},"text":"B.4. ","element":"a"},{"text":"The proof consists of two parts, each proving the corresponding theorem.","element":"span"}],[{"text":"Part 1. We can split ","element":"span"},{"text":"N ","element":"span"},{"text":"into subsequences ","element":"span"},{"style":{"height":17.6},"width":86.51,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-11.png","element":"img","alt":" {N′}","inline":true,"padRight":true},{"text":"along each of which ","element":"span"},{"style":{"height":17.1},"width":579.59,"height":42.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-12.png","element":"img","alt":" Zn,Pn ⇝ Z′ ∈ D0 in D, ρPn →","inline":true},{"style":{"height":18.62},"width":661.47,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-13.png","element":"img","alt":"ρ′ in Dρ (n ∈ N′), where Z′ and ρ′ ","inline":true,"padRight":true},{"text":"can possibly depend on ","element":"span"},{"style":{"height":12.4},"width":47.52,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-14.png","element":"img","alt":" N′","inline":true},{"text":". 
It suffices to verify that for each ","element":"span"},{"style":{"height":12.4},"width":54.7,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-15.png","element":"img","alt":" N′:","inline":true}],[{"id":"id-138","style":{"width":"78%"},"width":1478,"height":209,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-16.png","element":"img"}],[{"text":"where the last two claims hold provided that (","element":"span"},{"style":{"height":20.32},"width":278.95,"height":50.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-17.png","element":"img","alt":"ρ, h) �−→ φ′ρ(h","inline":true},{"text":") is defined and continuous on the ","element":"span"},{"text":"whole of ","element":"span"},{"style":{"height":17.42},"width":136.19,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-18.png","element":"img","alt":" Dρ × D","inline":true},{"text":". The claim ","element":"span"},{"href":"#id-138","text":"(B.5) ","element":"a"},{"text":"is not needed in Part 1, but we need it for Part 2.","element":"span"}],[{"text":"The map ","element":"span"},{"style":{"height":19.98},"width":1630.64,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-19.png","element":"img","alt":" gn(h) = rn(φ(ρPn + r−1n h) − φ(ρPn)), from Dn = {h ∈ D : ρPn + r−1n h ∈ Dφ} to E,","inline":true,"padRight":true},{"text":"satisfies ","element":"span"},{"style":{"height":21.4},"width":299.77,"height":53.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-20.png","element":"img","alt":" gn(hn) → φ′ρ′(h","inline":true},{"text":") for every subsequence ","element":"span"},{"style":{"height":17.6},"width":534.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-21.png","element":"img","alt":" hn → h ∈ D0 (with n ∈ N′","inline":true},{"text":"). 
Application of the ","element":"span"},{"text":"extended continuous mapping theorem ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"(van der Vaart and Wellner, ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"1996, ","element":"a"},{"text":"Theorem 1.11.1) yields ","element":"span"},{"href":"#id-138","text":"(B.3)","element":"a"},{"text":".","element":"span"}],[{"text":"Similarly, the map ","element":"span"},{"style":{"height":22.79},"width":1453.37,"height":56.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-22.png","element":"img","alt":" mn(h) = rn(φ(ρPn + r−1n h) − φ(ρPn)) − φ′ρPn(h), from Dn = {h ∈ D : ρPn +","inline":true},{"style":{"height":22.94},"width":1871.76,"height":57.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-23.png","element":"img","alt":"r−1n h ∈ Dφ} to E, satisfies mn(hn) → φ′ρ′(h) − φ′ρ′(h) = 0 for every subsequence hn → h ∈ D0 (with","inline":true},{"style":{"height":12.8},"width":136.59,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-24.png","element":"img","alt":"n ∈ N′","inline":true},{"text":"). Application of the extended continuous mapping theorem ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"(van der Vaart and Wellner, ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"1996, ","element":"a"},{"text":"Theorem 1.11.1) yields ","element":"span"},{"href":"#id-138","text":"(B.4)","element":"a"},{"text":". The proof of ","element":"span"},{"href":"#id-138","text":"(B.5) ","element":"a"},{"text":"is completely analogous and is omitted.","element":"span"}],[{"text":"To establish relative compactness, we work with each ","element":"span"},{"style":{"height":21.25},"width":732.22,"height":53.13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-25.png","element":"img","alt":" N′. 
Then φ′ρPn(h) mapping D0 to E","inline":true,"padRight":true},{"text":"satisfies ","element":"span"},{"style":{"height":21.4},"width":337.12,"height":53.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-26.png","element":"img","alt":" φ′ρPn(hn) → φ′ρ′(h","inline":true},{"text":") for every subsequence ","element":"span"},{"style":{"height":17.6},"width":513.95,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/44-27.png","element":"img","alt":" hn → h ∈ D0 (with n ∈ N′","inline":true},{"text":"). Application of the ","element":"span"},{"text":"extended continuous mapping theorem ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"(van der Vaart and Wellner, ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"1996, ","element":"a"},{"text":"Theorem 1.11.1) yields that ","element":"span"},{"style":{"height":21.4},"width":392.26,"height":53.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-0.png","element":"img","alt":" φ′ρPn(ZP ) ⇝ φ′ρ′(Z′).","inline":true}],[{"text":"Part 2. We can split ","element":"span"},{"text":"N ","element":"span"},{"text":"into subsequences ","element":"span"},{"style":{"height":17.6},"width":86.51,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-1.png","element":"img","alt":" {N′}","inline":true,"padRight":true},{"text":"as above. 
Along each ","element":"span"},{"style":{"height":15.2},"width":54.7,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-2.png","element":"img","alt":" N′,","inline":true}],[{"style":{"width":"95%"},"width":1794,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12},"width":58.09,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-4.png","element":"img","alt":" Z′′ ","inline":true,"padRight":true},{"text":"is a separable process in ","element":"span"},{"style":{"height":14.62},"width":48.52,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-5.png","element":"img","alt":" D0","inline":true,"padRight":true},{"text":"(which is given by ","element":"span"},{"style":{"height":12},"width":48.91,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-6.png","element":"img","alt":" Z′ ","inline":true,"padRight":true},{"text":"plus its independent copy ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":17.6},"width":231.59,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-7.png","element":"img","alt":"Z′). 
Indeed,","inline":true,"padRight":true},{"text":"note that ","element":"span"},{"style":{"height":21.66},"width":974.78,"height":54.14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-8.png","element":"img","alt":" rn(�ρ∗ρn,Pn − ρPn) = Z∗n,Pn + Zn,Pn, and (Z∗n,Pn, Zn,Pn","inline":true},{"text":") converge weakly unconditionally to ","element":"span"},{"text":"( ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":15.2},"width":112.39,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-9.png","element":"img","alt":"Z′, Z′","inline":true},{"text":") by a standard argument.","element":"span"}],[{"text":"Given each ","element":"span"},{"style":{"height":12.4},"width":47.52,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-10.png","element":"img","alt":" N′ ","inline":true,"padRight":true},{"text":"the proof is similar to the proof of Theorem 3.9.15 of ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":". We can assume without loss of generality that the derivative ","element":"span"},{"style":{"height":20.6},"width":234.71,"height":51.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-11.png","element":"img","alt":" φ′ρ′ : D → E","inline":true,"padRight":true},{"text":"is defined and ","element":"span"},{"text":"continuous on the whole of ","element":"span"},{"text":"D","element":"span"},{"text":". 
Otherwise, if ","element":"span"},{"style":{"height":20.6},"width":55.34,"height":51.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-12.png","element":"img","alt":" φ′ρ′","inline":true,"padRight":true},{"text":"is defined and continuous only on ","element":"span"},{"style":{"height":14.62},"width":48.52,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-13.png","element":"img","alt":" D0","inline":true},{"text":", we can extend ","element":"span"},{"text":"it to ","element":"span"},{"text":"D ","element":"span"},{"text":"by a Hahn-Banach extension such that ","element":"span"},{"style":{"height":21.4},"width":629.17,"height":53.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-14.png","element":"img","alt":" C = ∥φ′ρ′∥D0→E = ∥φ′ρ′∥D→E < ∞","inline":true},{"text":"; see ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996, ","element":"a"},{"text":"p. 380) for details. 
For each ","element":"span"},{"style":{"height":12.4},"width":47.52,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-15.png","element":"img","alt":" N′","inline":true},{"text":", by claim ","element":"span"},{"href":"#id-138","text":"(B.5)","element":"a"},{"text":", applied to ","element":"span"},{"style":{"height":19.97},"width":343.93,"height":49.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-16.png","element":"img","alt":" ρ̂n,Pn and to ρ̂∗n,Pn","inline":true,"padRight":true},{"text":"replacing ","element":"span"},{"style":{"height":13.1},"width":108.39,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-17.png","element":"img","alt":" ρ̂n,Pn,","inline":true}],[{"style":{"width":"55%"},"width":1038,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-18.png","element":"img"}],[{"text":"Subtracting these equations, we conclude that for each ","element":"span"},{"style":{"height":12.4},"width":100.53,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-19.png","element":"img","alt":" ε > 0","inline":true}],[{"id":"id-139","style":{"width":"92%"},"width":1731,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-20.png","element":"img"}],[{"text":"For every ","element":"span"},{"style":{"height":17.6},"width":201.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-21.png","element":"img","alt":" h ∈ BL1(E","inline":true},{"text":"), the function ","element":"span"},{"style":{"height":20.6},"width":104.86,"height":51.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-22.png","element":"img","alt":" h◦φ′ρ′","inline":true,"padRight":true},{"text":"is contained in 
BL","element":"span"},{"style":{"height":17.6},"width":77.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-23.png","element":"img","alt":"C(D","inline":true},{"text":"). Moreover, ","element":"span"},{"style":{"height":20.77},"width":426.83,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-24.png","element":"img","alt":" rn(�ρ∗n,P −�ρn,P ) ⇝B ZP","inline":true,"padRight":true},{"text":"in ","element":"span"},{"text":"D ","element":"span"},{"text":"uniformly in ","element":"span"},{"style":{"height":20.77},"width":735.75,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-25.png","element":"img","alt":" P ∈ Pn implies rn(�ρ∗n,P − �ρn,P ) ⇝B Z′","inline":true,"padRight":true},{"text":"along the subsequence ","element":"span"},{"style":{"height":12.8},"width":127.04,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-26.png","element":"img","alt":" n ∈ N′","inline":true},{"text":". 
These two ","element":"span"},{"text":"facts imply that","element":"span"}],[{"style":{"width":"74%"},"width":1403,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-27.png","element":"img"}],[{"text":"Next, for each ","element":"span"},{"style":{"height":10.4},"width":66.46,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-28.png","element":"img","alt":" ε >","inline":true,"padRight":true},{"text":"0 and along ","element":"span"},{"style":{"height":12.8},"width":127.04,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-29.png","element":"img","alt":" n ∈ N′","inline":true}],[{"style":{"width":"82%"},"width":1546,"height":206,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-30.png","element":"img"}],[{"text":"where the ","element":"span"},{"style":{"height":12.3},"width":60.63,"height":30.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-31.png","element":"img","alt":" oPn","inline":true},{"text":"(1) conclusion follows from Markov’s inequality and from ","element":"span"},{"href":"#id-139","text":"(B.6)","element":"a"},{"text":". Conclude that","element":"span"}],[{"style":{"width":"77%"},"width":1460,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-32.png","element":"img"}]]},{"heading":"Appendix C. Key Tools II: Probabilistic Inequalities","paragraphs":[[{"text":"Let (","element":"span"},{"style":{"height":18.09},"width":126.57,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-33.png","element":"img","alt":"Wi)ni=1 ","inline":true,"padRight":true},{"text":"be a sequence of i.i.d. 
copies of a random element ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"taking values in the measurable ","element":"span"},{"text":"space (","element":"span"},{"style":{"height":16.3},"width":136.93,"height":40.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-34.png","element":"img","alt":"W, AW","inline":true},{"text":") according to the probability law ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":". Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"be a set of suitably measurable functions ","element":"span"},{"style":{"height":16.4},"width":235.66,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-35.png","element":"img","alt":"f : W ↦ R","inline":true},{"text":", equipped with a measurable envelope ","element":"span"},{"style":{"height":12.8},"width":255.24,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/45-36.png","element":"img","alt":" F : W ↦ R.","inline":true}],[{"text":"The following maximal inequality is due to ","element":"span"},{"href":"#id-140","referenceIndex":34,"text":"Chernozhukov et al. ","element":"a"},{"href":"#id-140","referenceIndex":34,"text":"(2012)","element":"a"},{"text":".","element":"span"}],[{"id":"id-143","style":{"fontWeight":"bold"},"text":"Lemma C.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"A Maximal Inequality","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Work with the setup above. 
Suppose that ","element":"span"},{"style":{"height":19.79},"width":287.19,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-0.png","element":"img","alt":" F ⩾ supf∈F |f|","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a measurable envelope with ","element":"span"},{"style":{"height":19.84},"width":1296.11,"height":49.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-1.png","element":"img","alt":" ∥F∥P,q < ∞ for some q ⩾ 2. Let M = maxi⩽n F(Wi) and σ2 > 0 be","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"any positive constant such that ","element":"span"},{"text":"sup","element":"span"},{"style":{"height":22.31},"width":502.08,"height":55.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-2.png","element":"img","alt":"f∈F ∥f∥2P,2 ⩽ σ2 ⩽ ∥F∥2P,2","inline":true},{"style":{"fontStyle":"italic"},"text":". Suppose that there exist constants","element":"span"}],[{"style":{"width":"98%"},"width":1849,"height":204,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is an absolute constant. 
Moreover, for every ","element":"span"},{"style":{"height":14},"width":95.94,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-4.png","element":"img","alt":" t ⩾ 1","inline":true},{"style":{"fontStyle":"italic"},"text":", with probability ","element":"span"},{"style":{"height":19.13},"width":228.45,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-5.png","element":"img","alt":" > 1 − t−q/2,","inline":true}],[{"style":{"height":32.4},"width":1792.18,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-6.png","element":"img","alt":"∥Gn∥F ⩽ (1 + α)EP [∥Gn∥F] + K(q)�(σ + n−1/2∥M∥PP ,q)√t + α−1n−1/2∥M∥PP ,2t�, ∀α > 0,","inline":true}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"q","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is a constant depending only on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
In particular, setting ","element":"span"},{"style":{"height":16.4},"width":480.29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-7.png","element":"img","alt":" a ⩾ n and t = log n, with","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"probability ","element":"span"},{"style":{"height":19.13},"width":322.14,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-8.png","element":"img","alt":" > 1 − c(log n)−1,","inline":true}],[{"style":{"width":"85%"},"width":1604,"height":136,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":21.04},"width":744.29,"height":52.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-10.png","element":"img","alt":" ∥M∥PP ,q ⩽ n1/q∥F∥P,q and K(q, c) > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a constant depending only on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}]]},{"heading":"Appendix D. Proofs for Section 4","paragraphs":[[{"text":"These results follow from the application of results given in Section 5. The details are given in the Supplementary Appendix.","element":"span"}]]},{"heading":"Appendix E. Proofs for Section 5","paragraphs":[[{"id":"id-56","text":"E.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-141","style":{"fontWeight":"bold"},"text":"5.1. 
","element":"a"},{"text":"In the proof ","element":"span"},{"style":{"height":16.8},"width":118.23,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-11.png","element":"img","alt":" a ≲ b","inline":true,"padRight":true},{"text":"means that ","element":"span"},{"style":{"height":15.2},"width":150.96,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-12.png","element":"img","alt":" a ⩽ Ab","inline":true},{"text":", where the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"depends on the constants in Assumptions ","element":"span"},{"href":"#id-104","text":"5.1–","element":"a"},{"href":"#id-94","text":"5.3, ","element":"a"},{"text":"but not on ","element":"span"},{"style":{"height":13.82},"width":271.05,"height":34.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-13.png","element":"img","alt":" n once n ⩾ n0","inline":true},{"text":", and not on ","element":"span"},{"style":{"height":14.62},"width":154.4,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-14.png","element":"img","alt":" P ∈ Pn.","inline":true,"padRight":true},{"text":"Since the argument is asymptotic, we can assume that ","element":"span"},{"style":{"height":13.82},"width":127.56,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-15.png","element":"img","alt":" n ⩾ n0","inline":true,"padRight":true},{"text":"in what follows. 
In order to establish the result uniformly in ","element":"span"},{"style":{"height":14.62},"width":149.94,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-16.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":", it suffices to establish the result under the probability measure induced by any sequence ","element":"span"},{"style":{"height":14.62},"width":255.41,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-17.png","element":"img","alt":" P = Pn ∈ Pn","inline":true},{"text":". In the proof we shall use ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", suppressing the dependency of ","element":"span"},{"style":{"height":14.62},"width":49.02,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-18.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"on the sample size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Also, let","element":"span"}],[{"style":{"width":"84%"},"width":1573,"height":146,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-19.png","element":"img"}],[{"text":"Step 1. (A Preliminary Rate Result). 
In this step we claim that with probability 1","element":"span"},{"style":{"height":18.19},"width":548.9,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-20.png","element":"img","alt":"−o(1), supu∈U ∥θ̂u−θu∥ ≲ τn.","inline":true,"padRight":true},{"text":"By definition,","element":"span"}],[{"id":"id-142","style":{"width":"99%"},"width":1868,"height":267,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-21.png","element":"img"}],[{"text":"for ","element":"span"},{"style":{"height":15.02},"width":180.14,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-22.png","element":"img","alt":" I1 and I2","inline":true,"padRight":true},{"text":"defined in Step 2 below. The ","element":"span"},{"style":{"height":16.8},"width":34,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-23.png","element":"img","alt":" ≲","inline":true,"padRight":true},{"text":"bound in ","element":"span"},{"href":"#id-142","text":"(E.2) ","element":"a"},{"text":"follows from Step 2 and from the assumption ","element":"span"},{"style":{"height":20.33},"width":239.97,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-24.png","element":"img","alt":" ϵn = o(n−1/2","inline":true},{"text":"). 
Since by Assumption ","element":"span"},{"href":"#id-104","text":"5.1(","element":"a"},{"text":"iv), 2","element":"span"},{"style":{"height":19.13},"width":381.5,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/46-25.png","element":"img","alt":"−1(∥Ju(�θu−θu)∥∧c0","inline":true},{"text":") does not exceed the","element":"span"}],[{"text":"left side of ","element":"span"},{"href":"#id-142","text":"(E.2) ","element":"a"},{"text":"and inf","element":"span"},{"style":{"height":17.6},"width":308.81,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-0.png","element":"img","alt":"u∈U mineig(J′uJu","inline":true},{"text":") is bounded away from zero uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", we conclude ","element":"span"},{"text":"that sup","element":"span"},{"style":{"height":20.93},"width":962.1,"height":52.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-1.png","element":"img","alt":"u∈U ∥�θu − θu∥ ≲ (infu∈U mineig(J′uJu))−1/2τn ≲ τn.","inline":true}],[{"text":"Step 2. 
(Define and bound ","element":"span"},{"style":{"height":15.02},"width":173.69,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-2.png","element":"img","alt":" I1 and I2","inline":true},{"text":") We claim that with probability 1 ","element":"span"},{"style":{"height":17.6},"width":132.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-3.png","element":"img","alt":" − o(1):","inline":true}],[{"style":{"width":"70%"},"width":1324,"height":214,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-4.png","element":"img"}],[{"text":"To establish this, we can bound ","element":"span"},{"style":{"height":15.24},"width":529.96,"height":38.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-5.png","element":"img","alt":" I1 ⩽ 2I1a + I1b and I2 ⩽ I1a","inline":true},{"text":", where with probability 1 ","element":"span"},{"style":{"height":17.6},"width":132.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-6.png","element":"img","alt":" − o(1),","inline":true}],[{"style":{"width":"80%"},"width":1499,"height":221,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-7.png","element":"img"}],[{"text":"These bounds in turn hold by the following arguments. In order to bound ","element":"span"},{"style":{"height":14.84},"width":51.12,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-8.png","element":"img","alt":" I1b","inline":true,"padRight":true},{"text":"we employ Taylor’s expansion and the triangle inequality. 
For ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":17.6},"width":194.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-9.png","element":"img","alt":"h(Z, u, j, θ","inline":true},{"text":") denoting a point on the line segment connecting the vectors ","element":"span"},{"style":{"height":17.6},"width":499.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-10.png","element":"img","alt":"h(Zu) and hu(Zu), and tm","inline":true,"padRight":true},{"text":"denoting the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"th element of the vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"93%"},"width":1752,"height":233,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-11.png","element":"img"}],[{"text":"where the last inequality holds by the definition of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"W","element":"span"},{"text":") given earlier and Hölder’s inequality. 
By Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"ii)(c), ","element":"span"},{"style":{"height":18.3},"width":221.86,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-12.png","element":"img","alt":" ∥B∥P,2 ⩽ C","inline":true},{"text":", and by Assumption ","element":"span"},{"href":"#id-94","text":"5.3, ","element":"a"},{"text":"sup","element":"span"},{"style":{"height":20.59},"width":681.02,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-13.png","element":"img","alt":"u∈U,h∈Hun,m∈[dt] ∥hm−hum∥P,2 ≲ τn,","inline":true,"padRight":true},{"text":"hence we conclude that ","element":"span"},{"style":{"height":16.8},"width":641.95,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-14.png","element":"img","alt":" I1b ≲ τn since dθ and dt are fixed.","inline":true}],[{"text":"In order to bound ","element":"span"},{"style":{"height":14.62},"width":54.12,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-15.png","element":"img","alt":" I1a","inline":true,"padRight":true},{"text":"we apply the maximal inequality of Lemma ","element":"span"},{"href":"#id-143","text":"C.1 ","element":"a"},{"text":"to the class","element":"span"}],[{"style":{"width":"67%"},"width":1265,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-16.png","element":"img"}],[{"text":"defined in Assumption ","element":"span"},{"href":"#id-94","text":"5.3 ","element":"a"},{"text":"and equipped with an envelope ","element":"span"},{"style":{"height":14.62},"width":150.23,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-17.png","element":"img","alt":" F1 ⩽ F0","inline":true},{"text":", to conclude that with probability 1 
","element":"span"},{"style":{"height":17.6},"width":132.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-18.png","element":"img","alt":" − o(1),","inline":true}],[{"style":{"width":"55%"},"width":1034,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-19.png","element":"img"}],[{"text":"Here we use that log sup","element":"span"},{"style":{"height":19.79},"width":786.63,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-20.png","element":"img","alt":"Q N(ϵ∥F1∥Q,2, F1, ∥·∥Q,2) ⩽ sn log(an/ϵ)∨","inline":true},{"text":"0 by Assumption ","element":"span"},{"href":"#id-94","text":"5.3; ","element":"a"},{"style":{"height":18.3},"width":232.89,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-21.png","element":"img","alt":" ∥F0∥P,q ⩽ C","inline":true,"padRight":true},{"text":"and sup","element":"span"},{"style":{"height":22.31},"width":828.16,"height":55.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-22.png","element":"img","alt":"f∈F1 ∥f∥2P,2 ⩽ σ2 ⩽ ∥F0∥2P,2 for c ⩽ σ ⩽ C","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"i); ","element":"span"},{"style":{"height":16.4},"width":435.07,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-23.png","element":"img","alt":" an ⩾ n and sn ⩾ 1 by","inline":true,"padRight":true},{"text":"Assumption ","element":"span"},{"href":"#id-94","text":"5.3.","element":"a"}],[{"style":{"width":"90%"},"width":1703,"height":411,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/47-24.png","element":"img"}],[{"style":{"width":"0%"},"width":13,"height":4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-0.png","element":"img"}],[{"text":"where the terms 
","element":"span"},{"style":{"height":17.6},"width":323.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-1.png","element":"img","alt":" II1(u) and II2(u","inline":true},{"text":") are defined in Step 4 and D","element":"span"},{"style":{"height":18.22},"width":211.64,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-2.png","element":"img","alt":"u,0(ĥu − hu","inline":true},{"text":") is treated in the next paragraph. Then, by the triangle inequality and Steps 4 and 5, for all ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-3.png","element":"img","alt":" u ∈ U","inline":true,"padRight":true},{"text":"we have","element":"span"}],[{"style":{"width":"82%"},"width":1544,"height":231,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-4.png","element":"img"}],[{"text":"where the ","element":"span"},{"style":{"height":10.7},"width":47.15,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-5.png","element":"img","alt":" oP","inline":true,"padRight":true},{"text":"(1) bound follows from Step 4, ","element":"span"},{"style":{"height":17.6},"width":182,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-6.png","element":"img","alt":" ϵn√n = o","inline":true},{"text":"(1) by assumption, and Step 5.","element":"span"}],[{"text":"Moreover, by the orthogonality condition:","element":"span"}],[{"style":{"width":"92%"},"width":1730,"height":150,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-7.png","element":"img"}],[{"text":"Conclude using Assumption ","element":"span"},{"href":"#id-104","text":"5.1(","element":"a"},{"text":"iv) 
that","element":"span"}],[{"style":{"width":"84%"},"width":1590,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-8.png","element":"img"}],[{"text":"Furthermore, the empirical process (","element":"span"},{"style":{"height":19.13},"width":665.69,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-9.png","element":"img","alt":"−√nEnJ−1u ψu(Wu, θu, hu(Zu)))u∈U ","inline":true,"padRight":true},{"text":"is equivalent to an ","element":"span"},{"text":"empirical process ","element":"span"},{"style":{"height":15.02},"width":54.94,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-10.png","element":"img","alt":" Gn","inline":true,"padRight":true},{"text":"indexed by ","element":"span"},{"style":{"height":32},"width":999.56,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-11.png","element":"img","alt":" FP := {ψ̄uj : j ∈ [dθ], u ∈ U}, where ψ̄uj is the j","inline":true},{"text":"-th element of","element":"span"}],[{"style":{"height":19.13},"width":431.8,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-12.png","element":"img","alt":"−J−1u ψu(Wu, θu, hu(Zu","inline":true},{"text":")) and we make explicit the dependence of ","element":"span"},{"style":{"height":18.44},"width":580,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-13.png","element":"img","alt":" FP on P. Let M = {Mujk :","inline":true},{"style":{"height":18.44},"width":811.19,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-14.png","element":"img","alt":"j, k ∈ [dθ], u ∈ U}, where Mujk is the (j, k","inline":true},{"text":") element of the matrix ","element":"span"},{"style":{"height":19.05},"width":154.82,"height":47.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-15.png","element":"img","alt":" J−1u . 
M","inline":true,"padRight":true},{"text":"is a class of uniformly ","element":"span"},{"text":"Hölder continuous functions on (","element":"span"},{"style":{"height":15.6},"width":98.75,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-16.png","element":"img","alt":"U, dU","inline":true},{"text":") with a uniform covering entropy bounded by ","element":"span"},{"style":{"height":17.6},"width":258.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-17.png","element":"img","alt":" C log(e/ϵ) ∨ 0","inline":true,"padRight":true},{"text":"and equipped with a constant envelope ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":", given the stated assumptions. This result follows from the fact that by Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"ii)(b)","element":"span"}],[{"id":"id-144","style":{"width":"79%"},"width":1497,"height":182,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-18.png","element":"img"}],[{"text":"and the constant envelope follows by Assumption ","element":"span"},{"href":"#id-104","text":"5.1(","element":"a"},{"text":"iv). 
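","element":"span"}],[{"text":"The covering-entropy claim for M rests on a standard grid argument, which can be illustrated numerically in a toy case. The sketch below is an illustration under stated assumptions, not part of the proof: it uses a hypothetical 2 x 2 matrix J_u on U = [0, 1], chosen only so that u ↦ J_u^{-1} is smooth, estimates a Lipschitz constant L of u ↦ J_u^{-1} by finite differences, and checks that a grid of spacing at most ε/L over U gives an ε-cover of the entries M_ujk in the sup-norm. The grid size grows like L/ε, so the log covering number grows like log(L/ε), consistent with a bound of the form C log(e/ε) ∨ 0; with a Hölder exponent γ ∈ (0, 1] the spacing would instead be (ε/L)^{1/γ}, which only multiplies the logarithm by 1/γ.","element":"span"}],[{"text":"
```python
import math

import numpy as np


def J(u):
    # Hypothetical Jacobian on U = [0, 1] (not from the paper); det >= 1.75.
    return np.array([[2.0 + u, 0.5], [0.5, 1.0 + u ** 2]])


def inv_entries(u):
    # The four elements M_ujk of J_u^{-1}, flattened.
    return np.linalg.inv(J(u)).ravel()


def lipschitz_estimate(us):
    # Finite-difference estimate of the Lipschitz constant of u -> J_u^{-1},
    # measured in the sup-norm over matrix entries.
    vals = np.array([inv_entries(u) for u in us])
    steps = np.abs(np.diff(vals, axis=0)).max(axis=1)
    return float((steps / np.diff(us)).max())


def grid_cover(eps, lip, n_check=2001):
    # Grid over U with spacing at most eps / lip. Returns the grid size and the
    # largest sup-distance between M_u and its value at the nearest grid point.
    n_grid = math.ceil(lip / eps) + 1
    grid = np.linspace(0.0, 1.0, n_grid)
    grid_vals = np.array([inv_entries(g) for g in grid])
    worst = 0.0
    for u in np.linspace(0.0, 1.0, n_check):
        nearest = grid_vals[np.abs(grid - u).argmin()]
        worst = max(worst, float(np.abs(inv_entries(u) - nearest).max()))
    return n_grid, worst


lip = lipschitz_estimate(np.linspace(0.0, 1.0, 2001))
for eps in (0.1, 0.01, 0.001):
    n_grid, worst = grid_cover(eps, lip)
    # eps, grid size, log of grid size, and whether the grid is an eps-cover
    print(eps, n_grid, round(math.log(n_grid), 2), worst <= eps)
```
","element":"span"}],[{"text":"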
Since ","element":"span"},{"style":{"height":15.1},"width":57.36,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-19.png","element":"img","alt":" FP","inline":true,"padRight":true},{"text":"is generated as a finite sum of products of the elements of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"and the class ","element":"span"},{"style":{"height":15.02},"width":48.36,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-20.png","element":"img","alt":" F0","inline":true,"padRight":true},{"text":"defined in Assumption ","element":"span"},{"href":"#id-103","text":"5.2, ","element":"a"},{"text":"the properties of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"and the conditions on ","element":"span"},{"style":{"height":15.02},"width":48.36,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-21.png","element":"img","alt":" F0","inline":true,"padRight":true},{"text":"in Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"ii) imply that ","element":"span"},{"style":{"height":15.1},"width":57.36,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-22.png","element":"img","alt":" FP","inline":true,"padRight":true},{"text":"has a uniformly well-behaved uniform covering entropy by Lemma ","element":"span"},{"href":"#id-135","text":"K.1, ","element":"a"},{"text":"namely","element":"span"}],[{"style":{"width":"62%"},"width":1175,"height":85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-23.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.1},"width":203.83,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-24.png","element":"img","alt":" FP = CF0","inline":true,"padRight":true},{"text":"is an envelope for 
","element":"span"},{"style":{"height":21.32},"width":1159.18,"height":53.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-25.png","element":"img","alt":" FP since supf∈FP |f| ≲ supu∈U ∥J−1u ∥ supf∈F0 |f| ⩽ CF0 by","inline":true,"padRight":true},{"text":"Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"i). The class ","element":"span"},{"style":{"height":15.1},"width":57.36,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-26.png","element":"img","alt":" FP","inline":true,"padRight":true},{"text":"is therefore Donsker uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"because sup","element":"span"},{"style":{"height":18.3},"width":283.46,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-27.png","element":"img","alt":"P∈P ∥FP ∥P,q ⩽","inline":true},{"style":{"height":18.3},"width":330.12,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-28.png","element":"img","alt":"C supP∈P ∥F0∥P,q","inline":true,"padRight":true},{"text":"is bounded by Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"ii), and sup","element":"span"},{"style":{"height":20.11},"width":721.66,"height":50.27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/48-29.png","element":"img","alt":"P∈P ∥ ¯ψu − ¯ψ¯u∥P,2 → 0 as dU(u, ¯u) → 0","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"b) and ","element":"span"},{"href":"#id-144","text":"(E.3)","element":"a"},{"text":". Application of Theorem ","element":"span"},{"href":"#id-131","text":"B.1 ","element":"a"},{"text":"gives the results of the theorem.","element":"span"}],[{"text":"Step 4. 
","element":"span"},{"text":"(Define and Bound ","element":"span"},{"style":{"height":24.67},"width":1244.07,"height":61.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-0.png","element":"img","alt":" II1(u) and II2(u)). Let II1(u) := (II1j(u))dθj=1 and II2(u) =","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":24.67},"width":351.87,"height":61.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-1.png","element":"img","alt":"II2j(u))dθj=1, where","inline":true}],[{"style":{"width":"98%"},"width":1851,"height":283,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-2.png","element":"img"}],[{"text":"and ¯","element":"span"},{"style":{"height":17.6},"width":149.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-3.png","element":"img","alt":"νu(Zu, j","inline":true},{"text":") is a vector on the line connecting ","element":"span"},{"style":{"height":17.6},"width":368.38,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-4.png","element":"img","alt":" νu(Zu) and ν̂u(Zu).","inline":true}],[{"text":"First, by Assumptions ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"ii)(d) and ","element":"span"},{"href":"#id-94","text":"5.3, ","element":"a"},{"text":"the claim of Step 1, and the Hölder inequality,","element":"span"}],[{"style":{"width":"80%"},"width":1509,"height":240,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-5.png","element":"img"}],[{"text":"Second, with probability 1 ","element":"span"},{"style":{"height":19.95},"width":1021.18,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-6.png","element":"img","alt":" − o(1), maxj∈[dθ] supu∈U |II2j(u)| ≲ supf∈F2 |Gn(f)|,","inline":true,"padRight":true},{"text":"where, for 
Θ","element":"span"},{"style":{"height":17.6},"width":629.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-7.png","element":"img","alt":"un := {θ ∈ Θu : ∥θ − θu∥ ⩽ Cτn},","inline":true}],[{"style":{"width":"99%"},"width":1867,"height":275,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-8.png","element":"img"}],[{"text":"since sup","element":"span"},{"style":{"height":19.79},"width":588.96,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-9.png","element":"img","alt":"f∈F2 |f| ⩽ 2 supf∈F1 |f| ⩽ 2F0","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-94","text":"5.3; ","element":"a"},{"style":{"height":18.3},"width":245.64,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-10.png","element":"img","alt":" ∥F0∥P,q ⩽ C","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"i); log sup","element":"span"},{"style":{"height":19.79},"width":1098.89,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-11.png","element":"img","alt":"Q N(ϵ∥F2∥Q,2, F2, ∥ · ∥Q,2) ≲ (sn log an + sn log(an/ϵ)) ∨","inline":true,"padRight":true},{"text":"0 by Lemma ","element":"span"},{"href":"#id-135","text":"K.1 ","element":"a"},{"text":"because ","element":"span"},{"style":{"height":15.02},"width":105.74,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-12.png","element":"img","alt":" F2 =","inline":true},{"style":{"height":15.02},"width":529.15,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-13.png","element":"img","alt":"F1 − F0 for the F0 and F1","inline":true,"padRight":true},{"text":"defined in Assumptions ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"i) and ","element":"span"},{"href":"#id-94","text":"5.3; 
","element":"a"},{"text":"and ","element":"span"},{"style":{"height":8},"width":25,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-14.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"can be chosen so that sup","element":"span"},{"style":{"height":25.39},"width":608.16,"height":63.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-15.png","element":"img","alt":"f∈F2 ∥f∥P,2 ⩽ σ ≲ τ α/2n . Indeed,","inline":true}],[{"style":{"width":"86%"},"width":1610,"height":287,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-16.png","element":"img"}],[{"text":"where the first inequality follows by the law of iterated expectations; the second inequality follows by Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"ii)(a); and the last inequality follows from ","element":"span"},{"style":{"height":17.6},"width":137.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-17.png","element":"img","alt":" α ∈ [1,","inline":true,"padRight":true},{"text":"2] by Assumption ","element":"span"},{"href":"#id-103","text":"5.2, ","element":"a"},{"text":"the monotonicity of the norm ","element":"span"},{"style":{"height":18.3},"width":374.92,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-18.png","element":"img","alt":" ∥ · ∥P,α in α ∈ [1, ∞","inline":true},{"text":"], and Assumption ","element":"span"},{"href":"#id-94","text":"5.3.","element":"a"}],[{"style":{"width":"99%"},"width":1871,"height":426,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/49-19.png","element":"img"}],[{"style":{"width":"99%"},"width":1869,"height":264,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-0.png","element":"img"}],[{"text":"where the first term on the right side is zero by definition of 
","element":"span"},{"text":"¯","element":"span"},{"style":{"height":18.22},"width":688.85,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-1.png","element":"img","alt":"θu and Du,0(�hu − hu) = 0. ■","inline":true}],[{"text":"E.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-145","style":{"fontWeight":"bold"},"text":"5.2. ","element":"a"},{"text":"Step 0. ","element":"span"},{"text":"In the proof ","element":"span"},{"style":{"height":16.8},"width":100.25,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-2.png","element":"img","alt":" a ≲ b","inline":true,"padRight":true},{"text":"means that ","element":"span"},{"style":{"height":15.2},"width":132.97,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-3.png","element":"img","alt":" a ⩽ Ab","inline":true},{"text":", where the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"depends on the constants in Assumptions ","element":"span"},{"href":"#id-104","text":"5.1– ","element":"a"},{"href":"#id-94","text":"5.3, ","element":"a"},{"text":"but not on ","element":"span"},{"style":{"height":13.82},"width":262.32,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-4.png","element":"img","alt":" n once n ⩾ n0","inline":true},{"text":", and not on ","element":"span"},{"style":{"height":14.62},"width":152.31,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-5.png","element":"img","alt":" P ∈ Pn.","inline":true,"padRight":true},{"text":"In Step 1, we consider a sequence ","element":"span"},{"style":{"height":14.62},"width":171.76,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-6.png","element":"img","alt":" Pn in Pn","inline":true},{"text":", but for simplicity, we write 
","element":"span"},{"style":{"height":14.62},"width":148.6,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-7.png","element":"img","alt":" P = Pn","inline":true,"padRight":true},{"text":"throughout the proof, suppressing the index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Since the argument is asymptotic, we can assume that ","element":"span"},{"style":{"height":14.22},"width":185.78,"height":35.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-8.png","element":"img","alt":" n ⩾ n0 in","inline":true,"padRight":true},{"text":"what follows.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":14.62},"width":47.67,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-9.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"denote the measure that puts mass ","element":"span"},{"style":{"height":15.13},"width":69.54,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-10.png","element":"img","alt":" n−1 ","inline":true,"padRight":true},{"text":"at the points (","element":"span"},{"style":{"height":17.6},"width":592.77,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-11.png","element":"img","alt":"ξi, Wi) for i = 1, ..., n. 
Let En","inline":true,"padRight":true},{"text":"denote the expectation with respect to this measure, so that ","element":"span"},{"style":{"height":19.89},"width":673.48,"height":49.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-12.png","element":"img","alt":" Enf = n−1 Σni=1 f(ξi, Wi), and Gn","inline":true,"padRight":true},{"text":"denote the corresponding empirical process ","element":"span"},{"style":{"height":17.77},"width":317.84,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-13.png","element":"img","alt":"√n(En − P), i.e.","inline":true}],[{"style":{"width":"74%"},"width":1396,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-14.png","element":"img"}],[{"text":"Recall that we define the bootstrap draw as:","element":"span"}],[{"style":{"width":"64%"},"width":1208,"height":139,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-15.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.13},"width":668.77,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-16.png","element":"img","alt":"ψ̂u(W) = − Ĵ−1u ψu(Wu, θ̂u, ĥu(Zu)).","inline":true}],[{"text":"Step 1. ","element":"span"},{"text":"(Linearization) In this step, we establish that","element":"span"}],[{"style":{"width":"99%"},"width":1868,"height":155,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-17.png","element":"img"}],[{"text":"The claim follows from showing that (a)","element":"span"}],[{"id":"id-146","style":{"width":"69%"},"width":1304,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-18.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":23.38},"width":1167.46,"height":58.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-19.png","element":"img","alt":" Z⋆n,P := (Gnξ ˇψu)u∈U, and ˇψu(W) = −J−1u ψu(Wu, 
θ̂u, ĥu(Zu","inline":true},{"text":")) (note that the hat on ","element":"span"},{"style":{"height":14.62},"width":44.2,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-20.png","element":"img","alt":" Ju","inline":true,"padRight":true},{"text":"has disappeared); and (b)","element":"span"}],[{"id":"id-147","style":{"width":"69%"},"width":1302,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-21.png","element":"img"}],[{"text":"To show claim ","element":"span"},{"href":"#id-146","text":"(E.7)","element":"a"},{"text":", we note that, with probability 1 ","element":"span"},{"style":{"height":17.6},"width":773.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-22.png","element":"img","alt":" − δn, ĥu ∈ Hun, θ̂u ∈ Θun = {θ ∈ Θu :","inline":true}],[{"style":{"width":"81%"},"width":1528,"height":154,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-23.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":20.02},"width":368.24,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-24.png","element":"img","alt":"ψuj(¯θu, ¯hu) is the j","inline":true},{"text":"-th element of ","element":"span"},{"style":{"height":20.02},"width":810.89,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-25.png","element":"img","alt":" −J−1u ψu(Wu, ¯θu, ¯hu(Zu)), and ¯ψuj is the j","inline":true},{"text":"-th element of ","element":"span"},{"style":{"height":19.13},"width":431.8,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/50-26.png","element":"img","alt":"−J−1u ψu(Wu, θu, hu(Zu","inline":true},{"text":")). 
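","element":"span"}],[{"text":"To make the multiplier bootstrap draw recalled above concrete, the sketch below computes, over a finite grid of points u_1, ..., u_K in U, the draws G_n(ξ ψ̂_u) = n^{-1/2} Σ_i ξ_i ψ̂_u(W_i) and the chosen quantile of their sup-norm over the grid. It assumes that the estimated scores ψ̂_u(W_i) have already been computed and stored in an n x K array (a hypothetical input), and it uses standard normal multipliers, which have mean zero and unit variance and satisfy E[exp(|ξ|)] < ∞. The returned critical value c is unstudentized, so that θ̂_u ± c/√n would give a uniform band over the grid for a scalar parameter; studentization and the choice of grid are left out of this sketch.","element":"span"}],[{"text":"
```python
import numpy as np


def multiplier_bootstrap_sup(psi_hat, n_draws=999, level=0.95, seed=0):
    # psi_hat[i, k]: estimated score of observation i at grid point u_k,
    # a hypothetical precomputed input of shape (n, K).
    rng = np.random.default_rng(seed)
    n = psi_hat.shape[0]
    sup_stats = np.empty(n_draws)
    for b in range(n_draws):
        # Gaussian multipliers drawn independently of the data:
        # mean 0, variance 1, and E[exp(|xi|)] finite.
        xi = rng.standard_normal(n)
        # One bootstrap draw of the process (G_n xi psi_hat_u) over the grid.
        draw = xi @ psi_hat / np.sqrt(n)
        sup_stats[b] = np.abs(draw).max()
    # Quantile of the sup-norm across bootstrap draws.
    return float(np.quantile(sup_stats, level))
```
","element":"span"}],[{"text":"For example, with theta_hat an array of K point estimates from a sample of size n, the band would run from theta_hat - c / np.sqrt(n) to theta_hat + c / np.sqrt(n), where c = multiplier_bootstrap_sup(psi_hat).","element":"span"}],[{"text":"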
By arguments similar to those employed in the proof of the previous ","element":"span"},{"text":"theorem, ","element":"span"},{"style":{"height":16.4},"width":171.55,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-0.png","element":"img","alt":" F3 obeys","inline":true}],[{"style":{"width":"64%"},"width":1202,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-1.png","element":"img"}],[{"text":"for an envelope ","element":"span"},{"style":{"height":16.8},"width":161.2,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-2.png","element":"img","alt":" F3 ≲ F0","inline":true},{"text":". By Lemma ","element":"span"},{"href":"#id-135","text":"K.1, ","element":"a"},{"text":"multiplication of this class by ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-3.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"does not change the entropy bound modulo an absolute constant, namely","element":"span"}],[{"style":{"width":"67%"},"width":1268,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-4.png","element":"img"}],[{"text":"Also E[exp(","element":"span"},{"style":{"height":17.6},"width":187.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-5.png","element":"img","alt":"|ξ|)] < ∞","inline":true,"padRight":true},{"text":"implies (E[max","element":"span"},{"style":{"height":20.55},"width":382.56,"height":51.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-6.png","element":"img","alt":"i⩽n |ξi|2])1/2 ≲ log n","inline":true},{"text":", so that, using independence of 
(","element":"span"},{"style":{"height":18.09},"width":104.45,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-7.png","element":"img","alt":"ξi)ni=1","inline":true,"padRight":true},{"text":"from (","element":"span"},{"style":{"height":18.09},"width":126.58,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-8.png","element":"img","alt":"Wi)ni=1 ","inline":true,"padRight":true},{"text":"and Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"i),","element":"span"}],[{"style":{"width":"68%"},"width":1278,"height":75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-9.png","element":"img"}],[{"text":"Applying Lemma ","element":"span"},{"href":"#id-143","text":"C.1,","element":"a"}],[{"style":{"width":"72%"},"width":1350,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-10.png","element":"img"}],[{"text":"for sup","element":"span"},{"style":{"height":25.39},"width":819.33,"height":63.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-11.png","element":"img","alt":"f∈ξF3 ∥f∥P,2 = supf∈F3 ∥f∥P,2 ≲ σn ≲ τ α/2n","inline":true,"padRight":true},{"text":", where the details of the calculations are similar to those in the proof of Theorem ","element":"span"},{"href":"#id-141","text":"5.1. 
","element":"a"},{"text":"Indeed, with probability 1 ","element":"span"},{"style":{"height":17.6},"width":152.67,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-12.png","element":"img","alt":" − o(δn),","inline":true}],[{"style":{"width":"98%"},"width":1842,"height":188,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-13.png","element":"img"}],[{"text":"where the first inequality follows from the triangle inequality and the law of iterated expectations; the second inequality follows by Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"ii)(a) and Assumption ","element":"span"},{"href":"#id-103","text":"5.2(","element":"a"},{"text":"i); the third inequality follows from ","element":"span"},{"style":{"height":17.6},"width":127.35,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-14.png","element":"img","alt":" α ∈ [1,","inline":true,"padRight":true},{"text":"2] by Assumption ","element":"span"},{"href":"#id-103","text":"5.2, ","element":"a"},{"text":"the monotonicity of the norm ","element":"span"},{"style":{"height":18.3},"width":481.78,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-15.png","element":"img","alt":" ∥ · ∥P,α in α ∈ [1, ∞], and","inline":true,"padRight":true},{"text":"Assumption ","element":"span"},{"href":"#id-94","text":"5.3; ","element":"a"},{"text":"and the last inequality follows from ","element":"span"},{"style":{"height":18.3},"width":310.41,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-16.png","element":"img","alt":" ∥ν − νu∥P,2 ≲ τn","inline":true,"padRight":true},{"text":"by the definition of Θ","element":"span"},{"style":{"height":15.02},"width":126.02,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-17.png","element":"img","alt":"un 
and","inline":true},{"style":{"height":14.62},"width":77.46,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-18.png","element":"img","alt":"Hun","inline":true},{"text":". The claim ","element":"span"},{"href":"#id-146","text":"(E.7) ","element":"a"},{"text":"follows.","element":"span"}],[{"text":"To show claim ","element":"span"},{"href":"#id-147","text":"(E.8)","element":"a"},{"text":", bound","element":"span"}],[{"style":{"width":"90%"},"width":1688,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-19.png","element":"img"}],[{"text":"since sup","element":"span"},{"style":{"height":19.73},"width":422.64,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-20.png","element":"img","alt":"u∈U ∥ Ĵ−1u Ju − I∥ = oP ","inline":true,"padRight":true},{"text":"(1) by the assumption of the theorem, and since ","element":"span"},{"style":{"height":20.77},"width":333.78,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-21.png","element":"img","alt":" ∥Z∗n,P ∥D = OP (1)","inline":true,"padRight":true},{"text":"by ","element":"span"},{"style":{"height":20.77},"width":765.45,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-22.png","element":"img","alt":" ∥Z∗n,P ∥D = ∥G∗n,P + oP (1)∥D ⇝B ∥ZP ∥D","inline":true},{"text":", which follows from claim ","element":"span"},{"href":"#id-146","text":"(E.7) ","element":"a"},{"text":"and from the fact that ","element":"span"},{"style":{"height":19.97},"width":303.26,"height":49.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-23.png","element":"img","alt":" G∗n,P ⇝B ZP in","inline":true,"padRight":true},{"text":"D ","element":"span"},{"text":"holds by Theorem ","element":"span"},{"href":"#id-134","text":"B.2.","element":"a"}],[{"text":"Step 2","element":"span"},{"text":". 
Here we claim that ","element":"span"},{"style":{"height":22.71},"width":547.42,"height":56.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-24.png","element":"img","alt":" Z∗n,P ⇝B ZP in D = ℓ∞(U)dθ","inline":true},{"text":", under any sequence ","element":"span"},{"style":{"height":14.62},"width":183.94,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-25.png","element":"img","alt":" P = Pn ∈","inline":true},{"style":{"height":19.41},"width":509.69,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-26.png","element":"img","alt":"Pn, where ZP = (GP ¯ψu)u∈U","inline":true},{"text":". By the triangle inequality and Step 1,","element":"span"}],[{"style":{"width":"92%"},"width":1736,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-27.png","element":"img"}],[{"text":"where the first term is ","element":"span"},{"style":{"height":20.77},"width":495.45,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-28.png","element":"img","alt":" o∗P (1), since G∗n,P ⇝B ZP","inline":true,"padRight":true},{"text":"by Theorem ","element":"span"},{"href":"#id-134","text":"B.2, ","element":"a"},{"text":"and the second term is ","element":"span"},{"style":{"height":17.6},"width":105.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-29.png","element":"img","alt":" oP (1)","inline":true,"padRight":true},{"text":"because ","element":"span"},{"style":{"height":20.77},"width":254.18,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-30.png","element":"img","alt":" ∥ζ∗n,P ∥D = oP","inline":true,"padRight":true},{"text":"(1) implies that E","element":"span"},{"style":{"height":20.77},"width":785.04,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-31.png","element":"img","alt":"P (∥ζ∗n,P ∥D ∧ 2) = EP EBn(∥ζ∗n,P ∥D ∧ 2) 
→","inline":true,"padRight":true},{"text":"0, which in turn ","element":"span"},{"text":"implies that E","element":"span"},{"style":{"height":20.77},"width":403.39,"height":51.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-32.png","element":"img","alt":"Bn(∥ζ∗n,P ∥D ∧ 2) = oP","inline":true,"padRight":true},{"text":"(1) by the Markov inequality. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/51-33.png","element":"img","alt":"■","inline":true}],[{"text":"E.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-93","style":{"fontWeight":"bold"},"text":"5.3. ","element":"a"},{"text":"This is an immediate consequence of Theorems ","element":"span"},{"href":"#id-141","text":"5.1, ","element":"a"},{"href":"#id-145","text":"5.2, ","element":"a"},{"href":"#id-108","text":"B.3, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-109","text":"B.4. ","element":"a"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-0.png","element":"img","alt":"■","inline":true}]]},{"heading":"Appendix F. Implementation Details","paragraphs":[[{"id":"id-68","text":"In this section, we describe how we implemented, in the empirical example, the methodology developed in ","element":"span"},{"text":"the main body of the paper. We first discuss estimation of local average treatment effects (LATE) and then extend this discussion to estimation of local quantile treatment effects (LQTE). Estimation of all other quantities proceeds in a similar fashion and so is not discussed.","element":"span"}],[{"text":"F.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Local Average Treatment Effects. 
","element":"span"},{"text":"Recall that the LATE of treatment ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"on outcome ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"is defined as","element":"span"}],[{"style":{"width":"79%"},"width":1497,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-1.png","element":"img"}],[{"text":"for ","element":"span"},{"style":{"height":17.6},"width":304.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-2.png","element":"img","alt":" αV (z) and θY (d","inline":true},{"text":") defined in equations ","element":"span"},{"href":"#id-58","text":"(2.1) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-62","text":"(2.3) ","element":"a"},{"text":"respectively. It then follows by plugging in the definition of ","element":"span"},{"style":{"height":17.6},"width":95.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-3.png","element":"img","alt":" αV (z","inline":true},{"text":") that we can express the LATE as","element":"span"}],[{"style":{"width":"32%"},"width":614,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-4.png","element":"img"}],[{"text":"To obtain an estimate of the LATE, we thus need estimates of ","element":"span"},{"style":{"height":19.95},"width":553.29,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-5.png","element":"img","alt":" αY (z) and α11(D)(z). 
Using","inline":true,"padRight":true},{"text":"the low-bias moment function given in equation ","element":"span"},{"href":"#id-148","text":"(3.13)","element":"a"},{"text":", estimates of these key quantities can be constructed from estimates of E","element":"span"},{"style":{"height":17.6},"width":1267.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-6.png","element":"img","alt":"P [Y |Z = 1, X], EP [Y |Z = 0, X], EP [D|Z = 1, X], EP [D|Z = 0, X],","inline":true,"padRight":true},{"text":"and E","element":"span"},{"style":{"height":17.6},"width":321.25,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-7.png","element":"img","alt":"P [Z|X] where Z","inline":true,"padRight":true},{"text":"is the binary instrument (401(k) eligibility); ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"is the binary treatment (401(k) participation); ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"is the set of raw covariates discussed in the empirical section; and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"is net financial assets. In our application, we have E","element":"span"},{"style":{"height":17.6},"width":911.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-8.png","element":"img","alt":"P [D|Z = 0, X] = 0 since one cannot participate","inline":true,"padRight":true},{"text":"unless one is eligible. 
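","element":"span"}],[{"text":"The step that combines these fitted conditional expectations into a LATE estimate can be sketched as follows. The sketch assumes that the low-bias moment function takes the familiar doubly robust form α_V(z) = E_P[g_V(z, X) + 1(Z = z)(V − g_V(z, X))/m_Z(z, X)], with g_V(z, X) = E_P[V | Z = z, X], m_Z(1, X) = E_P[Z | X] and m_Z(0, X) = 1 − m_Z(1, X); the exact form used in the paper is the one given in equation (3.13). Fitted nuisance values are treated as given inputs under hypothetical names (g_y1, g_y0, g_d1, m_z1), g_D(0, X) is set to zero as in the application, and since D is binary, α_{1_1(D)} is computed as α_D. The LATE is then the ratio (α_Y(1) − α_Y(0))/(α_D(1) − α_D(0)).","element":"span"}],[{"text":"
```python
import numpy as np


def alpha_hat(v, z, z_val, g_v, m_z1):
    # Doubly robust estimate of alpha_V(z_val) = E[E[V | Z = z_val, X]].
    # g_v: fitted E[V | Z = z_val, X_i]; m_z1: fitted P(Z = 1 | X_i).
    m = m_z1 if z_val == 1 else 1.0 - m_z1
    ind = (z == z_val).astype(float)
    return float(np.mean(g_v + ind * (v - g_v) / m))


def late_hat(y, d, z, g_y1, g_y0, g_d1, m_z1):
    # Reduced-form effect on Y divided by the first-stage effect on D.
    num = alpha_hat(y, z, 1, g_y1, m_z1) - alpha_hat(y, z, 0, g_y0, m_z1)
    # One-sided noncompliance: E[D | Z = 0, X] = 0, so g_D(0, X) is zero.
    g_d0 = np.zeros(len(d))
    den = alpha_hat(d, z, 1, g_d1, m_z1) - alpha_hat(d, z, 0, g_d0, m_z1)
    return num / den
```
","element":"span"}],[{"text":"In the application, g_y1 and g_y0 would be the Post-Lasso fits of E_P[Y | Z = 1, X] and E_P[Y | Z = 0, X], and g_d1 and m_z1 the post-ℓ1-penalized logistic fits of E_P[D | Z = 1, X] and E_P[Z | X].","element":"span"}],[{"text":"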
We use Post-Lasso to estimate E","element":"span"},{"style":{"height":17.6},"width":789.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-9.png","element":"img","alt":"P [Y |Z = 1, X] and EP [Y |Z = 0, X] and","inline":true,"padRight":true},{"text":"post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-10.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized logistic regression to estimate E","element":"span"},{"style":{"height":17.6},"width":551.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-11.png","element":"img","alt":"P [D|Z = 1, X] and EP [Z|X].","inline":true}],[{"text":"To estimate E","element":"span"},{"style":{"height":17.6},"width":263.25,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-12.png","element":"img","alt":"P [Y |Z = 1, X","inline":true},{"text":"], we postulate that E","element":"span"},{"style":{"height":17.6},"width":502.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-13.png","element":"img","alt":"P [Y |Z = 1, X] ≈ f(X)′βY","inline":true,"padRight":true},{"text":"(1), where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") is one of the pre-specified sets of controls discussed in the empirical section with dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":". 
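","element":"span"}],[{"text":"The penalty level used for these Post-Lasso fits, λ = 1.1 √n Φ^{-1}(1 − (0.1/log n)/(2(2p))), depends only on the sample size n and the number of controls p, so it can be computed with the Python standard library alone. In the sketch below, the function name and the keyword arguments c and gamma_scale are illustrative choices; which sample size plays the role of n in a given fit is left to the caller.","element":"span"}],[{"text":"
```python
import math
from statistics import NormalDist


def penalty_level(n, p, c=1.1, gamma_scale=0.1):
    # lambda = c * sqrt(n) * Phi^{-1}(1 - gamma / (2 * (2 * p))),
    # with gamma = gamma_scale / log(n) and Phi the standard normal cdf.
    gamma = gamma_scale / math.log(n)
    return c * math.sqrt(n) * NormalDist().inv_cdf(1.0 - gamma / (2.0 * (2.0 * p)))


print(round(penalty_level(1000, 100), 1))
```
","element":"span"}],[{"text":"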
Let ","element":"span"},{"style":{"height":14.62},"width":40.76,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-14.png","element":"img","alt":"I1","inline":true,"padRight":true},{"text":"denote the indices of observations that have ","element":"span"},{"style":{"height":17.6},"width":923.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-15.png","element":"img","alt":" zi = 1. To estimate the coefficients βY (1), we","inline":true,"padRight":true},{"text":"apply the formulation of the Post-Lasso estimator given in ","element":"span"},{"href":"#id-6","referenceIndex":9,"text":"Belloni et al. ","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"(2012) ","element":"a"},{"text":"with outcomes ","element":"span"},{"style":{"height":17.67},"width":146.34,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-16.png","element":"img","alt":"{yi}i∈I1","inline":true,"padRight":true},{"text":"and covariates ","element":"span"},{"style":{"height":19.2},"width":1104.66,"height":48.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-17.png","element":"img","alt":" {f(xi)}i∈I1. We set λ = 1.1√nΦ−1(1 − (.1/ log(n))/(2(2p","inline":true},{"text":"))) where Φ(","element":"span"},{"style":{"height":17.6},"width":73.67,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-18.png","element":"img","alt":"·) is","inline":true,"padRight":true},{"text":"the standard normal distribution function. We calculate penalty loadings according to Algorithm A.1 of ","element":"span"},{"href":"#id-6","referenceIndex":9,"text":"Belloni et al. 
","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"(2012) ","element":"a"},{"text":"using Post-Lasso coefficient estimates at each iteration and with the maximum number of iterations set to 15.","element":"span"},{"style":{"height":18.76},"width":193,"height":46.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-19.png","element":"img","alt":"³⁰ Let β̂Y","inline":true,"padRight":true},{"text":"(1) denote the resulting Post-Lasso estimates of the coefficients using ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-20.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"given above and the final set of penalty loadings. We then estimate E","element":"span"},{"style":{"height":17.6},"width":602.51,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-21.png","element":"img","alt":"P [Y |Z = 1, X = xi] as f(xi)′ β̂Y","inline":true,"padRight":true},{"text":"(1) for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., n","element":"span"},{"text":". 
We follow the same procedure to obtain estimates of E","element":"span"},{"style":{"height":17.6},"width":588.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-22.png","element":"img","alt":"P [Y |Z = 0, X = xi] as f(xi)′ β̂Y","inline":true,"padRight":true},{"text":"(0) for each ","element":"span"},{"style":{"height":16.4},"width":379.79,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-23.png","element":"img","alt":" i = 1, ..., n where β̂Y","inline":true,"padRight":true},{"text":"(0) are the Post-Lasso estimates using only the observations with ","element":"span"},{"style":{"height":14.22},"width":125.84,"height":35.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/52-24.png","element":"img","alt":" zi = 0.","inline":true}],[{"text":"Estimation of E","element":"span"},{"style":{"height":17.6},"width":561.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-0.png","element":"img","alt":"P [D|Z = 1, X] and EP [Z|X","inline":true},{"text":"] proceeds similarly, replacing Post-Lasso estimation with post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-1.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized logistic regression. 
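","element":"span"}],[{"text":"Before turning to the logistic fits, the Post-Lasso step used above for E_P[Y | Z = 1, X] can be sketched in Python with NumPy. This is an illustration under assumptions, not the authors' code: the Lasso objective is taken as (1/n) Σ (y_i − f(x_i)′b)² + (λ/n) Σ_j ψ_j |b_j| and solved by coordinate descent; n in λ is the size of the fitting subsample; the first column of the design is a constant left unpenalized; and the loadings ψ_j = sqrt(E_n[f_j(x)² e²]), recomputed from Post-Lasso residuals for at most 15 iterations, are a simplified stand-in for Algorithm A.1 of Belloni et al. (2012). The helper names and the simulated data are hypothetical.","element":"span"}],[{"text":"
```python
import math
from statistics import NormalDist

import numpy as np


def penalty_level(n, p, c=1.1, gamma_num=0.1):
    # lambda = c * sqrt(n) * Phi^{-1}(1 - (gamma_num / log n) / (2 * (2p)))
    gamma = gamma_num / math.log(n)
    return c * math.sqrt(n) * NormalDist().inv_cdf(1 - gamma / (2 * (2 * p)))


def weighted_lasso(X, y, lam, psi, max_sweeps=200, tol=1e-8):
    # Coordinate descent for (1/n) * sum (y - X b)^2 + (lam / n) * sum_j psi_j * |b_j|
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).mean(axis=0)
    r = y.astype(float)  # residual y - X b for the current b (b = 0 here)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            thresh = lam * psi[j] / (2 * n)
            new = np.sign(rho) * max(abs(rho) - thresh, 0.0) / col_sq[j]
            r -= X[:, j] * (new - old)
            b[j] = new
            max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


def post_lasso(X, y, max_iter=15):
    # Lasso selection, then OLS on the selected columns plus the constant
    n, p = X.shape
    lam = penalty_level(n, p)
    eps = y - y.mean()  # initial residuals for the penalty loadings
    coef = np.zeros(p)
    for _ in range(max_iter):
        psi = np.sqrt(((X ** 2) * (eps ** 2)[:, None]).mean(axis=0))
        psi[0] = 0.0  # the constant column is not penalized
        b = weighted_lasso(X, y, lam, psi)
        support = np.union1d(np.flatnonzero(b), [0])
        coef = np.zeros(p)
        coef[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        eps = y - X @ coef  # Post-Lasso residuals update the loadings
    return coef


# Hypothetical data: a constant plus 19 regressors, only the first of which matters
rng = np.random.default_rng(0)
n, p = 400, 20
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
z = rng.integers(0, 2, size=n)
y = 1.0 + 2.0 * X[:, 1] + rng.standard_normal(n)

in_I1 = z == 1                        # indices with z_i = 1
beta_Y1 = post_lasso(X[in_I1], y[in_I1])
g1_hat = X @ beta_Y1                  # fitted E_P[Y | Z = 1, X = x_i] for every i
```
","element":"span"}],[{"text":"Only the observations in I1 enter the fit, while the fitted coefficients are used to predict for every i = 1, ..., n; calling post_lasso on the observations with zi = 0 gives the corresponding estimates of E_P[Y | Z = 0, X].","element":"span"}],[{"text":"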
Specifically, we assume that E","element":"span"},{"style":{"height":17.6},"width":344.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-2.png","element":"img","alt":"P [D|Z = 1, X] ≈","inline":true,"padRight":true},{"text":"Λ","element":"span"},{"style":{"height":17.6},"width":199.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-3.png","element":"img","alt":"0(f(X)′βD","inline":true},{"text":"(1)) where Λ","element":"span"},{"style":{"height":17.6},"width":47.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-4.png","element":"img","alt":"0(·","inline":true},{"text":") is the logistic link function. We then obtain estimates of ","element":"span"},{"style":{"height":17.6},"width":174.98,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-5.png","element":"img","alt":" βD(1) by","inline":true,"padRight":true},{"text":"using the post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-6.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized estimator defined in equations ","element":"span"},{"href":"#id-149","text":"(3.10) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-150","text":"(3.11) ","element":"a"},{"text":"based on the logistic link function and with outcomes ","element":"span"},{"style":{"height":17.67},"width":147.66,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-7.png","element":"img","alt":" {di}i∈I1","inline":true,"padRight":true},{"text":"and covariates ","element":"span"},{"style":{"height":17.67},"width":330.21,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-8.png","element":"img","alt":" {f(xi)}i∈I1 for I1","inline":true,"padRight":true},{"text":"defined as above. 
We set ","element":"span"},{"style":{"height":19.13},"width":717.64,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-9.png","element":"img","alt":"λ = 1.1√nΦ−1(1 − (.1/ log(n))/(2(2p","inline":true},{"text":"))) where Φ(","element":"span"},{"style":{"height":5.6},"width":12,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-10.png","element":"img","alt":"·","inline":true},{"text":") is the standard normal distribution function. We calculate penalty loadings using Algorithm ","element":"span"},{"href":"#id-118","text":"6.1 ","element":"a"},{"text":"of the main text with a maximum of 15 iterations. Let ","element":"span"},{"style":{"height":16.4},"width":52.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-11.png","element":"img","alt":"β̂D","inline":true},{"text":"(1) denote the resulting post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-12.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized estimates of the coefficients using ","element":"span"},{"style":{"height":16.4},"width":140.28,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-13.png","element":"img","alt":" λ given","inline":true,"padRight":true},{"text":"above and the final set of penalty loadings. We estimate E","element":"span"},{"style":{"height":17.6},"width":745.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-14.png","element":"img","alt":"P [D|Z = 1, X = xi] as Λ0(f(xi)′ β̂D(1))","inline":true,"padRight":true},{"text":"for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., n","element":"span"},{"text":". 
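","element":"span"}],[{"text":"The post-ℓ1-penalized logistic step for E_P[D | Z = 1, X] admits a similar sketch. It is again an illustration, not the authors' implementation: the penalized step minimizes the average logistic loss plus (λ/n) Σ_j ψ_j |b_j| by proximal gradient descent, the refit is an unpenalized logistic regression on the selected columns (plus an unpenalized constant) computed by Newton's method, and the loadings are held at a simple fixed choice instead of being iterated with Algorithm 6.1. Function names and data are hypothetical.","element":"span"}],[{"text":"
```python
import math
from statistics import NormalDist

import numpy as np


def logistic(t):
    # Lambda_0, the logistic link
    return 1.0 / (1.0 + np.exp(-t))


def l1_logistic(X, y, lam, psi, n_iter=2000):
    # Proximal gradient for E_n[logistic loss] + (lam / n) * sum_j psi_j * |b_j|
    n, p = X.shape
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        v = b - step * X.T @ (logistic(X @ b) - y) / n
        b = np.sign(v) * np.maximum(np.abs(v) - step * lam * psi / n, 0.0)
    return b


def logistic_refit(X, y, n_newton=25, ridge=1e-8):
    # Unpenalized logistic MLE by Newton steps; the tiny ridge only guards against singular Hessians
    b = np.zeros(X.shape[1])
    for _ in range(n_newton):
        mu = logistic(X @ b)
        hess = X.T @ (X * (mu * (1 - mu))[:, None]) + ridge * np.eye(X.shape[1])
        b = b + np.linalg.solve(hess, X.T @ (y - mu))
    return b


def post_l1_logistic(X, y, lam, psi):
    b = l1_logistic(X, y, lam, psi)
    support = np.union1d(np.flatnonzero(b), [0])  # column 0 is the constant
    coef = np.zeros(X.shape[1])
    coef[support] = logistic_refit(X[:, support], y)
    return coef


# Hypothetical data with one-sided noncompliance: D = 0 whenever Z = 0
rng = np.random.default_rng(1)
n, p = 600, 10
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
z = rng.integers(0, 2, size=n)
d = z * (rng.random(n) < logistic(-0.5 + 1.5 * X[:, 1]))

in_I1 = z == 1
X1, d1 = X[in_I1], d[in_I1].astype(float)
n1 = X1.shape[0]
lam = 1.1 * math.sqrt(n1) * NormalDist().inv_cdf(1 - (0.1 / math.log(n1)) / (2 * (2 * p)))
psi = np.sqrt(((X1 ** 2) * ((d1 - d1.mean()) ** 2)[:, None]).mean(axis=0))
psi[0] = 0.0
beta_D1 = post_l1_logistic(X1, d1, lam, psi)
h1_hat = logistic(X @ beta_D1)  # fitted E_P[D | Z = 1, X = x_i] for every i
```
","element":"span"}],[{"text":"With {zi} as the outcome on the full sample and 2p in place of 2(2p) inside λ, the same routine would produce the propensity score estimates Λ0(f(xi)′β̂Z).","element":"span"}],[{"text":"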
We follow this procedure to obtain estimates of E","element":"span"},{"style":{"height":17.6},"width":543.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-15.png","element":"img","alt":"P [Z|X = xi] as Λ0(f(xi)′ β̂Z)","inline":true,"padRight":true},{"text":"for each ","element":"span"},{"style":{"height":16.4},"width":389.02,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-16.png","element":"img","alt":" i = 1, ..., n where β̂Z","inline":true,"padRight":true},{"text":"are the post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-17.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized coefficient estimates obtained with ","element":"span"},{"style":{"height":18.09},"width":132.32,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-18.png","element":"img","alt":" {zi}ni=1","inline":true,"padRight":true},{"text":"as the outcome and ","element":"span"},{"style":{"height":18.09},"width":196.97,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-19.png","element":"img","alt":" {f(xi)}ni=1 ","inline":true,"padRight":true},{"text":"as covariates using ","element":"span"},{"style":{"height":19.13},"width":712.84,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-20.png","element":"img","alt":" λ = 1.1√nΦ−1(1 − (.1/ log(n))/(2p)).","inline":true}],[{"text":"Using these baseline quantities, we obtain estimates","element":"span"}],[{"style":{"width":"78%"},"width":1474,"height":589,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-21.png","element":"img"}],[{"text":"We then plug these estimates in to 
obtain","element":"span"}],[{"style":{"width":"32%"},"width":614,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-22.png","element":"img"}],[{"text":"We report both analytic and bootstrap standard error estimates for the LATE. The analytic standard errors are calculated as","element":"span"}],[{"style":{"width":"52%"},"width":980,"height":158,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-23.png","element":"img"}],[{"text":"We use wild bootstrap weights to obtain multiplier bootstrap estimates of the standard errors, with 500 bootstrap replications. Specifically, for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., ","element":"span"},{"text":"500, we calculate a bootstrap estimate of the LATE as","element":"span"}],[{"style":{"width":"32%"},"width":601,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-24.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":22.53},"width":627.43,"height":56.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-25.png","element":"img","alt":" ξbi = 1 + rb1,i/√2 + ((rb2,i)2 − 1)/","inline":true},{"text":"2 is the multiplier weight drawn for observation ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"in bootstrap repetition ","element":"span"},{"style":{"height":22.42},"width":395.12,"height":56.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-26.png","element":"img","alt":" b, where rb1,i and rb2,i ","inline":true,"padRight":true},{"text":"are generated as iid draws ","element":"span"},{"text":"from two independent standard normal random variables. 
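","element":"span"}],[{"text":"The multiplier weights just described, and one way they can enter a bootstrap LATE, are sketched below. Because the LATE and bootstrap formulas appear only as images here, the sketch assumes the standard orthogonal (doubly robust) form: the LATE is the ratio of sample means of the per-observation pieces num and den, and each bootstrap draw is the same ratio with every term multiplied by ξbi. The inputs g1, g0, h1, h0 and m stand for the fitted values of E_P[Y|Z=1,X], E_P[Y|Z=0,X], E_P[D|Z=1,X], E_P[D|Z=0,X] (zero in this application) and E_P[Z|X]; the simulated data and function names are hypothetical.","element":"span"}],[{"text":"
```python
import numpy as np


def late_scores(y, d, z, g1, g0, h1, h0, m):
    # Orthogonal numerator and denominator pieces for each observation
    num = g1 - g0 + z * (y - g1) / m - (1 - z) * (y - g0) / (1 - m)
    den = h1 - h0 + z * (d - h1) / m - (1 - z) * (d - h0) / (1 - m)
    return num, den


def multiplier_weights(n, rng):
    # xi_i = 1 + r1_i / sqrt(2) + (r2_i ** 2 - 1) / 2 with r1, r2 iid N(0, 1): mean 1, variance 1
    r1 = rng.standard_normal(n)
    r2 = rng.standard_normal(n)
    return 1 + r1 / np.sqrt(2) + (r2 ** 2 - 1) / 2


def bootstrap_late(num, den, B=500, seed=0):
    # One weighted ratio per bootstrap repetition b = 1, ..., B
    rng = np.random.default_rng(seed)
    draws = np.empty(B)
    for b in range(B):
        xi = multiplier_weights(len(num), rng)
        draws[b] = np.mean(xi * num) / np.mean(xi * den)
    return draws


# Hypothetical data: one-sided noncompliance and a true LATE of 2
rng = np.random.default_rng(2)
n = 4000
x = rng.standard_normal(n)
z = rng.integers(0, 2, size=n)
share = 1 / (1 + np.exp(-x))          # P(complier | X)
d = z * (rng.random(n) < share)
y = 1 + x + 2 * d + rng.standard_normal(n)

# True nuisance functions stand in for the Post-Lasso and logistic estimates
g1, g0 = 1 + x + 2 * share, 1 + x
h1, h0, m = share, np.zeros(n), np.full(n, 0.5)
num, den = late_scores(y, d, z, g1, g0, h1, h0, m)
late_hat = num.mean() / den.mean()
draws = bootstrap_late(num, den)
```
","element":"span"}],[{"text":"The weights have mean 1 and variance 1 and can be negative; the text turns the resulting draws into a standard error through their interquartile range.","element":"span"}],[{"text":"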
The bootstrap standard error estimate is then the interquartile range of the bootstrap draws rescaled by the interquartile range of the standard normal distribution: [","element":"span"},{"style":{"height":17.6},"width":258.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/53-27.png","element":"img","alt":"qLATE(.75) −","inline":true,"padRight":true},{"style":{"height":17.6},"width":552.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-0.png","element":"img","alt":"qLATE(.25)]/[qN(.75) − qN(0.","inline":true},{"text":"25)], where ","element":"span"},{"style":{"height":17.6},"width":333.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-1.png","element":"img","alt":" qLATE(p) is the p","inline":true},{"text":"th quantile of ","element":"span"},{"style":{"height":20.45},"width":490.94,"height":51.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-2.png","element":"img","alt":" {∆̂bLATE}500b=1 and qN(p) is","inline":true,"padRight":true},{"text":"the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"th quantile of the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":"(0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1).","element":"span"}],[{"text":"F.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Local Quantile Treatment Effects. ","element":"span"},{"text":"Calculation of and inference for the LQTE are more cumbersome than for the LATE. We begin by choosing the set of quantiles at which we examine the LQTE. 
In our example, we chose to look at quantiles in the interval [0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"9].","element":"span"}],[{"text":"To calculate the LQTE, we first calculate the local average structural function for outcomes ","element":"span"},{"style":{"height":17.6},"width":292.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-3.png","element":"img","alt":"Yu = 1(Y ≤ u","inline":true},{"text":") for a set of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"and then invert to obtain estimates of the LQTE. In our example, we chose to look at ","element":"span"},{"style":{"height":17.6},"width":902.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-4.png","element":"img","alt":" u ∈ [qY (.05), qY (.95)] where qY (.05) and qY (.","inline":true},{"text":"95) are respectively the sample 5","element":"span"},{"style":{"height":15.53},"width":220.15,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-5.png","element":"img","alt":"th and 95th ","inline":true,"padRight":true},{"text":"percentiles of the outcome of interest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":". 
","element":"span"},{"text":"Since looking at the continuum of values in this interval is infeasible, we discretize the interval and look at ","element":"span"},{"style":{"height":17.6},"width":375.38,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-6.png","element":"img","alt":" Yu = 1(Y ≤ u) for","inline":true},{"style":{"height":17.6},"width":1112.25,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-7.png","element":"img","alt":"u ∈ {qY (.05), qY (.06), qY (.07), ..., qY (.93), qY (.94), qY (.95)}","inline":true},{"text":". I.e. we set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"equal to each percentile of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"between the 5","element":"span"},{"style":{"height":15.53},"width":212.75,"height":38.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-8.png","element":"img","alt":"th and 95th ","inline":true,"padRight":true},{"text":"percentiles for a total of 91 different values of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"to be considered. 
For each value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u","element":"span"},{"text":", we need an estimate of the local average structural function defined in ","element":"span"},{"href":"#id-62","text":"(2.3) ","element":"a"},{"text":"for ","element":"span"},{"style":{"height":17.6},"width":194.71,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-9.png","element":"img","alt":" d ∈ {0, 1}:","inline":true}],[{"style":{"width":"48%"},"width":907,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-10.png","element":"img"}],[{"text":"As with the LATE, we need estimates of E","element":"span"},{"style":{"height":17.6},"width":550.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-11.png","element":"img","alt":"P [D|Z = 1, X] and EP [Z|X","inline":true},{"text":"]. We estimate these quantities as we did for the LATE but change the value of the penalty parameter used to reflect the fact that we are now interested in a large set, in theory a continuum, of model selection problems. Specifically, we assume that E","element":"span"},{"style":{"height":17.6},"width":589.21,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-12.png","element":"img","alt":"P [D|Z = 1, X] ≈ Λ0(f(X)′βD","inline":true},{"text":"(1)) where Λ","element":"span"},{"style":{"height":17.6},"width":47.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-13.png","element":"img","alt":"0(·","inline":true},{"text":") is the logistic link function and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") is one of the pre-specified sets of controls discussed in the empirical section with dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":". 
We then obtain estimates of ","element":"span"},{"style":{"height":16.4},"width":52.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-14.png","element":"img","alt":" βD","inline":true},{"text":"(1) by using the post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-15.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized estimator defined in equations ","element":"span"},{"href":"#id-149","text":"(3.10) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-150","text":"(3.11) ","element":"a"},{"text":"based on the logistic link function and with outcomes ","element":"span"},{"style":{"height":17.67},"width":236.83,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-16.png","element":"img","alt":" {di}i∈I1 and","inline":true,"padRight":true},{"text":"covariates ","element":"span"},{"style":{"height":17.67},"width":343.23,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-17.png","element":"img","alt":" {f(xi)}i∈I1 for I1","inline":true,"padRight":true},{"text":"defined as above. We set ","element":"span"},{"style":{"height":19.13},"width":788.56,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-18.png","element":"img","alt":" λ = 1.1√nΦ−1(1 − (1/ log(n))/(2n(2p)))","inline":true,"padRight":true},{"text":"where Φ(","element":"span"},{"style":{"height":5.6},"width":12,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-19.png","element":"img","alt":"·","inline":true},{"text":") is the standard normal distribution function. ","element":"span"},{"text":"We calculate penalty loadings using Algorithm ","element":"span"},{"href":"#id-118","text":"6.1 ","element":"a"},{"text":"with a maximum of 15 iterations. 
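","element":"span"}],[{"text":"To see how much the LQTE penalty differs from the LATE penalty, the two formulas can be evaluated side by side with the Python standard library. The values n = 5000 and p = 100 are hypothetical, and the helper name penalty_level is ours. The LQTE choice replaces 0.1/log(n) by 1/log(n) and divides the tail probability by an extra factor of n, reflecting the larger (in theory continuum) collection of selection problems.","element":"span"}],[{"text":"
```python
import math
from statistics import NormalDist


def penalty_level(n, p, gamma, n_problems=1, c=1.1):
    # lambda = c * sqrt(n) * Phi^{-1}(1 - gamma / (2 * n_problems * (2p)))
    return c * math.sqrt(n) * NormalDist().inv_cdf(1 - gamma / (2 * n_problems * (2 * p)))


n, p = 5000, 100  # hypothetical sample size and number of controls
lam_late = penalty_level(n, p, gamma=0.1 / math.log(n))
lam_lqte = penalty_level(n, p, gamma=1 / math.log(n), n_problems=n)
print(round(lam_late, 1), round(lam_lqte, 1), round(lam_lqte / lam_late, 2))
```
","element":"span"}],[{"text":"Because Φ−1(1 − a) grows roughly like sqrt(2 log(1/a)) as a shrinks, the extra factor n raises λ only moderately.","element":"span"}],[{"text":"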
Let ","element":"span"},{"style":{"height":16.4},"width":52.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-20.png","element":"img","alt":"β̂D","inline":true},{"text":"(1) denote the resulting post-","element":"span"},{"style":{"height":16},"width":229.59,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-21.png","element":"img","alt":"ℓ1-penalized","inline":true,"padRight":true},{"text":"estimates of the coefficients using ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-22.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"given above and the final set of penalty loadings. We estimate E","element":"span"},{"style":{"height":17.6},"width":677.93,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-23.png","element":"img","alt":"P [D|Z = 1, X = xi] as Λ0(f(xi)′ β̂D","inline":true},{"text":"(1)) for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., n","element":"span"},{"text":". 
We follow this procedure to obtain estimates of E","element":"span"},{"style":{"height":17.6},"width":1079.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-24.png","element":"img","alt":"P [Z|X] as Λ0(f(xi)′ β̂Z) for each i = 1, ..., n where β̂Z","inline":true,"padRight":true},{"text":"are the post-","element":"span"},{"style":{"height":16},"width":229.59,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-25.png","element":"img","alt":"ℓ1-penalized","inline":true,"padRight":true},{"text":"coefficient estimates obtained with ","element":"span"},{"style":{"height":18.09},"width":132.32,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-26.png","element":"img","alt":" {zi}ni=1 ","inline":true,"padRight":true},{"text":"as the outcome and ","element":"span"},{"style":{"height":18.09},"width":196.96,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-27.png","element":"img","alt":" {f(xi)}ni=1 ","inline":true,"padRight":true},{"text":"as covariates and ","element":"span"},{"style":{"height":12.8},"width":75.63,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-28.png","element":"img","alt":" λ =","inline":true,"padRight":true},{"text":"1","element":"span"},{"style":{"height":19.13},"width":578.69,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-29.png","element":"img","alt":".1√nΦ−1(1 − (1/ log(n))/(2np","inline":true},{"text":")). As for the LATE, we have E","element":"span"},{"style":{"height":17.6},"width":824.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-30.png","element":"img","alt":"P [D|Z = 0, X] = 0 in our application since","inline":true,"padRight":true},{"text":"one cannot participate in a 401(k) unless one is eligible. 
We then plug in these estimates to obtain","element":"span"}],[{"style":{"width":"80%"},"width":1500,"height":360,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/54-31.png","element":"img"}],[{"text":"We also need to obtain estimates of ","element":"span"},{"style":{"height":19.95},"width":273.65,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-0.png","element":"img","alt":" α1d(D)1(Y ≤u)(z","inline":true},{"text":") for each value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"and for","element":"span"}],[{"style":{"width":"34%"},"width":646,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-1.png","element":"img"}],[{"text":"These estimates depend on the propensity score, E","element":"span"},{"style":{"height":17.6},"width":123.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-2.png","element":"img","alt":"P [Z|X","inline":true},{"text":"], estimated above, and on quantities of the form E","element":"span"},{"style":{"height":17.6},"width":595.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-3.png","element":"img","alt":"P [1(D = d)1(Y ≤ u)|Z = z, X","inline":true},{"text":"]. 
We again approximate this function with E","element":"span"},{"style":{"height":17.6},"width":166.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-4.png","element":"img","alt":"P [1(D =","inline":true},{"style":{"height":19.95},"width":843.8,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-5.png","element":"img","alt":"d)1(Y ≤ u)|Z = z, X] ≈ Λ0(X′β1d(D)Yu(z","inline":true},{"text":")) and estimate the coefficients ","element":"span"},{"style":{"height":19.95},"width":391.29,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-6.png","element":"img","alt":" β1d(D)Yu(z) for each","inline":true,"padRight":true},{"text":"combination of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z ","element":"span"},{"text":"and each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"using the post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-7.png","element":"img","alt":"ℓ1","inline":true},{"text":"-penalized estimator defined in equations ","element":"span"},{"href":"#id-149","text":"(3.10) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-150","text":"(3.11) ","element":"a"},{"text":"based on the logistic link function. 
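","element":"span"}],[{"text":"The outcomes entering these fits can be organized as below: one threshold u per sample percentile of Y from the 5th to the 95th (91 values) and, for each u, the three binary outcomes used with the observations in I1 or I0. Computing the sample percentiles with numpy.percentile and its default linear interpolation is an assumption, and the helper names are hypothetical. The combination d = 1, z = 0 is not fitted because E_P[1(D = 1)1(Y ≤ u) | Z = 0, X] = 0 in this application.","element":"span"}],[{"text":"
```python
import numpy as np


def u_grid(y):
    # Thresholds u = q_Y(.05), q_Y(.06), ..., q_Y(.95): 91 sample percentiles of Y
    return np.percentile(y, np.arange(5, 96))


def indicator_outcomes(y, d, z, u):
    # Binary outcomes 1(D = d)1(Y <= u), each restricted to the subsample with Z = z
    below = (y <= u).astype(float)
    in_I1, in_I0 = z == 1, z == 0
    return {
        (1, 1): (below * (d == 1))[in_I1],
        (0, 1): (below * (d == 0))[in_I1],
        (0, 0): (below * (d == 0))[in_I0],
    }


# Hypothetical data
rng = np.random.default_rng(3)
n = 1000
z = rng.integers(0, 2, size=n)
d = z * (rng.random(n) < 0.6)
y = rng.standard_normal(n) + d

grid = u_grid(y)
outcomes = [indicator_outcomes(y, d, z, u) for u in grid]  # one dict of (d, z) outcomes per u
```
","element":"span"}],[{"text":"Each entry is then paired with the covariate rows f(xi) of the same subsample in a post-ℓ1-penalized logistic fit, so the number of fits is three times the number of thresholds.","element":"span"}],[{"text":"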
We set ","element":"span"},{"style":{"height":19.13},"width":877.12,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-8.png","element":"img","alt":" λ = 1.1√nΦ−1(1−(1/ log(n))/(2n(2p))) where","inline":true,"padRight":true},{"text":"Φ(","element":"span"},{"style":{"height":5.6},"width":12,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-9.png","element":"img","alt":"·","inline":true},{"text":") is the standard normal distribution function. We calculate penalty loadings using Algorithm ","element":"span"},{"href":"#id-118","text":"6.1 ","element":"a"},{"text":"of the main text with a maximum of 15 iterations. We follow this procedure for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"u ","element":"span"},{"text":"with ","element":"span"},{"style":{"height":17.67},"width":459.26,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-10.png","element":"img","alt":"{1(yi ≤ u)1(di = 1)}i∈I1","inline":true,"padRight":true},{"text":"as the outcome and covariates ","element":"span"},{"style":{"height":17.67},"width":799.49,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-11.png","element":"img","alt":" {f(xi)}i∈I1, with {1(yi ≤ u)1(di = 0)}i∈I1","inline":true,"padRight":true},{"text":"as the outcome and covariates ","element":"span"},{"style":{"height":17.67},"width":933.55,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-12.png","element":"img","alt":" {f(xi)}i∈I1, and with {1(yi ≤ u)1(di = 0)}i∈I0","inline":true,"padRight":true},{"text":"as the outcome and covariates ","element":"span"},{"style":{"height":17.67},"width":490.2,"height":44.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-13.png","element":"img","alt":" {f(xi)}i∈I0 for I1 and I0","inline":true,"padRight":true},{"text":"defined as above (with I0 the indices of observations that have zi = 0) to obtain point estimates 
","element":"span"},{"style":{"height":19.95},"width":224.9,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-14.png","element":"img","alt":"β̂11(D)Yu(1),","inline":true},{"style":{"height":19.95},"width":478.9,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-15.png","element":"img","alt":"β̂10(D)Yu(1), and β̂10(D)Yu","inline":true},{"text":"(0), respectively. We then estimate E","element":"span"},{"style":{"height":17.6},"width":685.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-16.png","element":"img","alt":"P [1(D = 1)1(Y ≤ u)|Z = 1, X = xi]","inline":true,"padRight":true},{"text":"as Λ","element":"span"},{"style":{"height":19.95},"width":299.06,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-17.png","element":"img","alt":"0(f(xi)′ β̂11(D)Yu","inline":true},{"text":"(1)) for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., n ","element":"span"},{"text":"and obtain estimates of E","element":"span"},{"style":{"height":17.6},"width":499.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-18.png","element":"img","alt":"P [1(D = 0)1(Y ≤ u)|Z =","inline":true,"padRight":true},{"text":"1","element":"span"},{"style":{"height":17.6},"width":996.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-19.png","element":"img","alt":", X = xi], and EP [1(D = 0)1(Y ≤ u)|Z = 0, X = xi","inline":true},{"text":"] analogously. 
As before, we have E","element":"span"},{"style":{"height":17.6},"width":164.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-20.png","element":"img","alt":"P [1(D =","inline":true,"padRight":true},{"text":"1)1(","element":"span"},{"style":{"height":17.6},"width":1795.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-21.png","element":"img","alt":"Y ≤ u)|Z = 0, X] = 0 since one cannot participate unless one is eligible. We then plug in these","inline":true,"padRight":true},{"text":"estimates to obtain","element":"span"}],[{"style":{"width":"103%"},"width":1943,"height":485,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-22.png","element":"img"}],[{"text":"Estimates of the local average structural (distribution) functions are formed using the estimators defined in the previous two paragraphs as","element":"span"}],[{"style":{"width":"48%"},"width":907,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-23.png","element":"img"}],[{"text":"To obtain LQTE estimates, we then need to invert these local average structural functions. 
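","element":"span"}],[{"text":"For one threshold u, the plug-in estimates and the ratio that forms a local average structural function can be sketched as below. The displayed formulas are images in this version, so the sketch assumes the orthogonal form α̂V(z) = E_n[ĝV(z, X) + 1(Z = z)(V − ĝV(z, X))/P̂(Z = z | X)], with P̂(Z = 1 | X) the estimated propensity score, and the ratio θ̂(d) = [α̂V(1) − α̂V(0)] / [α̂W(1) − α̂W(0)] with V = 1(D = d)1(Y ≤ u) and W = 1(D = d). True conditional expectations replace the fitted ones, and the data and names are hypothetical.","element":"span"}],[{"text":"
```python
from statistics import NormalDist

import numpy as np

Phi = np.vectorize(NormalDist().cdf)  # standard normal CDF applied elementwise


def alpha_hat(v, z, g, m, level):
    # Orthogonal estimate of alpha_V(level); g ~ E[V | Z = level, X], m ~ E[Z | X]
    p_level = m if level == 1 else 1 - m
    return np.mean(g + (z == level) * (v - g) / p_level)


def lasf(yu, d, z, m, g_num, g_den, dval):
    # g_num[k] ~ E[1(D = dval) Yu | Z = k, X] and g_den[k] ~ E[1(D = dval) | Z = k, X]
    ind = (d == dval).astype(float)
    num = alpha_hat(ind * yu, z, g_num[1], m, 1) - alpha_hat(ind * yu, z, g_num[0], m, 0)
    den = alpha_hat(ind, z, g_den[1], m, 1) - alpha_hat(ind, z, g_den[0], m, 0)
    return num / den


# Hypothetical data: one-sided noncompliance and Y(1) = Y(0) + 1
rng = np.random.default_rng(4)
n, u = 4000, 0.5
x = rng.standard_normal(n)
z = rng.integers(0, 2, size=n)
share = 1 / (1 + np.exp(-x))
d = z * (rng.random(n) < share)
y = x + rng.standard_normal(n) + d
yu = (y <= u).astype(float)
m = np.full(n, 0.5)
zero, one = np.zeros(n), np.ones(n)

# True conditional expectations stand in for the post-l1 logistic fits
theta1 = lasf(yu, d, z, m, {1: share * Phi(u - 1 - x), 0: zero}, {1: share, 0: zero}, 1)
theta0 = lasf(yu, d, z, m, {1: (1 - share) * Phi(u - x), 0: Phi(u - x)}, {1: 1 - share, 0: one}, 0)
```
","element":"span"}],[{"text":"Repeating this for each u on the grid traces out the two estimated distribution functions that the text then inverts.","element":"span"}],[{"text":"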
Since we only have the estimated distribution function for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"evaluated on the finite grid of points ","element":"span"},{"style":{"height":10.4},"width":77.21,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-24.png","element":"img","alt":" u ∈","inline":true},{"style":{"height":17.6},"width":1026.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-25.png","element":"img","alt":"{qY (.05), qY (.06), qY (.07), ..., qY (.93), qY (.94), qY (.95)}","inline":true},{"text":", we do this inversion by linearly interpolating the value of the distribution function between these points to find the value of the outcome associated with each quantile in the set ","element":"span"},{"style":{"height":17.6},"width":573.65,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-26.png","element":"img","alt":" q ∈ [0.1, 0.11, 0.12, ..., 0.89, 0.","inline":true},{"text":"9], which we denote as ","element":"span"},{"style":{"height":18.37},"width":166.66,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-27.png","element":"img","alt":"θ̂←Y (q, d).","inline":true,"padRight":true},{"text":"The LQTE at point ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q ","element":"span"},{"text":"is then estimated as ","element":"span"},{"style":{"height":18.37},"width":522.39,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-28.png","element":"img","alt":"∆̂(q) = θ̂←Y (q, 1) − θ̂←Y (q, 0).","inline":true}],[{"text":"For the LQTE, we only report inference based on the multiplier bootstrap using 500 bootstrap replications. 
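","element":"span"}],[{"text":"The interpolation-based inversion described above can be sketched with numpy.interp. Two details are assumptions not stated in the text: the estimated distribution values are sorted before inversion, so that a non-monotone estimate still gives an increasing interpolation table (a rearrangement step), and quantile levels outside the range of the estimated values are clamped to the end points of the u grid, which is numpy.interp's default behavior. Function names and the normal-shift example are hypothetical.","element":"span"}],[{"text":"
```python
from statistics import NormalDist

import numpy as np


def invert_cdf(u_grid, cdf_vals, q_grid):
    # Sort (rearrange) the estimated CDF values, then read off u at each level q by linear interpolation
    return np.interp(q_grid, np.sort(cdf_vals), u_grid)


def lqte(u_grid, cdf1, cdf0, q_grid):
    # Delta(q) = theta_Y^{<-}(q, 1) - theta_Y^{<-}(q, 0)
    return invert_cdf(u_grid, cdf1, q_grid) - invert_cdf(u_grid, cdf0, q_grid)


Phi = np.vectorize(NormalDist().cdf)
q_grid = np.linspace(0.10, 0.90, 81)   # 0.10, 0.11, ..., 0.90
u_grid = np.linspace(-2.0, 2.0, 91)    # stand-in for the 91 percentiles of Y
cdf0 = Phi(u_grid)                     # hypothetical distribution without treatment
cdf1 = Phi(u_grid - 0.5)               # the same distribution shifted by 0.5
delta = lqte(u_grid, cdf1, cdf0, q_grid)
```
","element":"span"}],[{"text":"In this example the true quantile effect is 0.5 at every q, and the interpolated estimates match it closely because the grid of u values is fine relative to the curvature of the distribution functions.","element":"span"}],[{"text":"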
For each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., ","element":"span"},{"text":"500, we generate bootstrap weights as ","element":"span"},{"style":{"height":22.53},"width":571.61,"height":56.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-29.png","element":"img","alt":" ξbi = 1 + rb1,i/√2 + ((rb2,i)² − 1)/2","inline":true,"padRight":true},{"text":"for observation ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"in bootstrap repetition ","element":"span"},{"style":{"height":22.43},"width":390.84,"height":56.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/55-30.png","element":"img","alt":" b, where rb1,i and rb2,i ","inline":true,"padRight":true},{"text":"are random numbers generated as ","element":"span"},{"text":"iid draws from two independent standard normal random variables. We then use these weights to form bootstrap estimates of the local average structural functions","element":"span"}],[{"style":{"width":"47%"},"width":893,"height":135,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-0.png","element":"img"}],[{"text":"where","element":"span"}],[{"style":{"width":"72%"},"width":1362,"height":500,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-1.png","element":"img"}],[{"text":"From these bootstrap estimates of the local average structural distribution functions, we obtain bootstrap LQTE estimates as above: we invert them by linearly interpolating the distribution function between the grid points at which it is estimated to find the outcome value associated with each quantile in the set 
","element":"span"},{"style":{"height":17.6},"width":591.03,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-2.png","element":"img","alt":" q ∈ [0.1, 0.11, 0.12, ..., 0.89, 0.","inline":true},{"text":"9], denoted (","element":"span"},{"style":{"height":20.31},"width":186.62,"height":50.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-3.png","element":"img","alt":"θ̂←Y (q, d))b","inline":true},{"text":". The bootstrap estimate of the LQTE for bootstrap replication ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"at point ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q ","element":"span"},{"text":"is then ","element":"span"},{"style":{"height":20.31},"width":636.42,"height":50.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-4.png","element":"img","alt":"∆̂b(q) = (θ̂←Y (q, 1))b − (θ̂←Y (q, 0))b","inline":true},{"text":". We form bootstrap standard error estimates for the LQTE at ","element":"span"},{"text":"each quantile ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q ","element":"span"},{"text":"as","element":"span"}],[{"style":{"width":"96%"},"width":1800,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-5.png","element":"img"}],[{"text":"We also use the bootstrap LQTE estimates to obtain the critical values for the uniform confidence bands plotted in our example. We form bootstrap t-statistics for each quantile ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q ","element":"span"},{"text":"as ","element":"span"},{"style":{"height":19.53},"width":510.44,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-6.png","element":"img","alt":"tb(q) = (∆̂b(q) − ∆̂(q))/s(q","inline":true},{"text":"). 
We then take ","element":"span"},{"style":{"height":20.15},"width":393.74,"height":50.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-7.png","element":"img","alt":" tbmax = maxq{|tb(q)|}","inline":true,"padRight":true},{"text":"and use the 95","element":"span"},{"style":{"height":8.8},"width":32.24,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-8.png","element":"img","alt":"th","inline":true,"padRight":true},{"text":"percentile of the ","element":"span"},{"text":"bootstrap distribution of ","element":"span"},{"style":{"height":19.45},"width":78.9,"height":48.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/56-9.png","element":"img","alt":" tbmax ","inline":true,"padRight":true},{"text":"as the critical value for the uniform confidence bands in our ","element":"span"},{"text":"figures, following, for example, ","element":"span"},{"href":"#id-83","referenceIndex":36,"text":"Chernozhukov et al. ","element":"a"},{"href":"#id-83","referenceIndex":36,"text":"(2013)","element":"a"},{"text":".","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-59","text":"Abadie, A. ","element":"span"},{"text":"(2002): “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 97, 284–292.","element":"span"}],[{"id":"id-60","text":"——— (2003): “Semiparametric Instrumental Variable Estimation of Treatment Response Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 113, 231–263.","element":"span"}],[{"id":"id-32","text":"Ai, C. and X. 
Chen ","element":"span"},{"text":"(2003): “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 71, 1795–1843.","element":"span"}],[{"id":"id-33","text":"——— (2012): “The semiparametric efficiency bound for models of sequential moment restrictions ","element":"span"},{"text":"containing unknown functions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 170, 442–457.","element":"span"}],[{"id":"id-106","text":"Andrews, D. W. K. ","element":"span"},{"text":"(1994a): “Empirical process methods in econometrics,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Handbook of Econometrics","element":"span"},{"text":", 4, 2247–2294.","element":"span"}],[{"id":"id-25","text":"Andrews, D. W. K. ","element":"span"},{"text":"(1994b): “Asymptotics for semiparametric econometric models via stochastic equicontinuity,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 62, 43–72.","element":"span"}],[{"text":"Angrist, J. D. and J.-S. Pischke ","element":"span"},{"text":"(2008): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mostly Harmless Econometrics: An Empiricist’s Companion","element":"span"},{"text":", Princeton University Press.","element":"span"}],[{"id":"id-45","text":"Bach, F. ","element":"span"},{"text":"(2010): “Self-concordant analysis for logistic regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Electronic Journal of Statistics","element":"span"},{"text":", 4, 384–414.","element":"span"}],[{"id":"id-6","text":"Belloni, A., D. Chen, V. Chernozhukov, and C. 
Hansen ","element":"span"},{"text":"(2012): “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 80, 2369–2429, arXiv, 2010.","element":"span"}],[{"id":"id-35","style":{"height":17.6},"width":963.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/57-0.png","element":"img","alt":"Belloni, A. and V. Chernozhukov (2011): “ℓ1","inline":true},{"text":"-Penalized Quantile Regression for High Dimensional Sparse Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 39, 82–130.","element":"span"}],[{"id":"id-48","text":"——— (2013): ","element":"span"},{"text":"“Least Squares After Model Selection in High-dimensional Sparse Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bernoulli","element":"span"},{"text":", 19, 521–547, arXiv, 2009.","element":"span"}],[{"id":"id-57","text":"Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen ","element":"span"},{"text":"(2015): “Supplement to “Program Evaluation with High-Dimensional Data”,” Tech. rep., ArXiv.","element":"span"}],[{"id":"id-5","text":"Belloni, A., V. Chernozhukov, and C. Hansen ","element":"span"},{"text":"(2010): “LASSO Methods for Gaussian Instrumental Variables Models,” 2010 arXiv:[math.ST], http://arxiv.org/abs/1012.1297.","element":"span"}],[{"id":"id-3","text":"——— (2013a): “Inference for High-Dimensional Sparse Econometric Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Economics and Econometrics. 10th World Congress of the Econometric Society. 
August 2010","element":"span"},{"text":", III, 245–295.","element":"span"}],[{"id":"id-4","text":"——— (2014a): “Inference on Treatment Effects After Selection Amongst High-Dimensional ","element":"span"},{"text":"Controls,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Review of Economic Studies","element":"span"},{"text":", 81, 608–650.","element":"span"}],[{"id":"id-49","text":"Belloni, A., V. Chernozhukov, and K. Kato ","element":"span"},{"text":"(2013b): “Uniform Post Selection Inference for LAD Regression Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1304.0282","element":"span"},{"text":".","element":"span"}],[{"id":"id-111","text":"Belloni, A., V. Chernozhukov, and L. Wang ","element":"span"},{"text":"(2011): “Square-Root-LASSO: Pivotal Recovery of Sparse Signals via Conic Programming,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Biometrika","element":"span"},{"text":", 98, 791–806, arXiv, 2010.","element":"span"}],[{"text":"Belloni, A., V. Chernozhukov, L. Wang, et al. ","element":"span"},{"text":"(2014b): “Pivotal estimation via square-root lasso in nonparametric regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 42, 757–788.","element":"span"}],[{"id":"id-50","text":"Belloni, A., V. Chernozhukov, and Y. Wei ","element":"span"},{"text":"(2013c): “Honest Confidence Regions for Logistic Regression with a Large Number of Controls,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1304.3969","element":"span"},{"text":".","element":"span"}],[{"id":"id-127","text":"Benjamin, D. J. ","element":"span"},{"text":"(2003): “Does 401(k) eligibility increase saving? 
Evidence from propensity score subclassification,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Public Economics","element":"span"},{"text":", 87, 1259–1290.","element":"span"}],[{"id":"id-12","text":"Berry, S., J. Levinsohn, and A. Pakes ","element":"span"},{"text":"(1995): “Automobile Prices in Market Equilibrium,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 63, 841–890.","element":"span"}],[{"id":"id-14","text":"Bickel, P. J. ","element":"span"},{"text":"(1982): “On adaptive estimation,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 647–671.","element":"span"}],[{"id":"id-137","text":"Bickel, P. J. and D. A. Freedman ","element":"span"},{"text":"(1981): “Some asymptotic theory for the bootstrap,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 1196–1217.","element":"span"}],[{"id":"id-43","text":"Bickel, P. J., Y. Ritov, and A. B. Tsybakov ","element":"span"},{"text":"(2009): “Simultaneous analysis of Lasso and Dantzig selector,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 37, 1705–1732.","element":"span"}],[{"id":"id-40","text":"Candès, E. and T. Tao ","element":"span"},{"text":"(2007): “The Dantzig selector: statistical estimation when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"is much larger than ","element":"span"},{"style":{"height":16},"width":318.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/58-0.png","element":"img","alt":" n,” Ann. Statist.","inline":true},{"text":", 35, 2313–2351.","element":"span"}],[{"id":"id-51","text":"Caner, M. and H. H. 
Zhang ","element":"span"},{"text":"(2014): “Adaptive elastic net for generalized methods of moments,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Business and Economic Statistics","element":"span"},{"text":", 32, 30–47.","element":"span"}],[{"id":"id-129","text":"Cattaneo, M., M. Jansson, and W. Newey ","element":"span"},{"text":"(2010): “Alternative Asymptotics and the Partially Linear Model with Many Regressors,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Working Paper, http://econ-www.mit.edu/files/6204","element":"span"},{"text":".","element":"span"}],[{"id":"id-69","text":"Cattaneo, M. D. ","element":"span"},{"text":"(2010): “Efficient semiparametric estimation of multi-valued treatment effects under ignorability,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 155, 138–154.","element":"span"}],[{"id":"id-102","text":"Chamberlain, G. ","element":"span"},{"text":"(1992): “Efficiency Bounds for Semiparametric Regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 60, 567–596.","element":"span"}],[{"id":"id-76","text":"Chamberlain, G. and G. W. Imbens ","element":"span"},{"text":"(2003): “Nonparametric applications of Bayesian inference,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Business & Economic Statistics","element":"span"},{"text":", 21, 12–18.","element":"span"}],[{"text":"Chen, X. ","element":"span"},{"text":"(2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Handbook of Econometrics","element":"span"},{"text":", 6, 5559–5632.","element":"span"}],[{"id":"id-34","text":"Chen, X., O. Linton, and I. v. 
Keilegom ","element":"span"},{"text":"(2003): “Estimation of Semiparametric Models when the Criterion Function Is Not Smooth,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 71, 1591–1608.","element":"span"}],[{"id":"id-15","text":"Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. ","element":"span"},{"text":"Newey ","element":"span"},{"text":"(2016): “Double Machine Learning for Treatment and Causal Parameters,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":".","element":"span"}],[{"id":"id-140","text":"Chernozhukov, V., D. Chetverikov, and K. Kato ","element":"span"},{"text":"(2012): “Gaussian approximation of suprema of empirical processes,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":".","element":"span"}],[{"id":"id-82","style":{"height":17.45},"width":892.78,"height":43.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/58-1.png","element":"img","alt":"Chernozhukov, V. and I. Fernández-Val","inline":true,"padRight":true},{"text":"(2005): “Subsampling inference on quantile regression processes,” ","element":"span"},{"style":{"height":16.4},"width":159.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/58-2.png","element":"img","alt":" Sankhyā","inline":true},{"text":", 67, 253–276.","element":"span"}],[{"id":"id-83","style":{"height":17.45},"width":1153.48,"height":43.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/58-3.png","element":"img","alt":"Chernozhukov, V., I. Fernández-Val, and B. Melly","inline":true,"padRight":true},{"text":"(2013): “Inference on counterfactual distributions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 81, 2205–2268.","element":"span"}],[{"id":"id-22","text":"Chernozhukov, V. and C. 
Hansen ","element":"span"},{"text":"(2004): “The impact of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Review of Economics and Statistics","element":"span"},{"text":", 86, 735–751.","element":"span"}],[{"id":"id-16","text":"——— (2005): “An IV Model of Quantile Treatment Effects,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 73, 245–262.","element":"span"}],[{"id":"id-17","text":"——— (2006): “Instrumental quantile regression inference for structural and treatment effect ","element":"span"},{"text":"models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Econometrics","element":"span"},{"text":", 132, 491–525.","element":"span"}],[{"id":"id-11","text":"Chernozhukov, V., C. Hansen, and M. Spindler ","element":"span"},{"text":"(2015a): ","element":"span"},{"text":"“Post-Selection and Post-Regularization Inference in Linear Models with Very Many Controls and Instruments,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"American Economic Review: Papers and Proceedings","element":"span"},{"text":", 105, 486–490.","element":"span"}],[{"id":"id-13","text":"——— (2015b): “Valid Post-Selection and Post-Regularization Inference: An Elementary, General ","element":"span"},{"text":"Approach,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annual Review of Economics","element":"span"},{"text":", 7, 649–688.","element":"span"}],[{"id":"id-18","text":"Chesher, A. ","element":"span"},{"text":"(2003): “Identification in nonseparable models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 71, 1405–1441.","element":"span"}],[{"id":"id-130","text":"Dudley, R. M. ","element":"span"},{"text":"(1999): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Uniform central limit theorems","element":"span"},{"text":", vol. 
63 of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Cambridge Studies in Advanced Mathematics","element":"span"},{"text":", Cambridge: Cambridge University Press.","element":"span"}],[{"text":"Engen, E. M. and W. G. Gale ","element":"span"},{"text":"(2000): “The Effects of 401(k) Plans on Household Wealth: Differences Across Earnings Groups,” Working Paper 8032, National Bureau of Economic Research.","element":"span"}],[{"text":"Engen, E. M., W. G. Gale, and J. K. Scholz ","element":"span"},{"text":"(1996): “The Illusory Effects of Saving Incentives on Saving,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Economic Perspectives","element":"span"},{"text":", 10, 113–138.","element":"span"}],[{"id":"id-20","text":"Escanciano, J. C. and L. Zhu ","element":"span"},{"text":"(2013): “Set inferences and sensitivity analysis in semiparametric conditionally identified models,” CeMMAP working papers CWP55/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.","element":"span"}],[{"id":"id-38","text":"Fan, J. and R. Li ","element":"span"},{"text":"(2001): “Variable selection via nonconcave penalized likelihood and its oracle properties,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 96, 1348–1360.","element":"span"}],[{"id":"id-29","text":"Farrell, M. ","element":"span"},{"text":"(2015): “Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 174, 1–23.","element":"span"}],[{"id":"id-36","text":"Frank, I. E. and J. H. 
Friedman ","element":"span"},{"text":"(1993): “A Statistical View of Some Chemometrics Regression Tools,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Technometrics","element":"span"},{"text":", 35, 109–135.","element":"span"}],[{"id":"id-65","style":{"height":16.65},"width":601.06,"height":41.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/59-0.png","element":"img","alt":"Frölich, M. and B. Melly","inline":true,"padRight":true},{"text":"(2013): “Identification of treatment effects on the treated with one-sided non-compliance,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Reviews","element":"span"},{"text":", 32, 384–414.","element":"span"}],[{"id":"id-170","text":"Ghosal, S., A. Sen, and A. W. van der Vaart ","element":"span"},{"text":"(2000): “Testing Monotonicity of Regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Statist.","element":"span"},{"text":", 28, 1054–1082.","element":"span"}],[{"id":"id-74","style":{"height":17.45},"width":448.78,"height":43.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/59-1.png","element":"img","alt":"Giné, E. and J. Zinn","inline":true,"padRight":true},{"text":"(1984): “Some limit theorems for empirical processes,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Probab.","element":"span"},{"text":", 12, 929–998, with discussion.","element":"span"}],[{"id":"id-75","text":"Hahn, J. ","element":"span"},{"text":"(1997): “Bayesian bootstrap of the quantile regression estimator: a large sample study,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Internat. Econom. 
Rev.","element":"span"},{"text":", 38, 795–808.","element":"span"}],[{"id":"id-10","text":"——— (1998): “On the role of the propensity score in efficient semiparametric estimation of average ","element":"span"},{"text":"treatment effects,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 315–331.","element":"span"}],[{"id":"id-79","text":"Hansen, B. E. ","element":"span"},{"text":"(1996): “Inference when a nuisance parameter is not identified under the null hypothesis,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 64, 413–430.","element":"span"}],[{"id":"id-7","text":"Hansen, L. P. ","element":"span"},{"text":"(1982): “Large sample properties of generalized method of moments estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 50, 1029–1054.","element":"span"}],[{"id":"id-8","text":"Hansen, L. P. and K. J. Singleton ","element":"span"},{"text":"(1982): “Generalized instrumental variables estimation of nonlinear rational expectations models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 50, 1269–1286.","element":"span"}],[{"id":"id-158","text":"Heckman, J. and E. J. Vytlacil ","element":"span"},{"text":"(1999): “Local instrumental variables and latent variable models for identifying and bounding treatment effects,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. Natl. Acad. Sci. USA","element":"span"},{"text":", 96, 4730–4734 (electronic).","element":"span"}],[{"text":"Heckman, J. J. and E. Vytlacil ","element":"span"},{"text":"(2005): “Structural equations, treatment effects, and econometric policy evaluation,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 73, 669–738.","element":"span"}],[{"id":"id-64","text":"Hong, H. and D. 
Nekipelov ","element":"span"},{"text":"(2010): “Semiparametric efficiency in nonlinear LATE models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Quantitative Economics","element":"span"},{"text":", 1, 279–304.","element":"span"}],[{"text":"Hong, H. and O. Scaillet ","element":"span"},{"text":"(2006): “A fast subsampling method for nonlinear dynamic models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Econometrics","element":"span"},{"text":", 133, 557–578.","element":"span"}],[{"id":"id-42","text":"Huang, J., J. L. Horowitz, and S. Ma ","element":"span"},{"text":"(2008): “Asymptotic properties of bridge estimators in sparse high-dimensional regression models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 36, 587–613.","element":"span"}],[{"id":"id-46","text":"Huang, J., J. L. Horowitz, and F. Wei ","element":"span"},{"text":"(2010): “Variable selection in nonparametric additive models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Statist.","element":"span"},{"text":", 38, 2282–2313.","element":"span"}],[{"id":"id-61","text":"Imbens, G. W. and J. D. Angrist ","element":"span"},{"text":"(1994): “Identification and Estimation of Local Average Treatment Effects,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 62, 467–475.","element":"span"}],[{"id":"id-19","text":"Imbens, G. W. and W. K. Newey ","element":"span"},{"text":"(2009): “Identification and estimation of triangular simultaneous equations models without additivity,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 77, 1481–1512.","element":"span"}],[{"text":"Imbens, G. W. and D. B. 
Rubin ","element":"span"},{"text":"(2015): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction","element":"span"},{"text":", Cambridge University Press.","element":"span"}],[{"id":"id-190","text":"Jing, B.-Y., Q.-M. Shao, and Q. Wang ","element":"span"},{"text":"(2003): “Self-normalized Cramér-type large deviations for independent random variables,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Probab.","element":"span"},{"text":", 31, 2167–2215.","element":"span"}],[{"id":"id-47","text":"Kato, K. ","element":"span"},{"text":"(2011): “Group Lasso for high dimensional sparse quantile regression models,” Preprint, ArXiv.","element":"span"}],[{"id":"id-80","text":"Kline, P. and A. Santos ","element":"span"},{"text":"(2012): “A Score Based Approach to Wild Bootstrap Inference,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometric Methods","element":"span"},{"text":", 1, 23–41.","element":"span"}],[{"text":"Koenker, R. ","element":"span"},{"text":"(1988): “Asymptotic Theory and Econometric Practice,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Applied Econometrics","element":"span"},{"text":", 3, 139–147.","element":"span"}],[{"id":"id-66","text":"——— (2005): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Quantile regression","element":"span"},{"text":", Cambridge University Press.","element":"span"}],[{"id":"id-101","text":"Kosorok, M. R. ","element":"span"},{"text":"(2008): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to Empirical Processes and Semiparametric Inference","element":"span"},{"text":", Series in Statistics, Berlin: Springer.","element":"span"}],[{"id":"id-0","style":{"height":16.66},"width":663.04,"height":41.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/60-0.png","element":"img","alt":"Leeb, H. and B. M. 
Pötscher","inline":true,"padRight":true},{"text":"(2008a): “Can one estimate the unconditional distribution of post-model-selection estimators?” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Theory","element":"span"},{"text":", 24, 338–376.","element":"span"}],[{"id":"id-1","text":"——— (2008b): “Recent developments in model selection and related areas,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Theory","element":"span"},{"text":", 24, 319–322.","element":"span"}],[{"id":"id-28","text":"Linton, O. ","element":"span"},{"text":"(1996): “Edgeworth approximation for MINPIN estimators in semiparametric regression models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Theory","element":"span"},{"text":", 12, 30–60.","element":"span"}],[{"id":"id-77","text":"Mammen, E. ","element":"span"},{"text":"(1993): “Bootstrap and wild bootstrap for high dimensional linear models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 255–285.","element":"span"}],[{"id":"id-44","text":"Meinshausen, N. and B. Yu ","element":"span"},{"text":"(2009): “Lasso-type recovery of sparse representations for high-dimensional data,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 37, 2246–2270.","element":"span"}],[{"id":"id-24","text":"Newey, W. K. 
","element":"span"},{"text":"(1990): “Semiparametric efficiency bounds,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Applied Econometrics","element":"span"},{"text":", 5, 99–135.","element":"span"}],[{"id":"id-26","text":"——— (1994): “The asymptotic variance of semiparametric estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 62, 1349–1382.","element":"span"}],[{"text":"——— (1997): “Convergence Rates and Asymptotic Normality for Series Estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 79, 147–168.","element":"span"}],[{"id":"id-23","style":{"height":17.6},"width":510.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/60-1.png","element":"img","alt":"Neyman, J. (1979): “C(α","inline":true},{"text":") tests and their use,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Sankhya","element":"span"},{"text":", 41, 1–21.","element":"span"}],[{"id":"id-128","text":"Ogburn, E. L., A. Rotnitzky, and J. M. Robins ","element":"span"},{"text":"(2015): “Doubly robust estimation of the local average treatment effect curve,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the Royal Statistical Society: Series B","element":"span"},{"text":", 77, 373–396.","element":"span"}],[{"id":"id-123","text":"Poterba, J. M., S. F. Venti, and D. A. Wise ","element":"span"},{"text":"(1994): “401(k) Plans and Tax-Deferred savings,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Studies in the Economics of Aging","element":"span"},{"text":", ed. by D. A. 
Wise, Chicago, IL: University of Chicago Press.","element":"span"}],[{"id":"id-124","text":"——— (1995): “Do 401(k) Contributions Crowd Out Other Personal Saving?” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Public Economics","element":"span"},{"text":", 58, 1–32.","element":"span"}],[{"id":"id-125","text":"——— (1996): “Personal Retirement Saving Programs and Asset Accumulation: Reconciling the ","element":"span"},{"text":"Evidence,” Working Paper 5599, National Bureau of Economic Research.","element":"span"}],[{"id":"id-126","text":"——— (2001): “The Transition to Personal Accounts and Increasing Retirement Wealth: Macro ","element":"span"},{"text":"and Micro Evidence,” Working Paper 8610, National Bureau of Economic Research.","element":"span"}],[{"id":"id-2","style":{"height":16.65},"width":295.66,"height":41.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/61-0.png","element":"img","alt":"Pötscher, B.","inline":true,"padRight":true},{"text":"(2009): “Confidence Sets Based on Sparse Estimators Are Necessarily Large,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Sankhya","element":"span"},{"text":", 71-A, 1–18.","element":"span"}],[{"id":"id-27","text":"Robins, J. M. and A. Rotnitzky ","element":"span"},{"text":"(1995): “Semiparametric efficiency in multivariate regression models with missing data,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Amer. Statist. Assoc.","element":"span"},{"text":", 90, 122–129.","element":"span"}],[{"id":"id-30","text":"Robinson, P. M. ","element":"span"},{"text":"(1988): “Root-","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":"-consistent semiparametric regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 56, 931–954.","element":"span"}],[{"id":"id-21","text":"Romano, J. P. and A. M. 
Shaikh ","element":"span"},{"text":"(2012): “On the uniform asymptotic validity of subsampling and the bootstrap,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 40, 2798–2822.","element":"span"}],[{"id":"id-70","text":"Rothe, C. and S. Firpo ","element":"span"},{"text":"(2013): “Semiparametric Estimation and Inference Using Doubly Robust Moment Conditions,” Tech. rep., NYU preprint.","element":"span"}],[{"id":"id-172","text":"Sherman, R. ","element":"span"},{"text":"(1994): “Maximal inequalities for degenerate ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":"-processes with applications to optimization estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Statist.","element":"span"},{"text":", 22, 439–459.","element":"span"}],[{"text":"Spindler, M., V. Chernozhukov, and C. Hansen ","element":"span"},{"text":"(2016): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"hdm: High-Dimensional Metrics","element":"span"},{"text":", R package version 0.1.0, http://CRAN.R-project.org/package=hdm.","element":"span"}],[{"id":"id-37","text":"Tibshirani, R. ","element":"span"},{"text":"(1996): “Regression shrinkage and selection via the Lasso,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Roy. Statist. Soc. Ser. B","element":"span"},{"text":", 58, 267–288.","element":"span"}],[{"text":"Tsybakov, A. B. ","element":"span"},{"text":"(2009): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to nonparametric estimation","element":"span"},{"text":", Springer.","element":"span"}],[{"id":"id-41","text":"van de Geer, S. A. ","element":"span"},{"text":"(2008): “High-dimensional generalized linear models and the lasso,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 36, 614–645.","element":"span"}],[{"id":"id-31","text":"van der Vaart, A. W. 
","element":"span"},{"text":"(1991): “On differentiable functionals,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 19, 178–204.","element":"span"}],[{"id":"id-100","text":"——— (1998): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Asymptotic Statistics","element":"span"},{"text":", Cambridge University Press.","element":"span"}],[{"id":"id-72","text":"van der Vaart, A. W. and J. A. Wellner ","element":"span"},{"text":"(1996): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Weak Convergence and Empirical Processes","element":"span"},{"text":", Springer Series in Statistics.","element":"span"}],[{"id":"id-157","text":"Vytlacil, E. J. ","element":"span"},{"text":"(2002): “Independence, Monotonicity, and Latent Index Models: An Equivalence Result,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 70, 331–341.","element":"span"}],[{"text":"Wasserman, L. ","element":"span"},{"text":"(2006): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"All of nonparametric statistics","element":"span"},{"text":", Springer New York.","element":"span"}],[{"text":"Wooldridge, J. M. ","element":"span"},{"text":"(2010): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Analysis of Cross Section and Panel Data","element":"span"},{"text":", Cambridge, Massachusetts: The MIT Press, second ed.","element":"span"}],[{"id":"id-39","text":"Zou, H. 
","element":"span"},{"text":"(2006): “The Adaptive Lasso and Its Oracle Properties,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 101, 1418–1429.","element":"span"}],[{"text":"Supplement to “Program Evaluation and Causal Inference with High-Dimensional Data”","element":"span"}],[{"style":{"width":"63%"},"width":1198,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-0.png","element":"img"}],[{"text":"Abstract. ","element":"span"},{"text":"The supplementary material contains 10 appendices with additional results and some omitted proofs. Appendices ","element":"span"},{"href":"#id-68","text":"F–","element":"a"},{"href":"#id-151","text":"J ","element":"a"},{"text":"include additional results for Sections 2–7, respectively. Appendix ","element":"span"},{"href":"#id-135","text":"K ","element":"a"},{"text":"gathers auxiliary results on the algebra of covering entropies. Appendices ","element":"span"},{"href":"#id-152","text":"L ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-132","text":"M ","element":"a"},{"text":"contain the proofs of Sections 4 and 5 omitted from the main text. Appendix ","element":"span"},{"href":"#id-153","text":"N ","element":"a"},{"text":"contains the proofs of Section 6 omitted from the main text, together with the proofs of the additional results for Section 6 in Appendix ","element":"span"},{"text":"I. ","element":"span"},{"text":"Appendix ","element":"span"},{"href":"#id-154","text":"O ","element":"a"},{"text":"reports the results of a simulation experiment.","element":"span"}]]},{"heading":"Appendix F. Additional Results for Section 2","paragraphs":[[{"text":"F.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Causal Interpretations for Structural Parameters. 
","element":"span"},{"text":"The quantities discussed in Sections ","element":"span"},{"href":"#id-155","text":"2.2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-156","text":"2.3 ","element":"a"},{"text":"are well-defined and have a causal interpretation under standard conditions. We briefly recall these conditions, using potential outcomes notation. Let ","element":"span"},{"style":{"height":15.02},"width":219.24,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-1.png","element":"img","alt":" Yu1 and Yu0","inline":true,"padRight":true},{"text":"denote the potential outcomes under the treatment states 1 and 0. These outcomes are not observed jointly, and we instead observe ","element":"span"},{"style":{"height":17.6},"width":917.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-2.png","element":"img","alt":" Yu = DYu1 + (1 − D)Yu0, where D ∈ D = {0, 1}","inline":true,"padRight":true},{"text":"is the random variable indicating program participation or treatment state. Under exogeneity, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"is assigned independently of the potential outcomes conditional on covariates ","element":"span"},{"style":{"height":17.6},"width":541.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-3.png","element":"img","alt":" X, i.e. (Yu1, Yu0) ⊥⊥ D | X","inline":true,"padRight":true},{"text":"a.s., where ","element":"span"},{"style":{"height":12.8},"width":207.03,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-4.png","element":"img","alt":" ⊥⊥ denotes","inline":true,"padRight":true},{"text":"statistical independence.","element":"span"}],[{"text":"Exogeneity fails when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"depends on the potential outcomes. 
For example, people may drop out of a program if they think the program will not benefit them. In this case, instrumental variables are useful in creating quasi-experimental fluctuations in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"that may identify useful effects. Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"be a binary instrument, such as an offer of participation, that generates potential participation decisions ","element":"span"},{"style":{"height":15.02},"width":218.1,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-5.png","element":"img","alt":" D1 and D0","inline":true,"padRight":true},{"text":"under the instrument states 1 and 0, respectively. ","element":"span"},{"text":"As with the potential outcomes, the potential participation decisions under both instrument states are not observed jointly. The realized participation decision is then given by ","element":"span"},{"style":{"height":17.6},"width":430.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-6.png","element":"img","alt":" D = ZD1 +(1−Z)D0.","inline":true,"padRight":true},{"text":"We assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"is assigned randomly with respect to potential outcomes and participation decisions conditional on ","element":"span"},{"style":{"height":17.6},"width":735.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-7.png","element":"img","alt":" X, i.e., (Yu0, Yu1, D0, D1) ⊥⊥ Z | X a.s.","inline":true}],[{"text":"There are many causal quantities of interest for program evaluation. 
Chief among these are various structural averages: ","element":"span"},{"style":{"height":17.6},"width":223.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-8.png","element":"img","alt":" d ↦ EP [Yud","inline":true},{"text":"], the causal ASF; ","element":"span"},{"style":{"height":17.6},"width":761.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-9.png","element":"img","alt":" d ↦ EP [Yud | D = 1], the causal ASF-T;","inline":true},{"style":{"height":17.6},"width":427.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-10.png","element":"img","alt":"d ↦ EP [Yud | D1 > D0","inline":true},{"text":"], the causal LASF; and ","element":"span"},{"style":{"height":17.6},"width":590.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-11.png","element":"img","alt":" d ↦ EP [Yud | D1 > D0, D = 1],","inline":true,"padRight":true},{"text":"the causal LASF-T; as well as effects derived from them such as E","element":"span"},{"style":{"height":17.6},"width":216.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-12.png","element":"img","alt":"P [Yu1 − Yu0","inline":true},{"text":"], the causal ATE; E","element":"span"},{"style":{"height":17.6},"width":395.83,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-13.png","element":"img","alt":"P [Yu1 − Yu0 | D = 1],","inline":true,"padRight":true},{"text":"the causal ATE-T; E","element":"span"},{"style":{"height":17.6},"width":412.98,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-14.png","element":"img","alt":"P [Yu1−Yu0 | D1 > D0","inline":true},{"text":"], the causal LATE; and E","element":"span"},{"style":{"height":17.6},"width":575.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-15.png","element":"img","alt":"P [Yu1−Yu0 | D1 > D0, D = 
1],","inline":true,"padRight":true},{"text":"the causal LATE-T. These causal quantities are the same as the structural parameters defined in Sections 2.2–2.3 under the following well-known sufficient condition.","element":"span"}],[{"id":"id-159","style":{"fontWeight":"bold"},"text":"Assumption F.1 ","element":"span"},{"text":"(Assumptions for Causal/Structural Interpretability)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The following conditions hold ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"style":{"fontStyle":"italic"},"text":"-almost surely: (Exogeneity) ","element":"span"},{"text":"((","element":"span"},{"style":{"height":17.6},"width":586.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-16.png","element":"img","alt":"Yu1, Yu0)u∈U, D1, D0) ⊥⊥ Z | X","inline":true},{"style":{"fontStyle":"italic"},"text":"; (First Stage) ","element":"span"},{"text":"E","element":"span"},{"style":{"height":17.6},"width":237.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-17.png","element":"img","alt":"P [D1 | X] ≠","inline":true,"padRight":true},{"text":"E","element":"span"},{"style":{"height":17.6},"width":183.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-18.png","element":"img","alt":"P [D0 | X]","inline":true},{"style":{"fontStyle":"italic"},"text":"; (Non-Degeneracy) ","element":"span"},{"text":"P","element":"span"},{"style":{"height":17.6},"width":401.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-19.png","element":"img","alt":"P (Z = 1 | X) ∈ (0, 1)","inline":true},{"style":{"fontStyle":"italic"},"text":"; (Monotonicity) 
","element":"span"},{"text":"P","element":"span"},{"style":{"height":17.6},"width":399.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/62-20.png","element":"img","alt":"P (D1 ⩾ D0 | X) = 1.","inline":true}],[{"text":"This condition, due to ","element":"span"},{"href":"#id-61","referenceIndex":64,"text":"Imbens and Angrist ","element":"a"},{"href":"#id-61","referenceIndex":64,"text":"(1994) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-60","referenceIndex":2,"text":"Abadie ","element":"a"},{"href":"#id-60","referenceIndex":2,"text":"(2003)","element":"a"},{"text":", is widely used in the program evaluation literature. It has an equivalent formulation in terms of a simultaneous equation model with a binary endogenous variable; see ","element":"span"},{"href":"#id-157","referenceIndex":100,"text":"Vytlacil ","element":"a"},{"href":"#id-157","referenceIndex":100,"text":"(2002) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-158","referenceIndex":58,"text":"Heckman and Vytlacil ","element":"a"},{"href":"#id-158","referenceIndex":58,"text":"(1999)","element":"a"},{"text":". For a thorough discussion of this assumption, see ","element":"span"},{"href":"#id-61","referenceIndex":64,"text":"Imbens and Angrist ","element":"a"},{"href":"#id-61","referenceIndex":64,"text":"(1994)","element":"a"},{"text":". 
Under this assumption, we present an identification lemma that follows from results of ","element":"span"},{"href":"#id-60","referenceIndex":2,"text":"Abadie ","element":"a"},{"href":"#id-60","referenceIndex":2,"text":"(2003) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-64","referenceIndex":60,"text":"Hong and ","element":"a"},{"href":"#id-64","referenceIndex":60,"text":"Nekipelov ","element":"a"},{"href":"#id-64","referenceIndex":60,"text":"(2010)","element":"a"},{"text":", both of which build upon ","element":"span"},{"href":"#id-61","referenceIndex":64,"text":"Imbens and Angrist ","element":"a"},{"href":"#id-61","referenceIndex":64,"text":"(1994)","element":"a"},{"text":". The lemma shows that the parameters ","element":"span"},{"style":{"height":16.7},"width":233.54,"height":41.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-0.png","element":"img","alt":" θYu and ϑYu","inline":true,"padRight":true},{"text":"defined earlier have a causal interpretation under Assumption ","element":"span"},{"href":"#id-159","text":"F.1. ","element":"a"},{"text":"Referring to them as structural or causal is therefore justified under this condition.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma F.1 ","element":"span"},{"text":"(Identification of Causal Effects)","element":"span"},{"href":"#id-159","style":{"height":16.8},"width":791.98,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-1.png","element":"img","alt":". 
Under Assumption F.1, for each d ∈ D,","inline":true}],[{"style":{"width":"67%"},"width":1256,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Furthermore, if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is exogenous, namely ","element":"span"},{"style":{"height":12},"width":127.52,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-3.png","element":"img","alt":" D ≡ Z","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"a.s., then","element":"span"}],[{"style":{"width":"78%"},"width":1461,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-4.png","element":"img"}]]},{"heading":"Appendix G. Additional Results for Section 3","paragraphs":[[{"id":"id-161","style":{"fontWeight":"bold"},"text":"Comment G.1 ","element":"span"},{"text":"(Another strategy for estimating ","element":"span"},{"style":{"height":17.6},"width":246.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-5.png","element":"img","alt":" mZ and gV ).","inline":true,"padRight":true},{"text":"An alternative to the strategy for modeling and estimating ","element":"span"},{"style":{"height":16.4},"width":219.94,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-6.png","element":"img","alt":" mZ and gV","inline":true,"padRight":true},{"text":"is to treat ","element":"span"},{"style":{"height":10.7},"width":63.31,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-7.png","element":"img","alt":" mZ","inline":true,"padRight":true},{"text":"as in the text via ","element":"span"},{"href":"#id-67","text":"(3.7) ","element":"a"},{"text":"while modeling 
","element":"span"},{"style":{"height":12},"width":46.81,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-8.png","element":"img","alt":" gV","inline":true,"padRight":true},{"text":"through its disaggregation","element":"span"}],[{"id":"id-160","style":{"width":"67%"},"width":1267,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-9.png","element":"img"}],[{"text":"where the regression functions ","element":"span"},{"style":{"height":15.1},"width":187.91,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-10.png","element":"img","alt":" eV and lD","inline":true,"padRight":true},{"text":"map the support of (","element":"span"},{"style":{"fontStyle":"italic"},"text":"D, Z, X","element":"span"},{"text":"), ","element":"span"},{"style":{"fontStyle":"italic"},"text":"DZX","element":"span"},{"text":", to the real line and are defined by","element":"span"}],[{"style":{"width":"71%"},"width":1333,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-11.png","element":"img"}],[{"text":"We will denote other potential values for the functions ","element":"span"},{"style":{"height":15.1},"width":194.12,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-12.png","element":"img","alt":" eV and lD","inline":true,"padRight":true},{"text":"by the parameters ","element":"span"},{"style":{"fontStyle":"italic"},"text":"e ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l","element":"span"},{"text":". 
In this alternative approach, we can again use high-dimensional methods for modeling and estimating ","element":"span"},{"style":{"height":15.1},"width":191.37,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-13.png","element":"img","alt":"eV and lD","inline":true,"padRight":true},{"text":"as in the main paper, and we can then use the relation ","element":"span"},{"href":"#id-160","text":"(G.1) ","element":"a"},{"text":"to estimate ","element":"span"},{"style":{"height":19.16},"width":96.11,"height":47.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-14.png","element":"img","alt":" gV .31 ","inline":true,"padRight":true},{"text":"Specifically, we model the conditional expectation of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"given ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z","element":"span"},{"text":", and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"by","element":"span"}],[{"id":"id-162","style":{"width":"70%"},"width":1324,"height":201,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/63-15.png","element":"img"}],[{"text":"We model the conditional probability of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"taking the value 1 or 0, given ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":", by","element":"span"}],[{"id":"id-163","style":{"width":"101%"},"width":1900,"height":552,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-0.png","element":"img"}],[{"text":"As in the strategy in the main text, we maintain approximate sparsity. 
We assume that there exist ","element":"span"},{"style":{"height":16.4},"width":276.16,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-1.png","element":"img","alt":" βZ, θV and θD","inline":true,"padRight":true},{"text":"such that, for all ","element":"span"},{"style":{"height":15.2},"width":130.81,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-2.png","element":"img","alt":" V ∈ V,","inline":true}],[{"style":{"width":"63%"},"width":1196,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-3.png","element":"img"}],[{"text":"That is, there are at most ","element":"span"},{"style":{"height":12.62},"width":224.54,"height":31.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-4.png","element":"img","alt":" s = sn ≪ n","inline":true,"padRight":true},{"text":"components of ","element":"span"},{"style":{"height":16.4},"width":293.54,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-5.png","element":"img","alt":" θV , θD, and βZ","inline":true,"padRight":true},{"text":"with nonzero values in the approximations to ","element":"span"},{"style":{"height":15.6},"width":270.6,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-6.png","element":"img","alt":" eV , lD and mZ","inline":true},{"text":". 
The sparsity condition also requires the size of the approximation errors to be small compared to the conjectured size of the estimation error: For all ","element":"span"},{"style":{"height":12.8},"width":118.51,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-7.png","element":"img","alt":" V ∈ V","inline":true},{"text":", we assume","element":"span"}],[{"style":{"width":"85%"},"width":1606,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-8.png","element":"img"}],[{"text":"Note that the size of the approximating model ","element":"span"},{"style":{"height":10.62},"width":125.23,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-9.png","element":"img","alt":" s = sn","inline":true,"padRight":true},{"text":"can grow with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"just as in standard series estimation as long as ","element":"span"},{"style":{"height":19.87},"width":544.92,"height":49.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-10.png","element":"img","alt":" s2 log2(p ∨ n) log2(n)/n → 0.","inline":true}],[{"text":"We proceed with the estimation of ","element":"span"},{"style":{"height":15.1},"width":189.88,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-11.png","element":"img","alt":" eV and lD","inline":true,"padRight":true},{"text":"analogously to the approach outlined in the main text. 
The Lasso estimator ","element":"span"},{"style":{"height":15.1},"width":46.49,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-12.png","element":"img","alt":"θ̂V","inline":true,"padRight":true},{"text":"and Post-Lasso estimator ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":15.1},"width":46.48,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-13.png","element":"img","alt":"θV","inline":true,"padRight":true},{"text":"are defined analogously to ","element":"span"},{"style":{"height":16.4},"width":143.66,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-14.png","element":"img","alt":"β̂V and","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"height":16.4},"width":50.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-15.png","element":"img","alt":"βV","inline":true,"padRight":true},{"text":"using the data ( ˜","element":"span"},{"style":{"height":20.9},"width":1193.8,"height":52.25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-16.png","element":"img","alt":"Yi, ˜Xi)ni=1= (Vi, f(Di, Zi, Xi))ni=1 and the link function Λ = ΓV","inline":true,"padRight":true},{"text":". The estimator ","element":"span"},{"style":{"height":20.61},"width":1126.67,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-17.png","element":"img","alt":"êV (D, Z, X) = ΓV [f(D, Z, X)′θ̄V ], with θ̄V = θ̂V or θ̄V = θ̃V","inline":true,"padRight":true},{"text":", has the near-oracle rate of convergence ","element":"span"},{"style":{"height":20.8},"width":238.71,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-18.png","element":"img","alt":"√((s log p)/n)","inline":true,"padRight":true},{"text":"and other desirable properties. 
The Lasso estimator ","element":"span"},{"style":{"height":15.1},"width":48.49,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-19.png","element":"img","alt":"θ̂D","inline":true,"padRight":true},{"text":"and Post-Lasso estimator ˜","element":"span"},{"style":{"height":15.1},"width":48.48,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-20.png","element":"img","alt":"θD","inline":true,"padRight":true},{"text":"are also defined analogously to ","element":"span"},{"style":{"height":19.81},"width":208.42,"height":49.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-21.png","element":"img","alt":" β̂V and β̃V","inline":true,"padRight":true},{"text":"using the data ( ˜","element":"span"},{"style":{"height":20.9},"width":651.41,"height":52.25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-22.png","element":"img","alt":"Yi, ˜Xi)ni=1= (Di, f(Zi, Xi))ni=1 and","inline":true},{"style":{"height":15.5},"width":497.01,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-23.png","element":"img","alt":"the link function Λ = ΓD","inline":true},{"text":". 
Again, the estimator ","element":"span"},{"style":{"height":19.4},"width":913.22,"height":48.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-24.png","element":"img","alt":"l̂D(Z, X) = ΓD[f(Z, X)′θ̄D] of lD(Z, X), where","inline":true,"padRight":true},{"text":"¯","element":"span"},{"style":{"height":18.91},"width":384.05,"height":47.27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-25.png","element":"img","alt":"θD = θ̂D or θ̄D = θ̃D","inline":true},{"text":", has good theoretical properties, including the near-oracle rate of convergence ","element":"span"},{"style":{"height":20.8},"width":238.71,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-26.png","element":"img","alt":"√((s log p)/n)","inline":true},{"text":". The resulting estimator for ","element":"span"},{"style":{"height":16.4},"width":193.07,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-27.png","element":"img","alt":" gV is then","inline":true}],[{"style":{"width":"67%"},"width":1267,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-28.png","element":"img"}],[{"text":"The remaining estimation steps are the same as in the strategy given in the main text.","element":"span"}]]},{"heading":"Appendix H. Additional Results for Section 4","paragraphs":[[{"id":"id-119","style":{"fontWeight":"bold"},"text":"Assumption H.1 ","element":"span"},{"text":"(Approximate Sparsity for the Strategy of Comment ","element":"span"},{"href":"#id-161","text":"G.1)","element":"a"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under each ","element":"span"},{"style":{"height":14.62},"width":147.08,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-29.png","element":"img","alt":" P ∈ Pn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and for each ","element":"span"},{"style":{"height":13.82},"width":127.56,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-30.png","element":"img","alt":" n ⩾ n0","inline":true},{"style":{"fontStyle":"italic"},"text":", uniformly for all ","element":"span"},{"style":{"height":12.8},"width":118.51,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-31.png","element":"img","alt":" V ∈ V","inline":true},{"style":{"fontStyle":"italic"},"text":": (i) The approximations ","element":"span"},{"href":"#id-162","style":{"fontStyle":"italic"},"text":"(G.4)","element":"a"},{"style":{"fontStyle":"italic"},"text":"-","element":"span"},{"href":"#id-163","style":{"fontStyle":"italic"},"text":"(G.10) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-67","style":{"fontStyle":"italic"},"text":"(3.7) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"apply with the link functions ","element":"span"},{"text":"Γ","element":"span"},{"style":{"height":16},"width":262.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-32.png","element":"img","alt":"V , ΓD and ΛZ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"belonging to the set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"style":{"fontStyle":"italic"},"text":", the sparsity condition ","element":"span"},{"style":{"height":17.6},"width":300.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-33.png","element":"img","alt":" ∥θV 
∥0+∥θD∥0+","inline":true},{"style":{"height":17.6},"width":193.63,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-34.png","element":"img","alt":"∥βZ∥0 ⩽ s","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"holding, the approximation errors satisfying ","element":"span"},{"style":{"height":21.04},"width":824.45,"height":52.59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/64-35.png","element":"img","alt":" ∥ϱD∥P,2 + ∥ϱV ∥P,2 + ∥rZ∥P,2 ⩽ δnn−1/4 and","inline":true,"padRight":true},{"style":{"height":18.3},"width":705.08,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-0.png","element":"img","alt":"∥ϱD∥P,∞ + ∥ϱV ∥P,∞ + ∥rZ∥P,∞ ⩽ ϵn","inline":true},{"style":{"fontStyle":"italic"},"text":", and the sparsity index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and the number of terms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"style":{"fontStyle":"italic"},"text":"in the vector ","element":"span"},{"style":{"height":19.87},"width":786.47,"height":49.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-1.png","element":"img","alt":" f(X) obeying s2 log2(p ∨ n) log2 n ⩽ δnn","inline":true},{"style":{"fontStyle":"italic"},"text":". 
(ii) There are estimators ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":18.61},"width":410.27,"height":46.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-2.png","element":"img","alt":"θV , ¯θD, and ¯βZ such","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"that, with probability no less than ","element":"span"},{"text":"1","element":"span"},{"style":{"height":15.42},"width":95.54,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-3.png","element":"img","alt":"−∆n","inline":true},{"style":{"fontStyle":"italic"},"text":", the estimation errors satisfy ","element":"span"},{"style":{"height":20.18},"width":543.28,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-4.png","element":"img","alt":" ∥f(D, Z, X)′(¯θV −θV )∥Pn,2+","inline":true},{"style":{"height":21.11},"width":1872.06,"height":52.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-5.png","element":"img","alt":"∥f(Z, X)′(¯θD − θD)∥Pn,2 + ∥f(X)′(¯βZ − βZ)∥Pn,2 ⩽ δnn−1/4 and Kn∥¯θV − θV ∥1 + Kn∥¯θD − θD∥1 +","inline":true},{"style":{"height":19.41},"width":389.05,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-6.png","element":"img","alt":"Kn∥¯βZ − βZ∥1 ⩽ ϵn","inline":true},{"style":{"fontStyle":"italic"},"text":"; the estimators are sparse such that ","element":"span"},{"style":{"height":19.41},"width":754.58,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-7.png","element":"img","alt":" ∥¯θV ∥0 + ∥¯θD∥0 + ∥¯βZ∥0 ⩽ Cs; and the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"empirical and population norms induced by the Gram matrix formed by 
","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.09},"width":181.52,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-8.png","element":"img","alt":"f(Xi))ni=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are equivalent ","element":"span"},{"style":{"fontStyle":"italic"},"text":"on sparse subsets, ","element":"span"},{"text":"sup","element":"span"},{"style":{"height":20.59},"width":838.57,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-9.png","element":"img","alt":"∥δ∥0⩽ℓns |∥f(X)′δ∥Pn,2/∥f(X)′δ∥P,2 − 1| ⩽ ϵn","inline":true},{"style":{"fontStyle":"italic"},"text":". (iii) The following boundedness conditions hold: ","element":"span"},{"style":{"height":18.3},"width":762.95,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-10.png","element":"img","alt":" ∥∥f(X)∥∞||P,∞ ⩽ Kn and ∥V ∥P,∞ ⩽ C.","inline":true}],[{"text":"Under the stated assumptions, the empirical reduced form process ","element":"span"},{"style":{"height":18.47},"width":322.2,"height":46.18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-11.png","element":"img","alt":"�Zn,P = √n(�ρ − ρ","inline":true},{"text":") defined by ","element":"span"},{"href":"#id-86","text":"(3.16)","element":"a"},{"text":", but constructed using the alternative strategy for estimating ","element":"span"},{"style":{"height":16.4},"width":211.99,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-12.png","element":"img","alt":" mZ and gV","inline":true,"padRight":true},{"text":"of Comment ","element":"span"},{"href":"#id-161","text":"G.1, ","element":"a"},{"text":"obeys a functional central limit theorem, and its multiplier bootstrap counterpart obeys one as well. Theorem ","element":"span"},{"href":"#id-164","text":"H.1 ","element":"a"},{"text":"states these results. 
We omit the proof because it is analogous to the proofs of Theorems ","element":"span"},{"href":"#id-165","text":"4.1–","element":"a"},{"href":"#id-166","text":"4.2.","element":"a"}],[{"id":"id-164","style":{"fontWeight":"bold"},"text":"Theorem H.1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Assumption ","element":"span"},{"href":"#id-119","style":{"fontStyle":"italic"},"text":"H.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"the results stated in Theorems ","element":"span"},{"href":"#id-165","style":{"fontStyle":"italic"},"text":"4.1–","element":"a"},{"href":"#id-166","style":{"fontStyle":"italic"},"text":"4.2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"in the main text apply to the alternative strategy for estimating ","element":"span"},{"style":{"height":16.4},"width":212.69,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-13.png","element":"img","alt":" mZ and gV","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"of Comment ","element":"span"},{"href":"#id-161","style":{"fontStyle":"italic"},"text":"G.1.","element":"a"}]]},{"heading":"Appendix I. Additional Results for Section 6: Finite Sample Results of a","paragraphs":[[{"style":{"width":"76%"},"width":1432,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-14.png","element":"img"}],[{"text":"I.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Assumptions. ","element":"span"},{"text":"We consider the following high-level conditions, which are implied by the primitive Assumptions ","element":"span"},{"href":"#id-114","text":"6.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-121","text":"6.2. 
","element":"a"},{"text":"For each ","element":"span"},{"style":{"height":13.6},"width":72.31,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-15.png","element":"img","alt":" n ⩾","inline":true,"padRight":true},{"text":"1, there is a sequence of independent random variables (","element":"span"},{"style":{"height":18.09},"width":126.57,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-16.png","element":"img","alt":"Wi)ni=1","inline":true},{"text":", defined on the probability space (Ω","element":"span"},{"style":{"height":16},"width":155.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-17.png","element":"img","alt":", AΩ, PP","inline":true,"padRight":true},{"text":") such that model ","element":"span"},{"href":"#id-110","text":"(6.1) ","element":"a"},{"text":"holds with ","element":"span"},{"style":{"height":19.53},"width":228.14,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-18.png","element":"img","alt":" U ⊂ [0, 1]du.","inline":true,"padRight":true},{"text":"Let ","element":"span"},{"style":{"height":15.5},"width":47.71,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-19.png","element":"img","alt":" dU","inline":true,"padRight":true},{"text":"be a metric on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U ","element":"span"},{"text":"(and note that the results cover the case where ","element":"span"},{"style":{"height":15.02},"width":42.71,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-20.png","element":"img","alt":" du","inline":true,"padRight":true},{"text":"is a function of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"). 
Throughout this section we assume that the variables (","element":"span"},{"style":{"height":17.6},"width":809.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-21.png","element":"img","alt":"Xi, (Yui, ζui := Yui − EP [Yui | Xi])u∈U) are","inline":true,"padRight":true},{"text":"generated as suitably measurable transformations of ","element":"span"},{"style":{"height":15.02},"width":262.84,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-22.png","element":"img","alt":" Wi and u ∈ U","inline":true},{"text":". Furthermore, this section uses the notation ","element":"span"},{"text":"¯","element":"span"},{"text":"E","element":"span"},{"style":{"height":21.29},"width":363.98,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-23.png","element":"img","alt":"P [·] = 1n�ni=1 EP [·","inline":true},{"text":"], because we allow for independent non-identically distributed ","element":"span"},{"text":"(i.n.i.d.) 
data.","element":"span"}],[{"text":"Consider fixed sequences of positive numbers ","element":"span"},{"style":{"height":16.8},"width":530.57,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-24.png","element":"img","alt":" δn ↘ 0, ϵn ↘ 0, and ∆n ↘","inline":true,"padRight":true},{"text":"0 at a speed at most polynomial in ","element":"span"},{"style":{"height":16.4},"width":595.1,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-25.png","element":"img","alt":" n, ℓn = log n, and 1 ⩽ Kn < ∞","inline":true},{"text":"; and positive constants ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"which will not vary with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"98%"},"width":1842,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-26.png","element":"img"}],[{"style":{"height":14.62},"width":56.06,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-27.png","element":"img","alt":"Nn","inline":true},{"style":{"fontStyle":"italic"},"text":"; (ii) uniformly over ","element":"span"},{"style":{"height":12.8},"width":119.23,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-28.png","element":"img","alt":" u ∈ U","inline":true},{"style":{"fontStyle":"italic"},"text":", we have that ","element":"span"},{"text":"max","element":"span"},{"style":{"height":10.8},"width":58.33,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-29.png","element":"img","alt":"j⩽p","inline":true}],[{"style":{"height":19.83},"width":1872.92,"height":49.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-30.png","element":"img","alt":"and 0 < c ⩽ ¯EP [|fj(X)ζu|2] ⩽ C, j = 1, 
. . . , p; and (iii) with probability 1 − ∆n, we have","inline":true}],[{"style":{"width":"99%"},"width":1868,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-31.png","element":"img"}],[{"text":"The following technical lemma justifies the choice of penalty level ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-32.png","element":"img","alt":" λ","inline":true},{"text":". It is based on self-normalized moderate deviation theory. In what follows, for ","element":"span"},{"style":{"height":15.02},"width":329,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-33.png","element":"img","alt":" u ∈ U we let �Ψu0","inline":true,"padRight":true},{"text":"denote a diagonal ","element":"span"},{"style":{"height":15.2},"width":241.62,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-34.png","element":"img","alt":" p × p matrix","inline":true,"padRight":true},{"text":"of “ideal loadings” with diagonal elements given by ","element":"span"},{"style":{"height":23.23},"width":806.52,"height":58.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/65-35.png","element":"img","alt":"�Ψu0jj = {En[f2j (X)ζ2u]}1/2 for j = 1, . . . 
, p.","inline":true}],[{"id":"id-179","style":{"fontWeight":"bold"},"text":"Lemma I.1 ","element":"span"},{"text":"(Choice of ","element":"span"},{"style":{"height":17.6},"width":56.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-0.png","element":"img","alt":" λ).","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Suppose Condition WL holds, and let ","element":"span"},{"style":{"height":12.4},"width":238.92,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-1.png","element":"img","alt":" c′ > c > 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be constants, ","element":"span"},{"style":{"height":13.2},"width":79.04,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-2.png","element":"img","alt":" γ ∈","inline":true,"padRight":true},{"text":"[1","element":"span"},{"style":{"height":19.13},"width":1226.03,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-3.png","element":"img","alt":"/n, 1/ log n], and λ = c′√nΦ−1(1 − γ/{2pNn}). Then for n ⩾ n0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"large enough depending only on Condition WL,","element":"span"}],[{"style":{"width":"55%"},"width":1032,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-4.png","element":"img"}],[{"text":"We note that Condition WL(iii) imposes high-level conditions on the process (","element":"span"},{"style":{"height":17.6},"width":303.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-5.png","element":"img","alt":"Yu, ζu)u∈U. The","inline":true,"padRight":true},{"text":"following lemma provides easy-to-verify sufficient conditions that imply Condition WL(iii).","element":"span"}],[{"id":"id-115","style":{"fontWeight":"bold"},"text":"Lemma I.2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose the i.i.d. 
sequence ","element":"span"},{"text":"((","element":"span"},{"style":{"height":17.6},"width":555.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-6.png","element":"img","alt":"Yui, ζui)u∈U, Xi), i = 1, . . . , n","inline":true},{"style":{"fontStyle":"italic"},"text":", satisfies the following conditions: (i) ","element":"span"},{"style":{"height":19.75},"width":1582.74,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-7.png","element":"img","alt":" c ⩽ maxj⩽p EP [fj(X)2] ⩽ C, maxj⩽p |fj(X)| ⩽ Kn, supu∈U maxi⩽n |Yui| ⩽ Bn, and","inline":true},{"style":{"height":19.72},"width":583.08,"height":49.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-8.png","element":"img","alt":"c ⩽ supu∈U EP [ζ2u | X] ⩽ C, P","inline":true},{"style":{"fontStyle":"italic"},"text":"-a.s.; (ii) for some random variable ","element":"span"},{"style":{"height":17.6},"width":591.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-9.png","element":"img","alt":" Y we have Yu = G(Y, u) where","inline":true},{"style":{"height":17.6},"width":314.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-10.png","element":"img","alt":"{G(·, u) : u ∈ U}","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a VC-class of functions with VC-index equal to ","element":"span"},{"style":{"height":15.02},"width":88.2,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-11.png","element":"img","alt":" C′du","inline":true},{"style":{"fontStyle":"italic"},"text":"; and (iii) for some fixed ","element":"span"},{"style":{"height":14.8},"width":117.33,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-12.png","element":"img","alt":" ν > 0,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"we have 
","element":"span"},{"text":"E","element":"span"},{"style":{"height":19.13},"width":1676.03,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-13.png","element":"img","alt":"P [|Yu − Yu′|2 | X] ⩽ Ln|u − u′|ν for any u, u′ ∈ U, P-a.s. For �A := pnKnBnnν/Ln, we","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"have with probability ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":15.42},"width":101,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-14.png","element":"img","alt":" − ∆n","inline":true}],[{"style":{"width":"91%"},"width":1704,"height":365,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-15.png","element":"img"}],[{"text":"Lemma ","element":"span"},{"href":"#id-115","text":"I.2 ","element":"a"},{"text":"covers several cases, including cases where ","element":"span"},{"style":{"height":14.62},"width":45.33,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-16.png","element":"img","alt":" Yu","inline":true,"padRight":true},{"text":"is generated by a non-smooth transformation of a random variable ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":". For example, if ","element":"span"},{"style":{"height":17.6},"width":493.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-17.png","element":"img","alt":" Yu = 1{Y ⩽ u} where Y","inline":true,"padRight":true},{"text":"has a bounded conditional probability density function, we have ","element":"span"},{"style":{"height":19.95},"width":924.26,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-18.png","element":"img","alt":" du = 1, Bn = 1, ν = 1, Ln = supy fY |X(y | x). 
A","inline":true,"padRight":true},{"text":"similar result holds for independent non-identically distributed data.","element":"span"}],[{"text":"In what follows, for a vector ","element":"span"},{"style":{"height":13.2},"width":128.08,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-19.png","element":"img","alt":" δ ∈ Rp","inline":true},{"text":" and a set of indices ","element":"span"},{"style":{"height":17.6},"width":278.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-20.png","element":"img","alt":" T ⊆ {1, . . . , p}","inline":true},{"text":", we denote by ","element":"span"},{"style":{"height":15.1},"width":152.84,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-21.png","element":"img","alt":" δT ∈ Rp","inline":true,"padRight":true},{"text":"the vector such that (","element":"span"},{"style":{"height":18.22},"width":844.13,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-22.png","element":"img","alt":"δT )j = δj if j ∈ T and (δT )j = 0 if j /∈ T","inline":true},{"text":". For a set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"denotes the cardinality of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":". Moreover, let","element":"span"}],[{"style":{"width":"37%"},"width":694,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-23.png","element":"img"}],[{"id":"id-181","text":"I.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Finite Sample Results: Linear Case. 
","element":"span"},{"text":"For the model described in ","element":"span"},{"href":"#id-110","text":"(6.1) ","element":"a"},{"text":"with Λ(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") = ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":21.29},"width":353.26,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-24.png","element":"img","alt":"M(y, t) = 12(y−t)2","inline":true,"padRight":true},{"text":"we will study the finite sample properties of the associated Lasso and Post-Lasso ","element":"span"},{"text":"estimators of (","element":"span"},{"style":{"height":17.6},"width":126.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-25.png","element":"img","alt":"θu)u∈U","inline":true,"padRight":true},{"text":"defined in relations ","element":"span"},{"href":"#id-116","text":"(6.2) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-117","text":"(6.3)","element":"a"},{"text":".","element":"span"}],[{"text":"The analysis relies on ","element":"span"},{"style":{"height":17.6},"width":877.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-26.png","element":"img","alt":" Tu = supp(θu), su := ∥θu∥0 ⩽ s, with s ⩾","inline":true,"padRight":true},{"text":"1, and on the restricted eigenvalues","element":"span"}],[{"id":"id-180","style":{"width":"64%"},"width":1210,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-27.png","element":"img"}],[{"text":"and maximum and minimum sparse eigenvalues","element":"span"}],[{"style":{"width":"75%"},"width":1419,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/66-28.png","element":"img"}],[{"text":"Next we present technical results on the performance of the estimators generated by Lasso that are used in the proof of Theorem 
","element":"span"},{"href":"#id-120","text":"6.1.","element":"a"}],[{"style":{"width":"8%"},"width":155,"height":4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-0.png","element":"img"}],[{"id":"id-177","style":{"fontWeight":"bold"},"text":"Lemma I.3 ","element":"span"},{"text":"(Rates of Convergence for Lasso)","element":"span"},{"style":{"height":18.37},"width":966.26,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-1.png","element":"img","alt":". The events cr ⩾ supu∈U ∥ru∥Pn,2, ℓ�Ψu0 ⩽ �Ψu ⩽","inline":true},{"style":{"height":20.74},"width":1867.04,"height":51.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-2.png","element":"img","alt":"L�Ψu0, u ∈ U, and λ/n ⩾ c supu∈U ∥�Ψ−1u0 En[f(X)ζu]∥∞, for c > 1/ℓ, imply that uniformly in u ∈ U","inline":true}],[{"style":{"width":"90%"},"width":1703,"height":362,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-3.png","element":"img"}],[{"text":"The following lemma summarizes the sparsity properties of (","element":"span"},{"style":{"height":17.6},"width":139.57,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-4.png","element":"img","alt":"�θu)u∈U.","inline":true}],[{"id":"id-182","style":{"width":"101%"},"width":1893,"height":548,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-5.png","element":"img"}],[{"id":"id-178","style":{"fontWeight":"bold"},"text":"Lemma I.5 ","element":"span"},{"text":"(Rate of Convergence of Post-Lasso)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under Condition WL, let ","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-6.png","element":"img","alt":"�θu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be the Post-Lasso estimator based on the support ","element":"span"},{"style":{"height":14.62},"width":45.5,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-7.png","element":"img","alt":"�Tu","inline":true},{"style":{"fontStyle":"italic"},"text":". Then, with probability ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":17.6},"width":120.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-8.png","element":"img","alt":" − o(1)","inline":true},{"style":{"fontStyle":"italic"},"text":", uniformly over ","element":"span"},{"style":{"height":15.6},"width":288.96,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-9.png","element":"img","alt":" u ∈ U, we have","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"for ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.6},"width":171.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-10.png","element":"img","alt":"su = | �Tu|","inline":true}],[{"style":{"width":"34%"},"width":641,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-11.png","element":"img"}],[{"style":{"height":18.37},"width":612.16,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-12.png","element":"img","alt":"∥EP [Yu | X] − f(X)′�θu∥Pn,2 ⩽ C","inline":true}],[{"style":{"width":"75%"},"width":1404,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Moreover, if 
","element":"span"},{"text":"supp(","element":"span"},{"style":{"height":17.6},"width":504.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-14.png","element":"img","alt":"�θu) ⊆ �Tu for every u ∈ U","inline":true},{"style":{"fontStyle":"italic"},"text":", the following events ","element":"span"},{"style":{"height":18.37},"width":574.94,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-15.png","element":"img","alt":" cr ⩾ supu∈U ∥ru∥Pn,2, ℓ�Ψu0 ⩽","inline":true},{"style":{"height":20.74},"width":1380.14,"height":51.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-16.png","element":"img","alt":"�Ψu ⩽ L�Ψu0, u ∈ U, and λ/n ⩾ c supu∈U ∥�Ψ−1u0 En[f(X)ζu]∥∞, for c > 1/ℓ","inline":true},{"style":{"fontStyle":"italic"},"text":", imply that","element":"span"}],[{"style":{"width":"78%"},"width":1473,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-17.png","element":"img"}],[{"text":"I.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Finite Sample Results: ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Logistic Case. 
","element":"span"},{"text":"For the model described in ","element":"span"},{"href":"#id-110","text":"(6.1) ","element":"a"},{"text":"with Λ(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") = exp(","element":"span"},{"style":{"height":17.6},"width":1515.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-18.png","element":"img","alt":"t)/{1 + exp(t)} and M(y, t) = −{1{y = 1} log(Λ(t)) + 1{y = 0} log(1 − Λ(t))}","inline":true,"padRight":true},{"text":"we will study the finite sample properties of the associated Lasso and Post-Lasso estimators of (","element":"span"},{"style":{"height":17.6},"width":279.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-19.png","element":"img","alt":"θu)u∈U defined","inline":true,"padRight":true},{"text":"in relations ","element":"span"},{"href":"#id-116","text":"(6.2) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-117","text":"(6.3)","element":"a"},{"text":". In what follows we use the notation","element":"span"}],[{"style":{"width":"28%"},"width":541,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-20.png","element":"img"}],[{"text":"In the finite sample analysis we will consider not only the design matrix ","element":"span"},{"style":{"height":17.6},"width":451.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-21.png","element":"img","alt":" En[f(X)f(X)′] but also","inline":true,"padRight":true},{"text":"a weighted counterpart ","element":"span"},{"style":{"height":17.6},"width":1418.45,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-22.png","element":"img","alt":" En[wuf(X)f(X)′] where wui = EP [Yui | Xi](1 − EP [Yui | Xi]), i = 1, . . . 
, n,","inline":true},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-23.png","element":"img","alt":"u ∈ U","inline":true},{"text":", is the conditional variance of the outcome variable ","element":"span"},{"style":{"height":14.62},"width":70.47,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-24.png","element":"img","alt":" Yui.","inline":true}],[{"id":"id-187","style":{"width":"97%"},"width":1819,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/67-25.png","element":"img"}],[{"text":"For a subset ","element":"span"},{"style":{"height":16},"width":341.83,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-0.png","element":"img","alt":" Au ⊂ Rp, u ∈ U","inline":true},{"text":", let the non-linear impact coefficient, as in ","element":"span"},{"href":"#id-35","referenceIndex":10,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-35","referenceIndex":10,"text":"(2011)","element":"a"},{"text":" and ","element":"span"},{"href":"#id-50","referenceIndex":19,"text":"Belloni et al. 
","element":"a"},{"href":"#id-50","referenceIndex":19,"text":"(2013c)","element":"a"},{"text":", be defined as","element":"span"}],[{"style":{"width":"66%"},"width":1243,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-1.png","element":"img"}],[{"text":"Note that ¯","element":"span"},{"style":{"height":12.3},"width":62.85,"height":30.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-2.png","element":"img","alt":"qAu","inline":true,"padRight":true},{"text":"can be bounded as","element":"span"}],[{"style":{"width":"62%"},"width":1175,"height":123,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-3.png","element":"img"}],[{"text":"which yields informative bounds provided ","element":"span"},{"style":{"height":15.42},"width":52.72,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-4.png","element":"img","alt":" Au","inline":true,"padRight":true},{"text":"is suitably restricted (such as the restricted set ∆","element":"span"},{"style":{"height":8},"width":46.77,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-5.png","element":"img","alt":"c,u","inline":true,"padRight":true},{"text":"used in the definition of the restricted eigenvalues). ","element":"span"},{"text":"In Lemma ","element":"span"},{"href":"#id-167","text":"I.6 ","element":"a"},{"text":"we have ","element":"span"},{"style":{"height":18.32},"width":508.25,"height":45.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-6.png","element":"img","alt":" Au = ∆2˜c,u ∪ {δ ∈ Rp :","inline":true},{"style":{"height":31.83},"width":1123.01,"height":79.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-7.png","element":"img","alt":"∥δ∥1 ⩽ 6c∥�Ψ−1u0 ∥∞ℓc−1 nλ∥ ru√wu ∥Pn,2∥√wuf(X)′δ∥Pn,2}, for u ∈ U","inline":true},{"text":". 
For this choice of sets, and provided that with probability 1 ","element":"span"},{"style":{"height":8.4},"width":68.55,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-8.png","element":"img","alt":" − o","inline":true},{"text":"(1) we have ","element":"span"},{"style":{"height":21.26},"width":1080.23,"height":53.15,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-9.png","element":"img","alt":" ℓc > c′ > 1, supu∈U ∥ru/√wu∥Pn,2 ≲ �s log(p ∨ n)/n,","inline":true,"padRight":true},{"text":"sup","element":"span"},{"style":{"height":20.82},"width":771.28,"height":52.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-10.png","element":"img","alt":"u∈U ∥�Ψ−1u0 ∥∞ ≲ 1 and�n log(p ∨ n) ≲ λ","inline":true},{"text":", we have that uniformly over ","element":"span"},{"style":{"height":12.8},"width":113.85,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-11.png","element":"img","alt":" u ∈ U","inline":true},{"text":", with probability","element":"span"}],[{"id":"id-188","style":{"width":"99%"},"width":1864,"height":193,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-12.png","element":"img"}],[{"text":"The definitions above differ from their counterparts in the analysis of ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-13.png","element":"img","alt":" ℓ1","inline":true},{"text":"-penalized least squares estimators by the weighting 0 ","element":"span"},{"style":{"height":13.82},"width":156.56,"height":34.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-14.png","element":"img","alt":" ⩽ wui ⩽","inline":true,"padRight":true},{"text":"1. 
It is therefore useful to relate them through the quantities","element":"span"}],[{"style":{"width":"32%"},"width":617,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-15.png","element":"img"}],[{"text":"Many primitive conditions on the data-generating process imply that ","element":"span"},{"style":{"height":17.6},"width":100,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-16.png","element":"img","alt":" ψu(A","inline":true},{"text":") is bounded away from zero for the relevant choices of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":". We refer to ","element":"span"},{"href":"#id-50","referenceIndex":19,"text":"Belloni et al. ","element":"a"},{"href":"#id-50","referenceIndex":19,"text":"(2013c) ","element":"a"},{"text":"for bounds on ","element":"span"},{"style":{"height":16.4},"width":150.59,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-17.png","element":"img","alt":" ψu. 
For","inline":true,"padRight":true},{"text":"notational convenience we will also work with a rescaling of the approximation errors ˜","element":"span"},{"style":{"height":17.6},"width":262.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-18.png","element":"img","alt":"ru(X) defined","inline":true,"padRight":true},{"text":"as","element":"span"}],[{"id":"id-185","style":{"width":"76%"},"width":1428,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-19.png","element":"img"}],[{"text":"which is the unique solution to Λ(","element":"span"},{"style":{"height":17.6},"width":867.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-20.png","element":"img","alt":"f(Xi)′θu + ˜ru(Xi)) = Λ(f(Xi)′θu) + ru(Xi).","inline":true,"padRight":true},{"text":"It follows that ","element":"span"},{"href":"#id-168","style":{"height":20.05},"width":1623.2,"height":50.13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-21.png","element":"img","alt":"|rui| ⩽ |˜rui| and that32 |˜rui| ⩽ |rui|/ inf0⩽t⩽˜rui Λ′(f(X′iθu) + t) ⩽ |rui|/{wui − 2|rui|}+.","inline":true}],[{"text":"Next we derive finite-sample bounds that hold provided certain key events occur.","element":"span"}],[{"id":"id-167","style":{"fontWeight":"bold"},"text":"Lemma I.6 ","element":"span"},{"text":"(Rates of Convergence for ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-22.png","element":"img","alt":" ℓ1","inline":true},{"text":"-Logistic Estimator)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Assume that","element":"span"}],[{"style":{"width":"37%"},"width":709,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-23.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c > ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Further, let ","element":"span"},{"style":{"height":17.6},"width":883.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-24.png","element":"img","alt":" ℓ�Ψu0 ⩽ �Ψu ⩽ L�Ψu0 for L ⩾ 1 ⩾ ℓ > 1/c","inline":true},{"style":{"fontStyle":"italic"},"text":", uniformly over ","element":"span"},{"style":{"height":15.2},"width":145.83,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-25.png","element":"img","alt":" u ∈ U,","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"height":20.74},"width":953.02,"height":51.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-26.png","element":"img","alt":"c = (Lc + 1)/(ℓc − 1) supu∈U ∥�Ψu0∥∞∥�Ψ−1u0 ∥∞ and","inline":true}],[{"id":"id-168","style":{"width":"86%"},"width":1612,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/68-27.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Provided that the nonlinear impact coefficient ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":32},"width":964.94,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-0.png","element":"img","alt":"qAu > 3�(L + 1c)∥�Ψu0∥∞λ√sn¯κ2˜c + 9˜c∥˜ru/√wu∥Pn,2�","inline":true}],[{"style":{"fontStyle":"italic"},"text":"for every 
","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-1.png","element":"img","alt":" u ∈ U","inline":true},{"style":{"fontStyle":"italic"},"text":", we have uniformly over ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-2.png","element":"img","alt":" u ∈ U","inline":true}],[{"style":{"width":"92%"},"width":1739,"height":246,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-3.png","element":"img"}],[{"text":"The following result provides bounds on the number of non-zero coefficients in the ","element":"span"},{"style":{"height":16},"width":229.59,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-4.png","element":"img","alt":" ℓ1-penalized","inline":true,"padRight":true},{"text":"estimator ","element":"span"},{"style":{"height":15.02},"width":40.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-5.png","element":"img","alt":"�θu","inline":true},{"text":", uniformly over ","element":"span"},{"style":{"height":12.8},"width":121.96,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-6.png","element":"img","alt":" u ∈ U.","inline":true}],[{"id":"id-183","style":{"fontWeight":"bold"},"text":"Lemma I.7 ","element":"span"},{"text":"(Sparsity of ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-7.png","element":"img","alt":" ℓ1","inline":true},{"text":"-Logistic Estimator)","element":"span"},{"style":{"height":20.74},"width":911.68,"height":51.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-8.png","element":"img","alt":". 
Assume λ/n ⩾ c supu∈U ∥�Ψ−1u0 En[f(X)ζu]∥∞","inline":true},{"style":{"height":17.6},"width":1871.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-9.png","element":"img","alt":"for c > 1. Further, let ℓ�Ψu0 ⩽ �Ψu ⩽ L�Ψu0 for L ⩾ 1 ⩾ ℓ > 1/c, uniformly over u ∈","inline":true},{"style":{"height":20.86},"width":1872.06,"height":52.14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-10.png","element":"img","alt":"U, c0 = (Lc + 1)/(ℓc − 1), ˜c = c0 supu∈U ∥�Ψu0∥∞∥�Ψ−1u0 ∥∞ and Au = ∆2˜c,u ∪ {δ : ∥δ∥1 ⩽","inline":true}],[{"style":{"width":"99%"},"width":1868,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for every ","element":"span"},{"style":{"height":17.6},"width":529.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-12.png","element":"img","alt":" u ∈ U. Then for �su = ∥�θu∥0","inline":true},{"style":{"fontStyle":"italic"},"text":", uniformly over ","element":"span"},{"style":{"height":15.2},"width":122.95,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-13.png","element":"img","alt":" u ∈ U,","inline":true}],[{"style":{"width":"86%"},"width":1612,"height":589,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-14.png","element":"img"}],[{"text":"Next we turn to finite sample bounds for the logistic regression estimator where the support was selected based on ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-15.png","element":"img","alt":" ℓ1","inline":true},{"text":"-penalized logistic regression. 
The results hold uniformly over ","element":"span"},{"style":{"height":12.8},"width":114.28,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-16.png","element":"img","alt":" u ∈ U","inline":true,"padRight":true},{"text":"provided the side conditions also hold uniformly over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":".","element":"span"}],[{"id":"id-184","style":{"fontWeight":"bold"},"text":"Lemma I.8 ","element":"span"},{"text":"(Rate of Convergence for Post-","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-17.png","element":"img","alt":"ℓ1","inline":true},{"text":"-Logistic Estimator)","element":"span"},{"style":{"height":15.02},"width":257.16,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-18.png","element":"img","alt":". Consider �θu","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"defined as the post-model-selection logistic regression estimator with support ","element":"span"},{"style":{"height":17.6},"width":391.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-19.png","element":"img","alt":"�Tu and let ˜su := | �Tu|","inline":true},{"style":{"fontStyle":"italic"},"text":". Uniformly over ","element":"span"},{"style":{"height":12.8},"width":175.17,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-20.png","element":"img","alt":" u ∈ U we","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"have","element":"span"}],[{"style":{"width":"99%"},"width":1855,"height":351,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-21.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Comment I.1. 
","element":"span"},{"text":"Since for a sparse vector ","element":"span"},{"style":{"height":20.08},"width":1011.55,"height":50.19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-22.png","element":"img","alt":" δ such that ∥δ∥0 = k we have ∥δ∥1 ⩽ √k∥δ∥ ⩽","inline":true}],[{"style":{"height":20.9},"width":457.62,"height":52.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-23.png","element":"img","alt":"k∥f(X)′δ∥Pn,2/�φmin(k","inline":true},{"text":"), the results above directly yield bounds on the rate of convergence in the ","element":"span"},{"style":{"height":15.02},"width":163.18,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/69-24.png","element":"img","alt":" ℓ1-norm.","inline":true}]]},{"heading":"Appendix J. Additional Results for Section 7","paragraphs":[[{"id":"id-151","text":"In this section, we report additional results to supplement those provided in the main text. ","element":"span"},{"text":"Specifically, we provide results with both total wealth and net total financial assets as the outcome variable. We present detailed results for four different sets of controls ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":"). The first set uses the indicators of marital status, two-earner status, defined benefit pension status, IRA participation status, and home ownership status, a linear term for family size, five categories for age, four categories for education, and seven categories for income (Indicator specification). 
","element":"span"},{"text":"We use the same category definitions as ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004) ","element":"a"},{"text":"and note that this specification is identical to the one used in ","element":"span"},{"href":"#id-22","referenceIndex":37,"text":"Chernozhukov and Hansen ","element":"a"},{"href":"#id-22","referenceIndex":37,"text":"(2004) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-127","referenceIndex":20,"text":"Benjamin ","element":"a"},{"href":"#id-127","referenceIndex":20,"text":"(2003)","element":"a"},{"text":". The second through fourth specifications correspond to the Quadratic Spline specification, the Quadratic Spline Plus Interactions specification, and the Quadratic Spline Plus Many Interactions specification described in the main text.","element":"span"}],[{"text":"Results for intention-to-treat effects, using 401(k) eligibility as the treatment variable, are given in Appendix Table 1. In Appendix Table 2, we report results using 401(k) participation as the treatment variable, instrumenting with 401(k) eligibility. Using 401(k) eligibility as the treatment variable, we plot the QTE and QTE-T in Figures 3-6. ","element":"span"},{"text":"Finally, the LQTE and LQTE-T, obtained using 401(k) participation as the treatment variable and instrumenting with eligibility, are plotted in Appendix Figures 7-10. The results are broadly consistent with the discussion in the main text: the selection and no-selection results are similar in the low-dimensional cases, and the selection results are substantially more regular in the high-dimensional cases. 
We also see that the patterns of point estimates for total wealth and net total financial assets are similar, though the total wealth estimates have substantially larger estimated standard errors, especially for high quantiles.","element":"span"}]]},{"heading":"Appendix K. Auxiliary Results: Algebra of Covering Entropies","paragraphs":[[{"id":"id-135","style":{"fontWeight":"bold"},"text":"Lemma K.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Algebra for Covering Entropies","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Work with the setup described in Appendix C of the main text.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"(1) Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a VC subgraph class with a finite VC index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"style":{"fontStyle":"italic"},"text":", or any other class whose entropy is bounded above by that of such a VC subgraph class. Then the covering entropy of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"obeys:","element":"span"}],[{"style":{"width":"50%"},"width":950,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/70-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(2) For any measurable classes of functions ","element":"span"},{"style":{"height":16.4},"width":521.46,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/70-1.png","element":"img","alt":" F and F′ mapping W to R","inline":true}],[{"style":{"width":"99%"},"width":1870,"height":174,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/70-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(3) 
Given a measurable class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"mapping ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"style":{"fontStyle":"italic"},"text":"to ","element":"span"},{"text":"R ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and a random variable ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/70-3.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"taking values in ","element":"span"},{"text":"R","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"}],[{"style":{"width":"69%"},"width":1301,"height":77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/70-4.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(4) Given measurable classes ","element":"span"},{"style":{"height":17.42},"width":46.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-0.png","element":"img","alt":" Fj","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and envelopes ","element":"span"},{"style":{"height":17.42},"width":694.76,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-1.png","element":"img","alt":" Fj, j = 1, . . . 
, k, mapping W to R","inline":true},{"style":{"fontStyle":"italic"},"text":", a function ","element":"span"},{"style":{"height":19.13},"width":226.3,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-2.png","element":"img","alt":"φ : Rk → R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for ","element":"span"},{"style":{"height":24.4},"width":1365.38,"height":61.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-3.png","element":"img","alt":" fj, gj ∈ Fj, |φ(f1, . . . , fk) − φ(g1, . . . , gk)| ⩽ �kj=1 Lj(x)|fj(x) − gj(x)|,","inline":true},{"style":{"height":18.22},"width":190.58,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-4.png","element":"img","alt":"Lj(x) ⩾ 0","inline":true},{"style":{"fontStyle":"italic"},"text":", and fixed functions ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":17.42},"width":142.88,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-5.png","element":"img","alt":"fj ∈ Fj","inline":true},{"style":{"fontStyle":"italic"},"text":", the class of functions ","element":"span"},{"style":{"height":19.41},"width":673.8,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-6.png","element":"img","alt":" L = {φ(f1, . . . , fk) − φ( ¯f1, . . . , ¯fk) :","inline":true}],[{"style":{"width":"88%"},"width":1660,"height":219,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-7.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof. ","element":"span"},{"text":"For the proofs of (1)-(2), see, e.g., ","element":"span"},{"href":"#id-106","referenceIndex":5,"text":"Andrews ","element":"a"},{"href":"#id-106","referenceIndex":5,"text":"(1994a) ","element":"a"},{"text":"and (3) follows from (2). 
To show (4) let ","element":"span"},{"style":{"height":18.22},"width":1276.66,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-8.png","element":"img","alt":" f = (f1, . . . , fk) and g = (g1, . . . , gk) where fj, gj ∈ Fj, j = 1, . . . , k","inline":true},{"text":". Then, by the condition on","element":"span"}],[{"id":"id-169","style":{"width":"99%"},"width":1866,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-9.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":19.02},"width":481.93,"height":47.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-10.png","element":"img","alt":"�Nj be a (ϵ/k)-net for Fj","inline":true,"padRight":true},{"text":"with the measure ","element":"span"},{"style":{"height":22.02},"width":623.33,"height":55.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-11.png","element":"img","alt":"�Qj, where d �Qj(x) = L2j(x)dQ(x","inline":true},{"text":"). Then the set ","element":"span"},{"style":{"height":20.02},"width":1084.03,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-12.png","element":"img","alt":"{φ(f1, . . . , fk) − φ( ¯f1, . . . , ¯fk) : fj ∈ �Nj} is an ϵ-net for L","inline":true,"padRight":true},{"text":"with respect to the measure ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"by ","element":"span"},{"href":"#id-169","text":"(K.1)","element":"a"},{"text":". 
Thus, for any ","element":"span"},{"style":{"height":10.4},"width":63.84,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-13.png","element":"img","alt":" ϵ >","inline":true,"padRight":true},{"text":"0 we have that","element":"span"}],[{"style":{"width":"48%"},"width":913,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-14.png","element":"img"}],[{"text":"Therefore,","element":"span"}],[{"style":{"width":"84%"},"width":1575,"height":269,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-15.png","element":"img"}],[{"text":"and the result follows since the right hand side no longer depends on ","element":"span"},{"style":{"height":16},"width":531.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-16.png","element":"img","alt":" Q. ■","inline":true}],[{"id":"id-171","style":{"fontWeight":"bold"},"text":"Lemma K.2 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Covering Entropy for Classes obtained as Conditional Expectations","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"denote a class of measurable functions ","element":"span"},{"style":{"height":16.4},"width":332.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-17.png","element":"img","alt":" f : W × Y �→ R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with a measurable envelope ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
For a given ","element":"span"},{"style":{"height":19.01},"width":463.66,"height":47.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-18.png","element":"img","alt":" f ∈ F, let ¯f : W �→ R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be the function ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":19.6},"width":677.15,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-19.png","element":"img","alt":"f(w) :=�f(w, y)dµw(y) where µw","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a regular conditional probability distribution over ","element":"span"},{"style":{"height":16},"width":108.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-20.png","element":"img","alt":" y ∈ Y","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"conditional on ","element":"span"},{"style":{"height":19.41},"width":702.49,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-21.png","element":"img","alt":" w ∈ W. Set ¯F = { ¯f : f ∈ F} and let","inline":true,"padRight":true},{"text":"¯","element":"span"},{"style":{"height":19.6},"width":483.06,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-22.png","element":"img","alt":"F(w) :=�F(w, y)dµw(y)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be an envelope for ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
Then, for ","element":"span"},{"style":{"height":14.8},"width":151.32,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-23.png","element":"img","alt":" r, s ⩾ 1,","inline":true}],[{"style":{"width":"68%"},"width":1290,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-24.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"style":{"fontStyle":"italic"},"text":"belongs to the set of finitely-discrete probability measures over ","element":"span"},{"style":{"height":19.91},"width":508.42,"height":49.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-25.png","element":"img","alt":" W such that 0 < ∥ ¯F∥Q,r <","inline":true},{"style":{"height":16},"width":200.61,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-26.png","element":"img","alt":"∞, and �Q","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"belongs to the set of finitely-discrete probability measures over ","element":"span"},{"style":{"height":14.8},"width":427.68,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-27.png","element":"img","alt":" W × Y such that 0 <","inline":true},{"style":{"height":22.02},"width":233.65,"height":55.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-28.png","element":"img","alt":"∥F∥ �Q,s < ∞","inline":true},{"style":{"fontStyle":"italic"},"text":". In particular, for every ","element":"span"},{"style":{"height":16.4},"width":385.98,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-29.png","element":"img","alt":" ϵ > 0 and any k ⩾ 1","inline":true}],[{"style":{"width":"52%"},"width":978,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/71-30.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof. 
","element":"span"},{"text":"The proof generalizes the proof of Lemma A.2 in ","element":"span"},{"href":"#id-170","referenceIndex":51,"text":"Ghosal et al. ","element":"a"},{"href":"#id-170","referenceIndex":51,"text":"(2000)","element":"a"},{"text":". For ","element":"span"},{"style":{"height":16.4},"width":238.52,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-0.png","element":"img","alt":" f, g ∈ F and","inline":true,"padRight":true},{"text":"the corresponding ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":18.81},"width":160.45,"height":47.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-1.png","element":"img","alt":"f, ¯g ∈ ¯F","inline":true},{"text":", and any probability measure ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W","element":"span"},{"text":", by Jensen’s inequality, for any ","element":"span"},{"style":{"height":15.6},"width":116.09,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-2.png","element":"img","alt":"k ⩾ 1,","inline":true}],[{"style":{"width":"77%"},"width":1444,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.21},"width":462.8,"height":48.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-4.png","element":"img","alt":" d ¯Q(w, y) = dQ(w)dµw(y","inline":true},{"text":"). 
Therefore, for any ","element":"span"},{"style":{"height":12.4},"width":97.89,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-5.png","element":"img","alt":" ϵ > 0","inline":true}],[{"style":{"width":"68%"},"width":1280,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-6.png","element":"img"}],[{"text":"where we use Problems 2.5.1-2 of ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996) ","element":"a"},{"text":"to replace the supremum over ¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"with the supremum over finitely-discrete probability measures ","element":"span"},{"style":{"height":16},"width":46.49,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-7.png","element":"img","alt":" �Q.","inline":true}],[{"text":"Moreover, ","element":"span"},{"style":{"height":21.74},"width":1612.2,"height":54.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-8.png","element":"img","alt":" ∥ ¯F∥Q,1 = EQ[ ¯F(w)] = EQ[�F(w, y)dµw(y)] = E ¯Q[F(w, y)] = ∥F∥ ¯Q,1. 
Therefore","inline":true,"padRight":true},{"text":"taking ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"= 1,","element":"span"}],[{"style":{"width":"70%"},"width":1313,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-9.png","element":"img"}],[{"text":"where we use Problems 2.5.1-2 of ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996) ","element":"a"},{"text":"to replace the supremum over ¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"with the supremum over finitely-discrete probability measures ","element":"span"},{"style":{"height":16},"width":35,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-10.png","element":"img","alt":" �Q","inline":true},{"text":", and then Problem 2.10.4 of ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996) ","element":"a"},{"text":"to argue that the last bound is weakly increasing in ","element":"span"},{"style":{"height":14},"width":112.46,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-11.png","element":"img","alt":" s ⩾ 1.","inline":true}],[{"style":{"width":"97%"},"width":1821,"height":241,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-12.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Comment K.1. 
","element":"a"},{"href":"#id-170","referenceIndex":51,"text":"(2000) ","element":"a"},{"text":"and Lemma 5 in ","element":"span"},{"href":"#id-172","referenceIndex":92,"text":"Sherman ","element":"a"},{"href":"#id-172","referenceIndex":92,"text":"(1994)","element":"a"},{"text":", which considered integral classes with respect to a fixed measure ","element":"span"},{"style":{"height":16},"width":208.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-13.png","element":"img","alt":" µ on Y. In","inline":true,"padRight":true},{"text":"our applications, we need to allow the integration measure to vary with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"w","element":"span"},{"text":"; namely, we allow ","element":"span"},{"style":{"height":12},"width":50.3,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-14.png","element":"img","alt":" µw","inline":true,"padRight":true},{"text":"to be a conditional distribution. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-15.png","element":"img","alt":"■","inline":true}]]},{"heading":"Appendix L. Proofs for Section 4","paragraphs":[[{"id":"id-152","text":"L.1. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-165","style":{"fontWeight":"bold"},"text":"4.1. ","element":"a"},{"text":"Step 0. ","element":"span"},{"text":"(Preparation). 
In the proof ","element":"span"},{"style":{"height":16.8},"width":110.33,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-16.png","element":"img","alt":" a ≲ b","inline":true,"padRight":true},{"text":"means that ","element":"span"},{"style":{"height":16},"width":154.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-17.png","element":"img","alt":" a ⩽ Ab,","inline":true,"padRight":true},{"text":"where the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"depends on the constants in Assumptions ","element":"span"},{"href":"#id-84","text":"4.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-85","text":"4.2 ","element":"a"},{"text":"only, but not on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"once ","element":"span"},{"style":{"height":18.22},"width":537.46,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-18.png","element":"img","alt":" n ⩾ n0 = min{j : δj ⩽ 1/2}","inline":true},{"text":", and not on ","element":"span"},{"style":{"height":14.62},"width":142.72,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-19.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":". 
We consider a sequence ","element":"span"},{"style":{"height":15.6},"width":332.86,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-20.png","element":"img","alt":" Pn in Pn, but for","inline":true,"padRight":true},{"text":"simplicity, we write ","element":"span"},{"style":{"height":14.62},"width":141.88,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-21.png","element":"img","alt":" P = Pn","inline":true,"padRight":true},{"text":"throughout the proof, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"suppressing ","element":"span"},{"text":"the index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". Since the argument is asymptotic, we can assume that ","element":"span"},{"style":{"height":13.82},"width":127.56,"height":34.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-22.png","element":"img","alt":" n ⩾ n0","inline":true,"padRight":true},{"text":"in what follows.","element":"span"}],[{"text":"To proceed with the presentation of the proofs, it might be convenient for the reader to have the notation collected in one place. 
The influence function and low-bias moment functions for ","element":"span"},{"style":{"height":17.6},"width":113.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-23.png","element":"img","alt":" αV (z)","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":17.6},"width":275.67,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-24.png","element":"img","alt":" z ∈ Z = {0, 1}","inline":true,"padRight":true},{"text":"are given respectively by","element":"span"}],[{"style":{"width":"91%"},"width":1709,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/72-25.png","element":"img"}],[{"text":"The influence function and the moment function for ","element":"span"},{"style":{"height":19.59},"width":862.14,"height":48.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-0.png","element":"img","alt":" γV are ψγV (W) = ψγV (W, γV ) and ψγV (W, γ) =","inline":true},{"style":{"height":15.6},"width":134.84,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-1.png","element":"img","alt":"V − γ.","inline":true,"padRight":true},{"text":"Recall that the estimators of the reduced-form parameters ","element":"span"},{"style":{"height":17.6},"width":275.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-2.png","element":"img","alt":" αV (z) and γV","inline":true,"padRight":true},{"text":"are the solutions ","element":"span"},{"style":{"height":17.6},"width":430.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-3.png","element":"img","alt":"α = α̂V (z) and γ = γ̂V","inline":true,"padRight":true},{"text":"to the equations","element":"span"}],[{"style":{"width":"45%"},"width":860,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-4.png","element":"img"}],[{"text":"where 
","element":"span"},{"style":{"height":19.41},"width":1744.46,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-5.png","element":"img","alt":" ĝV (z, x) = ΛV (f(z, x)′ ¯βV ), m̂Z(1, x) = ΛZ(f(x)′ ¯βZ), m̂Z(0, x) = 1 − m̂Z(1, x), and ¯βV and","inline":true,"padRight":true},{"text":"¯","element":"span"},{"style":{"height":16.4},"width":49.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-6.png","element":"img","alt":"βZ","inline":true,"padRight":true},{"text":"are estimators as in Assumption ","element":"span"},{"href":"#id-85","text":"4.2. ","element":"a"},{"text":"For each variable ","element":"span"},{"style":{"height":15.2},"width":148.83,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-7.png","element":"img","alt":" V ∈ Vu,","inline":true}],[{"style":{"width":"55%"},"width":1044,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-8.png","element":"img"}],[{"text":"we obtain the estimator ","element":"span"},{"style":{"height":23.44},"width":1401.51,"height":58.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-9.png","element":"img","alt":" ρ̂u = ({α̂V (0), α̂V (1), γ̂V })V ∈Vu of ρu := ({αV (0), αV (1), γV })V ∈Vu. The","inline":true,"padRight":true},{"text":"estimator and the estimand are vectors in ","element":"span"},{"style":{"height":15.13},"width":63.94,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-10.png","element":"img","alt":" Rdρ ","inline":true,"padRight":true},{"text":"with a fixed finite dimension. 
We stack these vectors into the processes ","element":"span"},{"style":{"height":17.6},"width":566.11,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-11.png","element":"img","alt":" ρ̂ = (ρ̂u)u∈U and ρ = (ρu)u∈U.","inline":true}],[{"style":{"width":"97%"},"width":1817,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":21.66},"width":980.31,"height":54.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-13.png","element":"img","alt":" Zn,P = (Gnψρu)u∈U and ψρu = ({ψαV,0, ψαV,1, ψγV })V ∈Vu","inline":true},{"text":". The components (","element":"span"},{"style":{"height":20.02},"width":385.29,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-14.png","element":"img","alt":"√n(γ̂Vuj − γVuj))u∈U","inline":true,"padRight":true},{"text":"of ","element":"span"},{"style":{"height":17.77},"width":178.05,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-15.png","element":"img","alt":"√n(ρ̂ − ρ","inline":true},{"text":") trivially have the linear representation (with no error) for each ","element":"span"},{"style":{"height":16},"width":110.82,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-16.png","element":"img","alt":" j ∈ J","inline":true,"padRight":true},{"text":". 
We only need to establish the claim for the empirical process (","element":"span"},{"style":{"height":20.02},"width":953.23,"height":50.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-17.png","element":"img","alt":"√n(α̂Vuj(z) − αVuj(z)))u∈U for z ∈ {0, 1} and each","inline":true},{"style":{"height":16},"width":110.82,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-18.png","element":"img","alt":"j ∈ J","inline":true,"padRight":true},{"text":", which we do in the steps below.","element":"span"}],[{"text":"(a) We make some preliminary observations. For ","element":"span"},{"style":{"height":19.13},"width":849.91,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-19.png","element":"img","alt":" t = (t1, t2, t3, t4) ∈ R2 × (0, 1)2, v ∈ R, and","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":19.13},"width":257.79,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-20.png","element":"img","alt":"z, ¯z) ∈ {0, 1}2","inline":true},{"text":", we define the function (","element":"span"},{"style":{"height":17.6},"width":513.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-21.png","element":"img","alt":"v, z, ¯z, t) ↦ ϕ(v, z, ¯z, t) via:","inline":true}],[{"style":{"width":"73%"},"width":1379,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-22.png","element":"img"}],[{"text":"The derivatives of this function with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"obey for all ","element":"span"},{"style":{"height":22.02},"width":577.75,"height":55.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-23.png","element":"img","alt":" k = (kj)4j=1 ∈ N4 : 0 ⩽ |k| ⩽ 
3,","inline":true}],[{"style":{"width":"93%"},"width":1743,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-24.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"depends only on ","element":"span"},{"style":{"height":24.14},"width":938.53,"height":60.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-25.png","element":"img","alt":" c′ and C, |k| = ∑4j=1 kj, and ∂kt := ∂k1t1 ∂k2t2 ∂k3t3 ∂k4t4 .","inline":true}],[{"text":"(b) Let","element":"span"}],[{"style":{"width":"57%"},"width":1078,"height":289,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-26.png","element":"img"}],[{"text":"We observe that with probability no less than 1 ","element":"span"},{"style":{"height":16},"width":114.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-27.png","element":"img","alt":" − ∆n,","inline":true}],[{"style":{"width":"87%"},"width":1629,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-28.png","element":"img"}],[{"text":"where","element":"span"}],[{"style":{"width":"61%"},"width":1145,"height":184,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/73-29.png","element":"img"}],[{"style":{"width":"59%"},"width":1114,"height":184,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-0.png","element":"img"}],[{"text":"To see this, note that under Assumption ","element":"span"},{"href":"#id-85","text":"4.2 ","element":"a"},{"text":"for all ","element":"span"},{"style":{"height":18.22},"width":430.13,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-1.png","element":"img","alt":" n ⩾ min{j : δj ⩽ 1/2},","inline":true}],[{"style":{"width":"91%"},"width":1717,"height":433,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-2.png","element":"img"}],[{"text":"for 
","element":"span"},{"style":{"height":18.61},"width":146.42,"height":46.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-3.png","element":"img","alt":" β = ¯βZ","inline":true},{"text":", with evaluation after computing the norms, and for ","element":"span"},{"style":{"height":17.6},"width":133.53,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-4.png","element":"img","alt":" ∥∂Λ∥∞","inline":true,"padRight":true},{"text":"denoting sup","element":"span"},{"style":{"height":18.19},"width":193.2,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-5.png","element":"img","alt":"l∈R |∂Λ(l)|","inline":true,"padRight":true},{"text":"here and below. Similarly, under Assumption ","element":"span"},{"href":"#id-85","text":"4.2,","element":"a"}],[{"style":{"width":"98%"},"width":1851,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-6.png","element":"img"}],[{"style":{"height":18.3},"width":1227.25,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-7.png","element":"img","alt":"∥ΛV (f(Z, X)′β) − gV (Z, X)∥P,∞ ≲ Kn∥β − βV ∥1 + ϵn ⩽ 2ϵn,","inline":true}],[{"text":"for ","element":"span"},{"style":{"height":18.61},"width":135.85,"height":46.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-8.png","element":"img","alt":" β = ¯βV","inline":true,"padRight":true},{"text":", with evaluation after computing the norms, and noting that for any ","element":"span"},{"style":{"height":16.4},"width":26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-9.png","element":"img","alt":" β","inline":true}],[{"style":{"width":"96%"},"width":1807,"height":335,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-10.png","element":"img"}],[{"text":"Hence with probability at least 1 
","element":"span"},{"style":{"height":16},"width":114.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-11.png","element":"img","alt":" − ∆n,","inline":true}],[{"style":{"width":"89%"},"width":1675,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-12.png","element":"img"}],[{"text":"(c) We have that","element":"span"}],[{"style":{"width":"44%"},"width":831,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-13.png","element":"img"}],[{"text":"so that","element":"span"}],[{"style":{"width":"81%"},"width":1520,"height":170,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-14.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"evaluated at ","element":"span"},{"style":{"height":15.1},"width":149.71,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-15.png","element":"img","alt":" h = ĥV .","inline":true}],[{"text":"(d) Note that for","element":"span"}],[{"style":{"width":"84%"},"width":1591,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/74-16.png","element":"img"}],[{"style":{"width":"71%"},"width":1341,"height":503,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-0.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"evaluated at ","element":"span"},{"style":{"height":12.8},"width":108.32,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-1.png","element":"img","alt":" h = ĥ","inline":true,"padRight":true},{"text":"after computing the expectations under ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":".","element":"span"}],[{"text":"By the law of iterated expectations and the orthogonality property of the moment condition for 
","element":"span"},{"style":{"height":11.2},"width":69.16,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-2.png","element":"img","alt":"αV ,","inline":true}],[{"style":{"width":"71%"},"width":1343,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-3.png","element":"img"}],[{"text":"Moreover, uniformly for any ","element":"span"},{"style":{"height":17.5},"width":167.35,"height":43.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-4.png","element":"img","alt":" h ∈ HV,n","inline":true},{"text":", in view of the properties noted in Steps (a) and (b),","element":"span"}],[{"style":{"width":"77%"},"width":1454,"height":174,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-5.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":18.3},"width":901.39,"height":45.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-6.png","element":"img","alt":"ĥV ∈ HV,n for all V ∈ V = {Vuj : u ∈ U, j ∈ J }","inline":true,"padRight":true},{"text":"with probability 1 ","element":"span"},{"style":{"height":16},"width":337.5,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-7.png","element":"img","alt":" − ∆n, for n ⩾ n0,","inline":true}],[{"style":{"width":"51%"},"width":972,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-8.png","element":"img"}],[{"text":"(e) Furthermore, with probability 1 ","element":"span"},{"style":{"height":15.42},"width":101,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-9.png","element":"img","alt":" − ∆n","inline":true}],[{"style":{"width":"65%"},"width":1223,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-10.png","element":"img"}],[{"text":"The classes of 
functions,","element":"span"}],[{"id":"id-174","style":{"width":"83%"},"width":1570,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-11.png","element":"img"}],[{"text":"viewed as maps from the sample space ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"to the real line, are bounded by a constant envelope and obey log sup","element":"span"},{"style":{"height":19.79},"width":599.22,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-12.png","element":"img","alt":"Q N(ϵ, V, ∥ · ∥Q,2) ≲ log(e/ϵ) ∨","inline":true,"padRight":true},{"text":"0, which holds by Assumption ","element":"span"},{"href":"#id-84","text":"4.1(","element":"a"},{"text":"ii), and log sup","element":"span"},{"style":{"height":19.79},"width":597.9,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-13.png","element":"img","alt":"Q N(ϵ, V∗, ∥ · ∥Q,2) ≲ log(e/ϵ) ∨","inline":true,"padRight":true},{"text":"0 which holds by Assumption ","element":"span"},{"href":"#id-84","text":"4.1(","element":"a"},{"text":"ii) and Lemma ","element":"span"},{"href":"#id-171","text":"K.2. 
","element":"a"},{"text":"The uniform covering entropies of the function sets","element":"span"}],[{"style":{"width":"63%"},"width":1188,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-14.png","element":"img"}],[{"text":"are trivially bounded by log(e","element":"span"},{"style":{"height":17.6},"width":138.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-15.png","element":"img","alt":"/ϵ) ∨ 0.","inline":true}],[{"text":"The class of functions","element":"span"}],[{"style":{"width":"32%"},"width":607,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-16.png","element":"img"}],[{"text":"has a constant envelope and is a subset of","element":"span"}],[{"style":{"width":"72%"},"width":1348,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-17.png","element":"img"}],[{"text":"which is a union of 5 sets of the form","element":"span"}],[{"style":{"width":"36%"},"width":674,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/75-18.png","element":"img"}],[{"text":"with Λ ","element":"span"},{"style":{"height":13.2},"width":78.09,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-0.png","element":"img","alt":" ∈ L","inline":true,"padRight":true},{"text":"a fixed monotone function for each of the 5 sets; each of these sets is a union of at most","element":"span"},{"style":{"height":22.16},"width":82.06,"height":55.41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-1.png","element":"img","alt":"�2pCs�","inline":true},{"text":"VC-subgraph classes of functions with VC indices bounded by ","element":"span"},{"style":{"height":12.8},"width":66.49,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-2.png","element":"img","alt":" C′s","inline":true},{"text":". 
Note that a fixed monotone transformation Λ preserves the VC-subgraph property ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"(van der Vaart and Wellner, ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"1996, ","element":"a"},{"text":"Lemma 2.6.18). Therefore,","element":"span"}],[{"style":{"width":"50%"},"width":939,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-3.png","element":"img"}],[{"text":"Similarly, the class of functions ","element":"span"},{"style":{"height":17.6},"width":396.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-4.png","element":"img","alt":" M = (M(1)∪(1−M","inline":true},{"text":"(1))) has a constant envelope and is a union of at most 5 sets, each of which is itself a union of at most","element":"span"},{"style":{"height":20.96},"width":82.06,"height":52.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-5.png","element":"img","alt":"� pCs�","inline":true},{"text":"VC-subgraph classes of functions with VC indices bounded by ","element":"span"},{"style":{"height":12.8},"width":66.49,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-6.png","element":"img","alt":" C′s","inline":true,"padRight":true},{"text":"since a fixed monotone transformation Λ preserves the VC-subgraph property. 
Therefore, log sup","element":"span"},{"style":{"height":19.79},"width":866.73,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-7.png","element":"img","alt":"Q N(ϵ, M, ∥ · ∥Q,2) ≲ (s log p + s log(e/ϵ)) ∨ 0.","inline":true}],[{"text":"Finally, the set of functions","element":"span"}],[{"style":{"width":"52%"},"width":987,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-8.png","element":"img"}],[{"text":"is a Lipschitz transform of the function sets ","element":"span"},{"style":{"height":15.6},"width":486.79,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-9.png","element":"img","alt":" V, V∗, B, M∗, G, and M","inline":true},{"text":", with bounded Lipschitz coefficients and with a constant envelope. Therefore,","element":"span"}],[{"style":{"width":"51%"},"width":963,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-10.png","element":"img"}],[{"text":"Applying Lemma ","element":"span"},{"href":"#id-143","text":"C.1 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":18.55},"width":299.33,"height":46.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-11.png","element":"img","alt":" σn = C′δnn−1/4 ","inline":true,"padRight":true},{"text":"and the envelope ","element":"span"},{"style":{"height":15.02},"width":158.06,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-12.png","element":"img","alt":" Jn = C′","inline":true},{"text":", we obtain, with probability 1 ","element":"span"},{"style":{"height":15.42},"width":101.57,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-13.png","element":"img","alt":" − ∆n","inline":true,"padRight":true},{"text":"for some constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K > 
e","element":"span"}],[{"style":{"width":"53%"},"width":994,"height":428,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-14.png","element":"img"}],[{"text":"Here we have used some simple calculations, exploiting the boundedness condition in Assumptions ","element":"span"},{"href":"#id-84","text":"4.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-85","text":"4.2, ","element":"a"},{"text":"to deduce that","element":"span"}],[{"style":{"width":"65%"},"width":1226,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-15.png","element":"img"}],[{"text":"by definition of the set ","element":"span"},{"style":{"height":17.1},"width":88.87,"height":42.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-16.png","element":"img","alt":" HV,n","inline":true},{"text":", so that we can use Lemma ","element":"span"},{"href":"#id-143","text":"C.1. ","element":"a"},{"text":"We also note that log(1","element":"span"},{"style":{"height":17.6},"width":255.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-17.png","element":"img","alt":"/δn) ≲ log(n)","inline":true,"padRight":true},{"text":"by the assumption on ","element":"span"},{"style":{"height":19.88},"width":777.69,"height":49.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-18.png","element":"img","alt":" δn and that s2 log2(p ∨ n) log2(n)/n ⩽ δn","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-85","text":"4.2(","element":"a"},{"text":"i).","element":"span"}],[{"text":"(f) The claim of Step 1 follows by collecting Steps (a)-(e).","element":"span"}],[{"text":"Step 2 ","element":"span"},{"text":"(Uniform Donskerness). 
","element":"span"},{"text":"Here we claim that Assumption ","element":"span"},{"href":"#id-84","text":"4.1 ","element":"a"},{"text":"implies that the set of vectors of functions (","element":"span"},{"style":{"height":18.49},"width":227.93,"height":46.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-19.png","element":"img","alt":"ψρu)u∈U is P","inline":true},{"text":"-Donsker uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", namely that","element":"span"}],[{"style":{"width":"51%"},"width":959,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/76-20.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.19},"width":779.28,"height":47.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-0.png","element":"img","alt":" Zn,P = (Gnψρu)u∈U and ZP = (GP ψρu)u∈U","inline":true},{"text":". Moreover, ","element":"span"},{"style":{"height":14.7},"width":55.79,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-1.png","element":"img","alt":" ZP ","inline":true,"padRight":true},{"text":"has bounded, uniformly continuous ","element":"span"},{"text":"paths uniformly in ","element":"span"},{"style":{"height":12.8},"width":133.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-2.png","element":"img","alt":" P ∈ P:","inline":true}],[{"style":{"width":"71%"},"width":1334,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-3.png","element":"img"}],[{"text":"To verify these claims we shall invoke Theorem ","element":"span"},{"href":"#id-131","text":"B.1.","element":"a"}],[{"text":"To demonstrate the claim, it will suffice to consider the set of ","element":"span"},{"style":{"height":17.6},"width":648.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-4.png","element":"img","alt":" R-valued functions 
Ψ = (ψuk : u ∈","inline":true},{"style":{"height":18.62},"width":180.29,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-5.png","element":"img","alt":"U, k ∈ [dρ","inline":true},{"text":"]). Further, we notice that ","element":"span"},{"style":{"height":19.97},"width":511.06,"height":49.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-6.png","element":"img","alt":" GnψαV,z = Gnf, for f ∈ Fz,","inline":true}],[{"style":{"width":"99%"},"width":1871,"height":317,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-7.png","element":"img"}],[{"text":"defined in Step 1(e), where the validity of the Lipschitz property relies on Assumption ","element":"span"},{"href":"#id-84","text":"4.1(","element":"a"},{"text":"iii) (to keep the denominator away from zero) and on the boundedness conditions in Assumption ","element":"span"},{"href":"#id-84","text":"4.1(","element":"a"},{"text":"iii) and Assumption ","element":"span"},{"href":"#id-85","text":"4.2(","element":"a"},{"text":"iii). 
The function sets ","element":"span"},{"style":{"height":15.6},"width":337.84,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-8.png","element":"img","alt":" B, V, V∗ and M∗ ","inline":true,"padRight":true},{"text":"are uniformly bounded classes that have uniform covering entropy bounded by log(e","element":"span"},{"style":{"height":17.6},"width":98.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-9.png","element":"img","alt":"/ϵ) ∨","inline":true,"padRight":true},{"text":"0 up to a multiplicative constant, and so ","element":"span"},{"style":{"height":15.02},"width":47.36,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-10.png","element":"img","alt":"Fz","inline":true},{"text":", which is uniformly bounded under Assumption ","element":"span"},{"href":"#id-84","text":"4.1, ","element":"a"},{"text":"has uniform covering entropy bounded by log(e","element":"span"},{"style":{"height":17.6},"width":95.93,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-11.png","element":"img","alt":"/ϵ) ∨","inline":true,"padRight":true},{"text":"0 up to a multiplicative constant (see, e.g., ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":"). 
Since ","element":"span"},{"style":{"height":15.1},"width":104.31,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-12.png","element":"img","alt":" FP is","inline":true,"padRight":true},{"text":"uniformly bounded and is a finite union of function sets whose uniform entropies obey the bound just stated, it follows that ","element":"span"},{"style":{"height":15.1},"width":57.36,"height":37.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-13.png","element":"img","alt":" FP","inline":true,"padRight":true},{"text":"also has this property; namely,","element":"span"}],[{"style":{"width":"44%"},"width":824,"height":78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-14.png","element":"img"}],[{"text":"Since","element":"span"},{"style":{"height":21.85},"width":815.34,"height":54.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-15.png","element":"img","alt":"∫∞0 √(log(e/ϵ) ∨ 0) dϵ = e√π/2 < ∞ and FP","inline":true,"padRight":true},{"text":"is uniformly bounded, the first condition in ","element":"span"},{"href":"#id-133","text":"(B.1) ","element":"a"},{"text":"and the entropy condition ","element":"span"},{"href":"#id-173","text":"(B.2) ","element":"a"},{"text":"in Theorem ","element":"span"},{"href":"#id-131","text":"B.1 ","element":"a"},{"text":"hold.","element":"span"}],[{"text":"We now verify the second condition in ","element":"span"},{"href":"#id-133","text":"(B.1)","element":"a"},{"text":". 
","element":"span"},{"text":"Consider a sequence of positive constants ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-16.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"approaching zero, and note that","element":"span"}],[{"style":{"width":"52%"},"width":983,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-17.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16.4},"width":183.73,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-18.png","element":"img","alt":" fu and f˜u","inline":true,"padRight":true},{"text":"must be of the form:","element":"span"}],[{"style":{"width":"80%"},"width":1507,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-19.png","element":"img"}],[{"text":"with (","element":"span"},{"style":{"height":15.2},"width":120.58,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-20.png","element":"img","alt":"Uu, U˜u","inline":true},{"text":") equal to either (","element":"span"},{"style":{"height":17.6},"width":1290.71,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-21.png","element":"img","alt":"Yu, Y˜u) or (1d(D)Yu, 1d(D)Y˜u), for d = 0 or 1, and z = 0 or 1. 
Then","inline":true}],[{"style":{"width":"99%"},"width":1870,"height":315,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/77-22.png","element":"img"}],[{"style":{"width":"99%"},"width":1871,"height":203,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-0.png","element":"img"}],[{"text":"which we deduce by the definition of ","element":"span"},{"style":{"height":17.6},"width":540.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-1.png","element":"img","alt":" gUu(z, X) = EP [Uu|X, Z = z","inline":true},{"text":"] and the contraction property of the conditional expectation. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-2.png","element":"img","alt":"■","inline":true}],[{"text":"L.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-166","style":{"fontWeight":"bold"},"text":"4.2. ","element":"a"},{"text":"The proof will be similar to the proof of Theorem ","element":"span"},{"href":"#id-165","text":"4.1.","element":"a"}],[{"text":"Step 0. ","element":"span"},{"text":"(Preparation). 
In the proof ","element":"span"},{"style":{"height":16.8},"width":100.24,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-3.png","element":"img","alt":" a ≲ b","inline":true,"padRight":true},{"text":"means that ","element":"span"},{"style":{"height":15.2},"width":132.97,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-4.png","element":"img","alt":" a ⩽ Ab","inline":true},{"text":", where the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"depends on the constants in Assumptions ","element":"span"},{"href":"#id-84","text":"4.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-85","text":"4.2 ","element":"a"},{"text":"only, but not on ","element":"span"},{"style":{"height":18.22},"width":755.71,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-5.png","element":"img","alt":" n once n ⩾ n0 = min{j : δj ⩽ 1/2}, and","inline":true,"padRight":true},{"text":"not on ","element":"span"},{"style":{"height":14.62},"width":139.16,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-6.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":". We consider a sequence ","element":"span"},{"style":{"height":14.62},"width":167.6,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-7.png","element":"img","alt":" Pn in Pn","inline":true},{"text":", but for simplicity, we write ","element":"span"},{"style":{"height":14.62},"width":141.67,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-8.png","element":"img","alt":" P = Pn","inline":true,"padRight":true},{"text":"throughout the proof, suppressing the index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". 
Since the argument is asymptotic, we can assume that ","element":"span"},{"style":{"height":13.82},"width":128,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-9.png","element":"img","alt":" n ⩾ n0","inline":true,"padRight":true},{"text":"in what follows. Let ","element":"span"},{"style":{"height":14.62},"width":47.66,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-10.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"denote the measure that puts mass ","element":"span"},{"style":{"height":15.13},"width":69.54,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-11.png","element":"img","alt":" n−1 ","inline":true,"padRight":true},{"text":"on points (","element":"span"},{"style":{"height":17.6},"width":417.63,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-12.png","element":"img","alt":"ξi, Wi) for i = 1, ..., n.","inline":true,"padRight":true},{"text":"Let ","element":"span"},{"style":{"height":14.62},"width":50.09,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-13.png","element":"img","alt":" En","inline":true,"padRight":true},{"text":"denote the expectation with respect to this measure, so that ","element":"span"},{"style":{"height":19.89},"width":586.42,"height":49.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-14.png","element":"img","alt":" Enf = n−1 �ni=1 f(ξi, Wi), and","inline":true},{"style":{"height":15.02},"width":54.94,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-15.png","element":"img","alt":"Gn","inline":true,"padRight":true},{"text":"denote the corresponding empirical process ","element":"span"},{"style":{"height":17.77},"width":317.84,"height":44.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-16.png","element":"img","alt":"√n(En − P), 
i.e.","inline":true}],[{"style":{"width":"74%"},"width":1398,"height":123,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-17.png","element":"img"}],[{"text":"Recall that we define the bootstrap draw as:","element":"span"}],[{"style":{"width":"63%"},"width":1197,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-18.png","element":"img"}],[{"text":"since ","element":"span"},{"style":{"height":18.49},"width":410.02,"height":46.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-19.png","element":"img","alt":" P[ξ �ψρu] = 0 because ξ","inline":true,"padRight":true},{"text":"is independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"and has zero mean. Here ","element":"span"},{"style":{"height":19.59},"width":434.82,"height":48.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-20.png","element":"img","alt":" �ψρu = ( �ψρV )V ∈Vu, where","inline":true},{"style":{"height":22.16},"width":1303.65,"height":55.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-21.png","element":"img","alt":"�ψρV (W) = {ψαV,0,�gV , �mZ(W, �αV (0)), ψαV,1,�gV , �mZ(W, �αV (1)), ψγV (W, �γV )},","inline":true,"padRight":true},{"text":"is a plug-in estimator of the ","element":"span"},{"text":"influence function ","element":"span"},{"style":{"height":18.09},"width":62.03,"height":45.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-22.png","element":"img","alt":" ψρu.","inline":true}],[{"text":"Step 1. ","element":"span"},{"text":"(Linearization) In this step, we establish that","element":"span"}],[{"style":{"width":"88%"},"width":1664,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-23.png","element":"img"}],[{"text":"where 
","element":"span"},{"style":{"height":20.77},"width":379.26,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-24.png","element":"img","alt":" ζ∗n,P = ζn,P (Dn, Bn","inline":true},{"text":") is a linearization error arising entirely from the estimation of the ","element":"span"},{"text":"influence function; if the influence function were known, this term would be zero.","element":"span"}],[{"text":"For the components (","element":"span"},{"style":{"height":18.54},"width":608.4,"height":46.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-25.png","element":"img","alt":"√n(�γ∗V − �γV ))V ∈V of √n(�ρ∗ − �ρ","inline":true},{"text":"), the linearization follows from the ","element":"span"},{"text":"representation,","element":"span"}],[{"style":{"width":"41%"},"width":776,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-26.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":12.8},"width":119.5,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-27.png","element":"img","alt":" V ∈ V","inline":true},{"text":", and by noting that sup","element":"span"},{"style":{"height":21.11},"width":1215.12,"height":52.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-28.png","element":"img","alt":"V ∈V |I∗V | = supV ∈V |(�γV − γV )||Gnξ| = OP (n−1/2), for V defined","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-174","text":"(L.3) ","element":"a"},{"text":"by Theorem ","element":"span"},{"href":"#id-165","text":"4.1 ","element":"a"},{"text":"and by ","element":"span"},{"style":{"height":17.6},"width":289.23,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-29.png","element":"img","alt":" |Gnξ| = OP (1).","inline":true}],[{"text":"It remains to establish the claim for the empirical process 
(","element":"span"},{"style":{"height":22.49},"width":668.37,"height":56.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-30.png","element":"img","alt":"√n(�α∗Vuj(z) − �αVuj(z)))u∈U for z ∈","inline":true},{"style":{"height":17.6},"width":316.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-31.png","element":"img","alt":"{0, 1} and j ∈ J","inline":true,"padRight":true},{"text":". As in the proof of Theorem 4.1, we have that with probability at least 1 ","element":"span"},{"style":{"height":16},"width":114.55,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-32.png","element":"img","alt":" − ∆n,","inline":true}],[{"style":{"width":"92%"},"width":1729,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/78-33.png","element":"img"}],[{"style":{"width":"88%"},"width":1658,"height":214,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-0.png","element":"img"}],[{"text":"where sup","element":"span"},{"style":{"height":23.32},"width":755.56,"height":58.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-1.png","element":"img","alt":"V ∈V,z∈{0,1}(�αV (z) − αV (z)) = OP (n−1/2","inline":true},{"text":") by Theorem ","element":"span"},{"href":"#id-165","text":"4.1.","element":"a"}],[{"style":{"width":"94%"},"width":1768,"height":165,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-2.png","element":"img"}],[{"text":"where","element":"span"}],[{"style":{"width":"52%"},"width":986,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-3.png","element":"img"}],[{"text":"By the calculations in Step 1(e) of the proof of Theorem ","element":"span"},{"href":"#id-165","text":"4.1, ","element":"a"},{"style":{"height":14.62},"width":50.58,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-4.png","element":"img","alt":" 
Jn","inline":true,"padRight":true},{"text":"obeys log sup","element":"span"},{"style":{"height":19.79},"width":390.76,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-5.png","element":"img","alt":"Q N(ϵ, Jn, ∥ · ∥Q,2) ≲","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":17.6},"width":416.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-6.png","element":"img","alt":"s log p+s log(e/ϵ))∨0.","inline":true,"padRight":true},{"text":"By Lemma ","element":"span"},{"href":"#id-135","text":"K.1, ","element":"a"},{"text":"multiplication of this class by ","element":"span"},{"style":{"height":16.4},"width":20,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-7.png","element":"img","alt":" ξ","inline":true,"padRight":true},{"text":"does not change the entropy bound modulo an absolute constant, namely","element":"span"}],[{"style":{"width":"60%"},"width":1131,"height":77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-8.png","element":"img"}],[{"text":"where the envelope ","element":"span"},{"style":{"height":17.6},"width":340.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-9.png","element":"img","alt":" Jn for ξJn is |ξ|","inline":true,"padRight":true},{"text":"times a constant. 
","element":"span"},{"text":"Also, E[exp(","element":"span"},{"style":{"height":17.6},"width":206.1,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-10.png","element":"img","alt":"|ξ|)] < ∞","inline":true,"padRight":true},{"text":"implies that (E[max","element":"span"},{"style":{"height":20.55},"width":397.97,"height":51.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-11.png","element":"img","alt":"i⩽n |ξi|2])1/2 ≲ log n.","inline":true,"padRight":true},{"text":"Thus, applying Lemma ","element":"span"},{"href":"#id-143","text":"C.1 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":18.55},"width":409.46,"height":46.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-12.png","element":"img","alt":" σ = σn = C′δnn−1/4 ","inline":true,"padRight":true},{"text":"and the envelope ","element":"span"},{"style":{"height":17.6},"width":195.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-13.png","element":"img","alt":" Jn = C′|ξ|","inline":true},{"text":", for some constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"K > e","element":"span"}],[{"style":{"width":"74%"},"width":1399,"height":335,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-14.png","element":"img"}],[{"text":"for sup","element":"span"},{"style":{"height":19.79},"width":697.67,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-15.png","element":"img","alt":"f∈ξJn ∥f∥P,2 = supf∈Jn ∥f∥P,2 ≲ σn","inline":true},{"text":", where the details of the calculations are the same as in Step 1(e) of the proof of Theorem ","element":"span"},{"href":"#id-165","text":"4.1.","element":"a"}],[{"text":"Finally, we conclude that","element":"span"}],[{"style":{"width":"47%"},"width":880,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-16.png","element":"img"}],[{"text":"Step 
2","element":"span"},{"text":". Here we claim that ","element":"span"},{"style":{"height":19.91},"width":340.47,"height":49.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-17.png","element":"img","alt":" Z∗n,P ⇝B ZP in D","inline":true},{"text":", under any sequence ","element":"span"},{"style":{"height":15.6},"width":388.17,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-18.png","element":"img","alt":" P = Pn ∈ Pn, where","inline":true},{"style":{"height":18.49},"width":329.4,"height":46.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-19.png","element":"img","alt":"ZP = (GP ψρu)u∈U","inline":true},{"text":". We have that","element":"span"}],[{"style":{"width":"92%"},"width":1736,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-20.png","element":"img"}],[{"text":"where the first term is ","element":"span"},{"style":{"height":20.77},"width":495.45,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-21.png","element":"img","alt":" o∗P (1), since G∗n,P ⇝B ZP","inline":true,"padRight":true},{"text":"by Theorem ","element":"span"},{"href":"#id-134","text":"B.2, ","element":"a"},{"text":"and the second term is ","element":"span"},{"style":{"height":17.6},"width":105.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-22.png","element":"img","alt":" oP (1)","inline":true,"padRight":true},{"text":"because ","element":"span"},{"style":{"height":20.77},"width":254.18,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-23.png","element":"img","alt":" ∥ζ∗n,P ∥D = oP","inline":true,"padRight":true},{"text":"(1) implies that E","element":"span"},{"style":{"height":20.77},"width":785.04,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-24.png","element":"img","alt":"P (∥ζ∗n,P ∥D ∧ 2) = EP EBn(∥ζ∗n,P ∥D ∧ 2) 
→","inline":true,"padRight":true},{"text":"0, which in turn ","element":"span"},{"text":"implies that E","element":"span"},{"style":{"height":20.77},"width":403.39,"height":51.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-25.png","element":"img","alt":"Bn(∥ζ∗n,P ∥D ∧ 2) = oP","inline":true,"padRight":true},{"text":"(1) by Markov's inequality. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-26.png","element":"img","alt":"■","inline":true}],[{"text":"L.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Corollary ","element":"span"},{"href":"#id-175","style":{"fontWeight":"bold"},"text":"4.1. ","element":"a"},{"text":"This is an immediate consequence of Theorems ","element":"span"},{"href":"#id-165","text":"4.1, ","element":"a"},{"href":"#id-166","text":"4.2, ","element":"a"},{"href":"#id-108","text":"B.3, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-109","text":"B.4. ","element":"a"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/79-27.png","element":"img","alt":"■","inline":true}]]},{"heading":"Appendix M. Omitted Proofs for Section 5","paragraphs":[[{"id":"id-132","style":{"fontWeight":"bold"},"text":"Lemma M.1 ","element":"span"},{"text":"(","element":"span"},{"style":{"fontWeight":"bold"},"text":"Donsker Theorem for Classes Changing with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Work with the set-up described in Appendix B of the main text. 
Suppose that for some fixed constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q > ","element":"span"},{"text":"2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and every sequence ","element":"span"},{"style":{"height":16.8},"width":145.64,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-0.png","element":"img","alt":" δn ↘ 0:","inline":true}],[{"style":{"width":"99%"},"width":1869,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"(a) Then the empirical process ","element":"span"},{"text":"(","element":"span"},{"style":{"height":18.22},"width":197.82,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-2.png","element":"img","alt":"Gnfn,t)t∈T","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is asymptotically tight in ","element":"span"},{"style":{"height":17.6},"width":119.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-3.png","element":"img","alt":" ℓ∞(T)","inline":true},{"style":{"fontStyle":"italic"},"text":". 
(b) For any subsequence such that the covariance function ","element":"span"},{"style":{"height":17.42},"width":473.63,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-4.png","element":"img","alt":" Pnfn,sfn,t − Pnfn,sPnfn,t","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converges pointwise on ","element":"span"},{"style":{"height":15.2},"width":131.23,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-5.png","element":"img","alt":" T × T,","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":18.22},"width":197.82,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-6.png","element":"img","alt":"Gnfn,t)t∈T","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converges in ","element":"span"},{"style":{"height":17.6},"width":119.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-7.png","element":"img","alt":" ℓ∞(T)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"to a Gaussian process with covariance function given by the limit of the covariance function along that subsequence.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proof. ","element":"span"},{"text":"The proof is similar to that of Theorem 2.11.22 in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996, ","element":"a"},{"text":"pp. 220-221), except that the probability law is allowed to depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". 
Indeed, Theorem 2.11.1 in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996)","element":"a"},{"text":", which allows the probability space to depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", yields claim (a), whereas claim (b) follows by a standard argument.","element":"span"}],[{"text":"The random distance given in Theorem 2.11.1 in ","element":"span"},{"href":"#id-72","referenceIndex":99,"text":"van der Vaart and Wellner ","element":"a"},{"href":"#id-72","referenceIndex":99,"text":"(1996) ","element":"a"},{"text":"(Lemma ","element":"span"},{"href":"#id-176","text":"M.2 ","element":"a"},{"text":"below) reduces to ","element":"span"},{"style":{"height":21.29},"width":1094.82,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-8.png","element":"img","alt":" d2n(s, t) = 1n�ni=1(fn,s − fn,t)2(Wi) = Pn(fn,s − fn,t)2.","inline":true,"padRight":true},{"text":"It follows that ","element":"span"},{"style":{"height":17.6},"width":564,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-9.png","element":"img","alt":"N(ε, T, dn) = N(ε, Fn, L2(Pn","inline":true},{"text":")), for every ","element":"span"},{"style":{"height":15.02},"width":258.52,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-10.png","element":"img","alt":" ε > 0. If Fn","inline":true,"padRight":true},{"text":"is replaced by ","element":"span"},{"style":{"height":14.62},"width":92.39,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-11.png","element":"img","alt":" Fn ∨","inline":true,"padRight":true},{"text":"1, then the conditions of the lemma still hold. 
Hence, assume without loss of generality that ","element":"span"},{"style":{"height":14.62},"width":101.29,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-12.png","element":"img","alt":" Fn ⩾","inline":true,"padRight":true},{"text":"1. Insert the bound on the covering numbers and then make a change of variables to bound the entropy integral","element":"span"},{"style":{"height":24.74},"width":456.8,"height":61.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-13.png","element":"img","alt":"� δn0 �log N(ε, Fn, dn)dε","inline":true,"padRight":true},{"text":"in Lemma ","element":"span"},{"href":"#id-176","text":"M.2 ","element":"a"},{"text":"by","element":"span"},{"style":{"height":24.74},"width":976.12,"height":61.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-14.png","element":"img","alt":"� δn0 �log N(ε∥Fn∥Pn,2, Fn, L2(Pn))dε∥Fn∥Pn,2. This","inline":true,"padRight":true},{"text":"converges to zero in probability for every ","element":"span"},{"style":{"height":16},"width":80.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-15.png","element":"img","alt":" δn ↓","inline":true,"padRight":true},{"text":"0 by the conditions of the lemma. Apply Lemma ","element":"span"},{"href":"#id-176","text":"M.2 ","element":"a"},{"text":"to obtain the result. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-16.png","element":"img","alt":"■","inline":true}],[{"id":"id-176","style":{"fontWeight":"bold"},"text":"Lemma M.2 ","element":"span"},{"text":"(van der Vaart and Wellner (1996, Th. 2.11.1))","element":"span"},{"style":{"height":17.42},"width":625.91,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-17.png","element":"img","alt":". For each n, let Zn1, . . . 
, Zn,mn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be independent stochastic processes, defined on the product probability space ","element":"span"},{"style":{"height":18.8},"width":391.94,"height":47.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-18.png","element":"img","alt":"�mni=1(Wni, Ani, Pni),","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"with each ","element":"span"},{"style":{"height":17.6},"width":295.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-19.png","element":"img","alt":" Zni = Zni(f, w)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"depending on the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"style":{"fontStyle":"italic"},"text":"th coordinate of ","element":"span"},{"style":{"height":17.6},"width":355.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-20.png","element":"img","alt":" w = (w1, . . . , wmn)","inline":true},{"style":{"fontStyle":"italic"},"text":", and indexed by a totally bounded semimetric space ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":88.09,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-21.png","element":"img","alt":"T, ρ)","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Assume that the sums ","element":"span"},{"style":{"height":18.8},"width":206.34,"height":47.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-22.png","element":"img","alt":"�mni=1 eiZni","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are measurable in the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"sense that every one of the maps","element":"span"}],[{"style":{"width":"86%"},"width":1622,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-23.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"is measurable, for every ","element":"span"},{"style":{"height":13.2},"width":101.23,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-24.png","element":"img","alt":" δ > 0","inline":true},{"style":{"fontStyle":"italic"},"text":", every vector ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":508.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-25.png","element":"img","alt":"e1, . . . , emn) ∈ {−1, 0, 1}mn","inline":true},{"style":{"fontStyle":"italic"},"text":", and every natural number ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
Also, for every ","element":"span"},{"style":{"height":16.4},"width":442.83,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-26.png","element":"img","alt":" η > 0 and every δn ↓ 0:","inline":true}],[{"style":{"width":"70%"},"width":1311,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-27.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"and","element":"span"},{"style":{"height":24.92},"width":741.76,"height":62.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-28.png","element":"img","alt":"� δn0 �log N(ε, Fn, dn)dε P∗→ 0, where dn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the random semimetric","element":"span"}],[{"style":{"width":"34%"},"width":638,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/80-29.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Then the sequence ","element":"span"},{"style":{"height":18.8},"width":347.6,"height":47.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-0.png","element":"img","alt":"�mni=1(Zni − EZni)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is asymptotically ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-1.png","element":"img","alt":" ρ","inline":true},{"style":{"fontStyle":"italic"},"text":"-equicontinuous.","element":"span"}]]},{"heading":"Appendix N. Proofs for Section 6 and Appendix I","paragraphs":[[{"id":"id-153","style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-120","style":{"fontStyle":"italic"},"text":"6.1. 
","element":"a"},{"text":"To establish the result uniformly in ","element":"span"},{"style":{"height":14.62},"width":139.82,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-2.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":", it suffices to establish the result under the probability measure induced by any sequence ","element":"span"},{"style":{"height":14.62},"width":274.29,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-3.png","element":"img","alt":" P = Pn ∈ Pn","inline":true},{"text":". In the proof, we write ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", suppressing the dependence of ","element":"span"},{"style":{"height":14.62},"width":49.02,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-4.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"on the sample size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". To prove this result, we invoke Lemmas ","element":"span"},{"href":"#id-177","text":"I.3-","element":"a"},{"href":"#id-178","text":"I.5 ","element":"a"},{"text":"in Appendix ","element":"span"},{"text":"I. ","element":"span"},{"text":"These lemmas rely on specific events (described below) and on Condition WL, which is also stated in Appendix ","element":"span"},{"text":"I. 
","element":"span"},{"text":"We will show that Assumption ","element":"span"},{"href":"#id-114","text":"6.1 ","element":"a"},{"text":"implies that the required events occur with probability 1 ","element":"span"},{"style":{"height":8.4},"width":64.64,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-5.png","element":"img","alt":" − o","inline":true},{"text":"(1) and also implies Condition WL.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":20.95},"width":537.8,"height":52.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-6.png","element":"img","alt":"�Ψu0,jj = {En[|fj(X)ζu|2]}1/2 ","inline":true,"padRight":true},{"text":"denote the ideal penalty loadings. The three events required to occur with probability 1 ","element":"span"},{"style":{"height":8.4},"width":65.53,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-7.png","element":"img","alt":" − o","inline":true},{"text":"(1) are the following: ","element":"span"},{"style":{"height":18.37},"width":564.08,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-8.png","element":"img","alt":" E1 := {cr ⩾ supu∈U ∥ru∥Pn,2}","inline":true},{"text":", where ","element":"span"},{"style":{"height":10.62},"width":97.58,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-9.png","element":"img","alt":" cr :=","inline":true},{"style":{"height":20.82},"width":1871.88,"height":52.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-10.png","element":"img","alt":"C�s log(p ∨ n)/n; E2 := {λ/n ⩾ √c supu∈U ∥�Ψ−1u0 En[ζuf(X)]∥∞}, E3 := {ℓ�Ψu0 ⩽ �Ψu ⩽ L�Ψu0},","inline":true,"padRight":true},{"text":"for some 1","element":"span"},{"style":{"height":17.77},"width":454.1,"height":44.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-11.png","element":"img","alt":"/√c < 1/ 4√c < ℓ and 
L","inline":true,"padRight":true},{"text":"uniformly bounded for the penalty loading ","element":"span"},{"style":{"height":14.62},"width":53.94,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-12.png","element":"img","alt":"�Ψu","inline":true,"padRight":true},{"text":"in all iterations ","element":"span"},{"style":{"height":14.8},"width":229.79,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-13.png","element":"img","alt":"k ⩽ K for n","inline":true,"padRight":true},{"text":"sufficiently large.","element":"span"}],[{"text":"By Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"iv)(b), ","element":"span"},{"style":{"height":14.62},"width":49.21,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-14.png","element":"img","alt":" E1","inline":true,"padRight":true},{"text":"holds with probability 1 ","element":"span"},{"style":{"height":17.6},"width":132.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-15.png","element":"img","alt":" − o(1).","inline":true}],[{"text":"Next, we verify that Condition WL holds. Condition WL(i) is implied by the approximate sparsity condition in Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"i) and the covering condition in Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"ii). 
By Assumption ","element":"span"},{"href":"#id-114","text":"6.1 ","element":"a"},{"text":"we have that ","element":"span"},{"style":{"height":15.02},"width":42.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-16.png","element":"img","alt":" du","inline":true,"padRight":true},{"text":"is fixed and the Algorithm sets ","element":"span"},{"style":{"height":19.87},"width":908.04,"height":49.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-17.png","element":"img","alt":" γ ∈ [1/n, min{log−1 n, pndu−1}] so that γ = o(1)","inline":true,"padRight":true},{"text":"and Φ","element":"span"},{"style":{"height":21.07},"width":853.34,"height":52.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-18.png","element":"img","alt":"−1(1 − γ/{2pndu}) ⩽ C log1/2(np) ⩽ Cδnn1/6 ","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"i). Since it is assumed that E","element":"span"},{"style":{"height":19.75},"width":792.67,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-19.png","element":"img","alt":"P [|fj(X)ζu|2] ⩾ c and EP [|fj(X)ζu|3] ⩽ C","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":16.4},"width":307.29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-20.png","element":"img","alt":" j ⩽ p and u ∈ U","inline":true},{"text":", Condition WL(ii) holds. 
Condition WL(iii) follows from Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"iv).","element":"span"}],[{"text":"Since Condition WL holds, by Lemma ","element":"span"},{"href":"#id-179","text":"I.1, ","element":"a"},{"text":"the event ","element":"span"},{"style":{"height":14.62},"width":49.22,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-21.png","element":"img","alt":" E2","inline":true,"padRight":true},{"text":"occurs with probability 1 ","element":"span"},{"style":{"height":17.6},"width":132.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-22.png","element":"img","alt":" − o(1).","inline":true}],[{"text":"Next we verify the occurrence of ","element":"span"},{"style":{"height":14.62},"width":49.22,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-23.png","element":"img","alt":" E3","inline":true},{"text":". In the first iteration, the penalty loadings are defined as ","element":"span"},{"style":{"height":20.95},"width":975.94,"height":52.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-24.png","element":"img","alt":"Ψ̂ujj = {En[|fj(X)Yu|2]}1/2 for j = 1, . . . , p, u ∈ U","inline":true},{"text":". 
By Assumption ","element":"span"},{"href":"#id-114","text":"6.1, ","element":"a"},{"style":{"height":19.75},"width":408.6,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-25.png","element":"img","alt":" c ⩽ EP [|fj(X)ζu|2] ⩽","inline":true,"padRight":true},{"text":"E","element":"span"},{"style":{"height":19.75},"width":347.12,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-26.png","element":"img","alt":"P [|fj(X)Yu|2] ⩽ C","inline":true,"padRight":true},{"text":"uniformly over ","element":"span"},{"style":{"height":16.4},"width":425.02,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-27.png","element":"img","alt":" u ∈ U and j = 1, . . . , p","inline":true},{"text":". Moreover, Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"iv)(b) yields","element":"span"}],[{"style":{"width":"82%"},"width":1547,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-28.png","element":"img"}],[{"text":"with probability 1 ","element":"span"},{"style":{"height":15.42},"width":101,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-29.png","element":"img","alt":" − ∆n","inline":true},{"text":". 
In turn, this shows that for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"large so that ","element":"span"},{"style":{"height":19.56},"width":361.42,"height":48.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-30.png","element":"img","alt":" δn ⩽ c/4 we have","inline":true}],[{"style":{"width":"81%"},"width":1531,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-31.png","element":"img"}],[{"text":"with probability 1 ","element":"span"},{"style":{"height":15.42},"width":628.59,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-32.png","element":"img","alt":" − ∆n so that ℓΨ̂u0 ⩽ Ψ̂u ⩽ LΨ̂u0","inline":true,"padRight":true},{"text":"for some uniformly bounded ","element":"span"},{"style":{"height":17.77},"width":319.12,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-33.png","element":"img","alt":" L and ℓ > 1/ 4√c.","inline":true,"padRight":true},{"text":"Moreover, ˜","element":"span"},{"style":{"height":20.74},"width":1003.12,"height":51.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-34.png","element":"img","alt":"c = {(L√c + 1)/(√cℓ − 1)} supu∈U ∥Ψ̂−1u0 ∥∞∥Ψ̂u0∥∞","inline":true,"padRight":true},{"text":"is uniformly bounded for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"large ","element":"span"},{"text":"enough, which implies that ","element":"span"},{"style":{"height":10.72},"width":59.29,"height":26.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-35.png","element":"img","alt":" κ2˜c","inline":true,"padRight":true},{"text":"as defined in ","element":"span"},{"href":"#id-180","text":"(I.1) ","element":"a"},{"text":"in Appendix ","element":"span"},{"href":"#id-181","text":"I.2 ","element":"a"},{"text":"is bounded away from zero with probability 1 
","element":"span"},{"style":{"height":15.42},"width":103.7,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-36.png","element":"img","alt":" − ∆n","inline":true,"padRight":true},{"text":"by the condition on sparse eigenvalues of order ","element":"span"},{"style":{"height":15.02},"width":59.64,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/81-37.png","element":"img","alt":" sℓn","inline":true,"padRight":true},{"text":"(see ","element":"span"},{"href":"#id-43","referenceIndex":24,"text":"Bickel et al. ","element":"a"},{"href":"#id-43","referenceIndex":24,"text":"(2009) ","element":"a"},{"text":"Lemma 4.1(ii)).","element":"span"}],[{"text":"By Lemma ","element":"span"},{"href":"#id-177","text":"I.3, ","element":"a"},{"text":"since ","element":"span"},{"style":{"height":21.07},"width":787.45,"height":52.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-0.png","element":"img","alt":" λ ∈ [cn1/2 log1/2(p ∨ n), Cn1/2 log1/2(p ∨ n","inline":true},{"text":")] by the choice of ","element":"span"},{"style":{"height":16},"width":282.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-1.png","element":"img","alt":" γ and du fixed,","inline":true}],[{"style":{"width":"93%"},"width":1757,"height":190,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-2.png","element":"img"}],[{"text":"In the application of Lemma ","element":"span"},{"href":"#id-182","text":"I.4, ","element":"a"},{"text":"by Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"iv)(c), we have that min","element":"span"},{"style":{"height":17.6},"width":311.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-3.png","element":"img","alt":"m∈M φmax(m) is","inline":true,"padRight":true},{"text":"uniformly bounded for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n 
","element":"span"},{"text":"large enough with probability 1 ","element":"span"},{"style":{"height":8.4},"width":65.22,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-4.png","element":"img","alt":" − o","inline":true},{"text":"(1). Thus, with probability 1 ","element":"span"},{"style":{"height":17.6},"width":133.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-5.png","element":"img","alt":" − o(1),","inline":true,"padRight":true},{"text":"by Lemma ","element":"span"},{"href":"#id-182","text":"I.4 ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"width":"93%"},"width":1749,"height":324,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-6.png","element":"img"}],[{"text":"for some ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", since uniformly in ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-7.png","element":"img","alt":" u ∈ U","inline":true,"padRight":true},{"text":"we have a sparsity bound ","element":"span"},{"style":{"height":17.6},"width":357.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-8.png","element":"img","alt":" ∥(θ̂u−θu)∥0 ⩽ C′′s","inline":true,"padRight":true},{"text":"which ensures that a bound on the prediction rate yields a bound on the ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-9.png","element":"img","alt":" ℓ1","inline":true},{"text":"-norm rate through the relations 
","element":"span"},{"style":{"height":20.9},"width":1052.29,"height":52.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-10.png","element":"img","alt":" ∥v∥1 ⩽�∥v∥0∥v∥ ⩽�∥v∥0∥f(X)′v∥Pn,2/�φmin(∥v∥0).","inline":true}],[{"text":"In the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"th iteration, the penalty loadings are constructed based on (","element":"span"},{"style":{"height":23.2},"width":152.68,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-11.png","element":"img","alt":"�θ(k)u )u∈U","inline":true},{"text":", defined as ","element":"span"},{"style":{"height":17.02},"width":132.74,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-12.png","element":"img","alt":" �Ψujj =","inline":true},{"style":{"height":23.82},"width":1110.28,"height":59.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-13.png","element":"img","alt":"{En[|fj(X){Yu − f(X)′�θ(k)u }|2]}1/2 for j = 1, . . . , p, u ∈ U","inline":true},{"text":". 
We assume (","element":"span"},{"style":{"height":23.2},"width":152.68,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-14.png","element":"img","alt":"�θ(k)u )u∈U ","inline":true,"padRight":true},{"text":"satisfy the rates ","element":"span"},{"text":"above uniformly in ","element":"span"},{"style":{"height":12.8},"width":142.9,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-15.png","element":"img","alt":" u ∈ U.","inline":true,"padRight":true},{"text":"Then with probability 1 ","element":"span"},{"style":{"height":8.4},"width":68.82,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-16.png","element":"img","alt":" − o","inline":true},{"text":"(1) we have uniformly in ","element":"span"},{"style":{"height":13.2},"width":221.81,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-17.png","element":"img","alt":" u ∈ U and","inline":true}],[{"style":{"width":"88%"},"width":1653,"height":273,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-18.png","element":"img"}],[{"text":"where we used that max","element":"span"},{"style":{"height":19.75},"width":1063.17,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-19.png","element":"img","alt":"i⩽n,j⩽p |fj(Xi)| ⩽ Kn a.s., and K2ns log(p ∨ n) ⩽ δnn","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"iv)(a), and that inf","element":"span"},{"style":{"height":18.3},"width":349.23,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-20.png","element":"img","alt":"u∈U,j⩽p �Ψu0jj ⩾ c/","inline":true},{"text":"2 with probability 1 ","element":"span"},{"style":{"height":17.6},"width":231.1,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-21.png","element":"img","alt":" − o(1) for 
n","inline":true,"padRight":true},{"text":"large so that ","element":"span"},{"style":{"height":17.6},"width":178.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-22.png","element":"img","alt":" δn ⩽ c/2.","inline":true,"padRight":true},{"text":"Further, for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"large so that (2 ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":23.2},"width":405.51,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-23.png","element":"img","alt":"Cδ1/2n /c) < 1 − 1/ 4√c","inline":true},{"text":", this establishes that the penalty ","element":"span"},{"text":"loadings for the (","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"+ 1)th iteration also satisfy ","element":"span"},{"style":{"height":20.74},"width":414.61,"height":51.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-24.png","element":"img","alt":" ℓΨ̂−1u0 ⩽ Ψ̂−1u ⩽ LΨ̂−1u0 ","inline":true,"padRight":true},{"text":"for a uniformly bounded ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"and some ","element":"span"},{"style":{"height":17.77},"width":177.86,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-25.png","element":"img","alt":" ℓ > 1/ 4√c","inline":true,"padRight":true},{"text":"with probability 1 ","element":"span"},{"style":{"height":8.4},"width":64.64,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-26.png","element":"img","alt":" − o","inline":true},{"text":"(1) uniformly in ","element":"span"},{"style":{"height":12.8},"width":121.96,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-27.png","element":"img","alt":" u ∈ U.","inline":true}],[{"text":"This leads to the stated rates of convergence and sparsity bound. 
","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-28.png","element":"img","alt":"■","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-122","style":{"fontStyle":"italic"},"text":"6.2. ","element":"a"},{"text":"In order to establish the result uniformly in ","element":"span"},{"style":{"height":14.62},"width":139.82,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-29.png","element":"img","alt":" P ∈ Pn","inline":true},{"text":", it suffices to establish the result under the probability measure induced by any sequence ","element":"span"},{"style":{"height":14.62},"width":254.22,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-30.png","element":"img","alt":" P = Pn ∈ Pn","inline":true},{"text":". In the proof we write ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", suppressing the dependence of ","element":"span"},{"style":{"height":14.62},"width":49.01,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-31.png","element":"img","alt":" Pn","inline":true,"padRight":true},{"text":"on the sample size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". The proof is similar to the proof of Theorem ","element":"span"},{"href":"#id-120","text":"6.1. ","element":"a"},{"text":"We invoke Lemmas ","element":"span"},{"href":"#id-167","text":"I.6, ","element":"a"},{"href":"#id-183","text":"I.7 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-184","text":"I.8, ","element":"a"},{"text":"which require Condition WL and certain events to occur. 
We show that Assumption ","element":"span"},{"href":"#id-121","text":"6.2 ","element":"a"},{"text":"implies Condition WL and that the required events occur with probability at least 1 ","element":"span"},{"style":{"height":17.6},"width":132.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-32.png","element":"img","alt":" − o(1).","inline":true}],[{"text":"Let ","element":"span"},{"style":{"height":20.95},"width":556.72,"height":52.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-33.png","element":"img","alt":"�Ψu0,jj = {En[|fj(X)ζu|2]}1/2 ","inline":true,"padRight":true},{"text":"denote the ideal penalty loadings, ","element":"span"},{"style":{"height":17.6},"width":473.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-34.png","element":"img","alt":" wui = EP [Yui | Xi](1 −","inline":true,"padRight":true},{"text":"E","element":"span"},{"style":{"height":17.6},"width":183.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-35.png","element":"img","alt":"P [Yui | Xi","inline":true},{"text":"]) the conditional variance of ","element":"span"},{"style":{"height":17.6},"width":549.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-36.png","element":"img","alt":" Yui given Xi and ˜rui = ˜ru(Xi","inline":true},{"text":") the rescaled approximation error as defined in ","element":"span"},{"href":"#id-185","text":"(I.5)","element":"a"},{"text":". 
","element":"span"},{"text":"The three events required to occur with probability 1 ","element":"span"},{"style":{"height":17.6},"width":263.95,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/82-37.png","element":"img","alt":" − o(1) are as","inline":true,"padRight":true},{"text":"follows: ","element":"span"},{"style":{"height":21.26},"width":1401.93,"height":53.15,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-0.png","element":"img","alt":" E1 := {cr ⩾ supu∈U ∥˜ru/√wu∥Pn,2} for cr := C′√(s log(p ∨ n)/n) where C′ ","inline":true,"padRight":true},{"text":"is large enough; ","element":"span"},{"style":{"height":20.74},"width":1871.77,"height":51.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-1.png","element":"img","alt":"E2 := {λ/n ⩾ √c supu∈U ∥Ψ̂−1u0 En[ζuf(X)]∥∞}; and E3 := {ℓΨ̂u0 ⩽ Ψ̂u ⩽ LΨ̂u0}, for ℓ > 1/ 4√c and","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"uniformly bounded, for the penalty loading ","element":"span"},{"style":{"height":14.62},"width":53.94,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-2.png","element":"img","alt":"Ψ̂u","inline":true,"padRight":true},{"text":"in all iterations ","element":"span"},{"style":{"height":14.8},"width":229.79,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-3.png","element":"img","alt":" k ⩽ K for n","inline":true,"padRight":true},{"text":"sufficiently large.","element":"span"}],[{"text":"Regarding ","element":"span"},{"style":{"height":14.62},"width":49.21,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-4.png","element":"img","alt":" E1","inline":true},{"text":", by Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iii), we have 
","element":"span"},{"style":{"height":17.6},"width":900.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-5.png","element":"img","alt":" c(1 − c) ⩽ wui ⩽ 1/4. Since |ru(Xi)| ⩽ δn a.s.","inline":true,"padRight":true},{"text":"uniformly in ","element":"span"},{"style":{"height":15.6},"width":408.4,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-6.png","element":"img","alt":" u ∈ U for i = 1, . . . , n","inline":true},{"text":", the rescaled approximation error defined in ","element":"span"},{"href":"#id-185","text":"(I.5) ","element":"a"},{"text":"satisfies ","element":"span"},{"style":{"height":20.41},"width":1077.38,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-7.png","element":"img","alt":" |˜ru(Xi)| ⩽ |ru(Xi)|/{c(1 − c) − 2δn}+ ⩽ ˜C|ru(Xi)| for n","inline":true,"padRight":true},{"text":"large enough that ","element":"span"},{"style":{"height":17.6},"width":206.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-8.png","element":"img","alt":" δn ⩽ c(1 −","inline":true},{"style":{"height":21.55},"width":836.01,"height":53.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-9.png","element":"img","alt":"c)/4. Thus ∥˜ru/√wu∥Pn,2 ⩽ ˜C∥ru/√wu∥Pn,2","inline":true},{"text":". 
Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iv)(b) yields sup","element":"span"},{"style":{"height":18.74},"width":386.08,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-10.png","element":"img","alt":"u∈U ∥ru/√wu∥Pn,2 ⩽","inline":true},{"style":{"height":20.8},"width":340.42,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-11.png","element":"img","alt":"C√(s log(p ∨ n)/n)","inline":true,"padRight":true},{"text":"with probability 1 ","element":"span"},{"style":{"height":17.6},"width":250,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-12.png","element":"img","alt":" − o(1), so E1","inline":true,"padRight":true},{"text":"occurs with probability 1 ","element":"span"},{"style":{"height":17.6},"width":132.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-13.png","element":"img","alt":" − o(1).","inline":true}],[{"text":"To apply Lemma ","element":"span"},{"href":"#id-179","text":"I.1 ","element":"a"},{"text":"to show that ","element":"span"},{"style":{"height":14.62},"width":49.21,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-14.png","element":"img","alt":" E2","inline":true,"padRight":true},{"text":"occurs with probability 1 ","element":"span"},{"style":{"height":8.4},"width":59.08,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-15.png","element":"img","alt":"−o","inline":true},{"text":"(1), we need to verify Condition WL. Condition WL(i) is implied by the sparsity in Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"i) and the covering condition in Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"ii). 
By Assumption ","element":"span"},{"href":"#id-121","text":"6.2 ","element":"a"},{"text":"we have that ","element":"span"},{"style":{"height":15.02},"width":42.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-16.png","element":"img","alt":" du","inline":true,"padRight":true},{"text":"is fixed and the Algorithm sets ","element":"span"},{"style":{"height":13.2},"width":71.6,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-17.png","element":"img","alt":" γ ∈","inline":true,"padRight":true},{"text":"[1","element":"span"},{"style":{"height":21.07},"width":1838.04,"height":52.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-18.png","element":"img","alt":"/n, min{log−1 n, pndu−1}] so that γ = o(1) and Φ−1(1 − γ/{2pndu}) ⩽ C log1/2(np) ⩽ Cδnn1/6 by","inline":true,"padRight":true},{"text":"Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"i). Since it is assumed that E","element":"span"},{"style":{"height":19.75},"width":995.66,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-19.png","element":"img","alt":"P [|fj(X)ζu|2] ⩾ c and EP [|fj(X)ζu|3] ⩽ C uniformly","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":16.4},"width":318.27,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-20.png","element":"img","alt":" j ⩽ p and u ∈ U","inline":true},{"text":", Condition WL(ii) holds. Condition WL(iii) follows from Assumption ","element":"span"},{"href":"#id-114","text":"6.1(","element":"a"},{"text":"iv). 
Then, by Lemma ","element":"span"},{"href":"#id-179","text":"I.1, ","element":"a"},{"text":"the event ","element":"span"},{"style":{"height":14.62},"width":49.21,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-21.png","element":"img","alt":" E2","inline":true,"padRight":true},{"text":"occurs with probability 1 ","element":"span"},{"style":{"height":17.6},"width":132.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-22.png","element":"img","alt":" − o(1).","inline":true}],[{"text":"Next we verify the occurrence of ","element":"span"},{"style":{"height":14.62},"width":49.21,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-23.png","element":"img","alt":" E3","inline":true},{"text":". In the initial iteration, the penalty loadings are defined as ","element":"span"},{"style":{"height":21.95},"width":931.73,"height":54.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-24.png","element":"img","alt":"�Ψujj = 12{En[|fj(X)|2]}1/2 for j = 1, . . . , p, u ∈ U","inline":true},{"text":". Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iv)(c) for the sparse eigenvalues ","element":"span"},{"text":"implies that for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"large enough, ","element":"span"},{"style":{"height":19.75},"width":788.1,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-25.png","element":"img","alt":" c′ ⩽ En[|fj(X)|2] ⩽ C′ for all j = 1, . . . 
, p,","inline":true,"padRight":true},{"text":"with probability 1 ","element":"span"},{"style":{"height":17.6},"width":126.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-26.png","element":"img","alt":"−o(1).","inline":true}],[{"text":"Moreover, Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iv)(b) yields","element":"span"}],[{"id":"id-186","style":{"width":"68%"},"width":1273,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-27.png","element":"img"}],[{"text":"with probability 1 ","element":"span"},{"style":{"height":17.82},"width":403.5,"height":44.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-28.png","element":"img","alt":" − ∆n, so that Ψ̂u0jj","inline":true,"padRight":true},{"text":"is bounded away from zero and from above uniformly over ","element":"span"},{"style":{"height":16},"width":407.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-29.png","element":"img","alt":" j = 1, . . . , p, u ∈ U","inline":true},{"text":", with the same probability because E","element":"span"},{"style":{"height":19.75},"width":234.64,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-30.png","element":"img","alt":"P [|fj(X)ζu|2","inline":true},{"text":"] is bounded away from zero and from above. 
By ","element":"span"},{"href":"#id-186","text":"(N.1) ","element":"a"},{"text":"and E","element":"span"},{"style":{"height":21.29},"width":698.87,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-31.png","element":"img","alt":"P [|fj(X)ζu|2] ⩽ 14EP [|fj(X)|2], for n","inline":true,"padRight":true},{"text":"large enough, we have ","element":"span"},{"style":{"height":15.02},"width":362.82,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-32.png","element":"img","alt":"ℓ�Ψu0 ⩽ �Ψu ⩽ L�Ψu0","inline":true,"padRight":true},{"text":"for some uniformly bounded ","element":"span"},{"style":{"height":17.77},"width":306.95,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-33.png","element":"img","alt":" L and ℓ > 1/ 4√c","inline":true,"padRight":true},{"text":"with probability 1 ","element":"span"},{"style":{"height":15.42},"width":114.54,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-34.png","element":"img","alt":" − ∆n.","inline":true}],[{"text":"Thus, ˜","element":"span"},{"style":{"height":20.74},"width":984.32,"height":51.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-35.png","element":"img","alt":"c = {(L√c + 1)/(ℓ√c − 1)} supu∈U ∥�Ψ−1u0 ∥∞∥�Ψu0∥∞","inline":true,"padRight":true},{"text":"is uniformly bounded. 
In turn, since ","element":"span"},{"text":"inf","element":"span"},{"style":{"height":17.82},"width":487.74,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-36.png","element":"img","alt":"u∈U mini⩽n wui ⩾ c(1 − c","inline":true},{"text":") is bounded away from zero, we have ¯","element":"span"},{"style":{"height":20.8},"width":559.9,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-37.png","element":"img","alt":"κ2˜c ⩾�c(1 − c)κ2˜c by their","inline":true,"padRight":true},{"text":"definitions in ","element":"span"},{"href":"#id-180","text":"(I.1) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-187","text":"(I.2)","element":"a"},{"text":". It follows that ","element":"span"},{"style":{"height":10.72},"width":59.29,"height":26.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-38.png","element":"img","alt":" κ2˜c","inline":true,"padRight":true},{"text":"is bounded away from zero by the condition on ","element":"span"},{"style":{"height":15.02},"width":59.64,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-39.png","element":"img","alt":" sℓn","inline":true,"padRight":true},{"text":"sparse eigenvalues stated in Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iv)(c), see ","element":"span"},{"href":"#id-43","referenceIndex":24,"text":"Bickel et al. ","element":"a"},{"href":"#id-43","referenceIndex":24,"text":"(2009) ","element":"a"},{"text":"Lemma 4.1(ii).","element":"span"}],[{"text":"By the choice of ","element":"span"},{"style":{"height":21.07},"width":1064.7,"height":52.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-40.png","element":"img","alt":" γ and du fixed, λ ∈ [cn1/2 log1/2(p∨n), Cn1/2 log1/2(p∨n","inline":true},{"text":")]. 
By relation ","element":"span"},{"href":"#id-188","text":"(I.4) ","element":"a"},{"text":"and Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iv)(a), inf","element":"span"},{"style":{"height":17.77},"width":477.09,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-41.png","element":"img","alt":"u∈U ¯qAu ⩾ c′¯κ2˜c/{√sKn}","inline":true},{"text":". Under the condition ","element":"span"},{"style":{"height":19.87},"width":454.79,"height":49.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-42.png","element":"img","alt":" K2ns2 log2(p ∨ n) ⩽ δnn,","inline":true,"padRight":true},{"text":"the side condition in Lemma ","element":"span"},{"href":"#id-167","text":"I.6 ","element":"a"},{"text":"holds with probability 1 ","element":"span"},{"style":{"height":8.4},"width":64.64,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-43.png","element":"img","alt":" − o","inline":true},{"text":"(1), and the lemma yields","element":"span"}],[{"style":{"width":"87%"},"width":1639,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-44.png","element":"img"}],[{"text":"In turn, under Assumption ","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iv)(c) and ","element":"span"},{"style":{"height":19.87},"width":431.37,"height":49.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-45.png","element":"img","alt":" K2ns2 log2(p∨n) ⩽ δnn","inline":true},{"text":", with probability 1","element":"span"},{"style":{"height":17.6},"width":271.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-46.png","element":"img","alt":"−o(1) Lemma","inline":true,"padRight":true},{"href":"#id-183","text":"I.7 
","element":"a"},{"text":"implies","element":"span"}],[{"style":{"width":"33%"},"width":620,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/83-47.png","element":"img"}],[{"style":{"width":"99%"},"width":1871,"height":261,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-0.png","element":"img"}],[{"text":"for some ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", since by ","element":"span"},{"href":"#id-189","text":"(N.16) ","element":"a"},{"text":"we have uniformly in ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-1.png","element":"img","alt":" u ∈ U","inline":true}],[{"style":{"height":22.23},"width":1659.86,"height":55.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-2.png","element":"img","alt":"Mu(˜θu) − Mu(θu) ⩽ Mu(�θu) − Mu(θu) ⩽ λn∥�Ψuθu∥1 − λn∥�Ψu�θu∥1 ⩽ λn∥�Ψu(�θuTu − θu)∥1","inline":true},{"style":{"height":19.21},"width":366.21,"height":48.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-3.png","element":"img","alt":"⩽ ¯C′s log(p ∨ n)/n,","inline":true}],[{"text":"sup","element":"span"},{"style":{"height":20.8},"width":759.44,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-4.png","element":"img","alt":"u∈U ∥En[f(X)ζu]∥∞ ⩽ C�log(p ∨ n)/n","inline":true,"padRight":true},{"text":"by Lemma ","element":"span"},{"href":"#id-179","text":"I.1, ","element":"a"},{"style":{"height":17.6},"width":244.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-5.png","element":"img","alt":" φmin(�su + su","inline":true},{"text":") is bounded away from zero (by Assumption 
","element":"span"},{"href":"#id-121","text":"6.2(","element":"a"},{"text":"iv)(c) and ","element":"span"},{"style":{"height":17.6},"width":933.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-6.png","element":"img","alt":" �su ⩽ C′′′s), infu∈U ψu({δ ∈ Rp : ∥δ∥0 ⩽ �su + su}","inline":true},{"text":") is bounded away from zero (because inf","element":"span"},{"style":{"height":17.82},"width":471.51,"height":44.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-7.png","element":"img","alt":"u∈U mini⩽n wui ⩾ c(1 − c","inline":true},{"text":")), and sup","element":"span"},{"style":{"height":18.19},"width":323.02,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-8.png","element":"img","alt":"u∈U ∥�Ψu0∥∞ ⩽ C","inline":true,"padRight":true},{"text":"with probability 1 ","element":"span"},{"style":{"height":17.6},"width":132.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-9.png","element":"img","alt":" − o(1).","inline":true}],[{"text":"In the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"th iteration, the penalty loadings are constructed based on (","element":"span"},{"style":{"height":23.2},"width":152.68,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-10.png","element":"img","alt":"�θ(k)u )u∈U","inline":true},{"text":", defined as ","element":"span"},{"style":{"height":17.02},"width":132.74,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-11.png","element":"img","alt":" �Ψujj =","inline":true},{"style":{"height":23.82},"width":1128.04,"height":59.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-12.png","element":"img","alt":"En[|fj(X){Yu − Λ(f(X)′�θ(k)u )}|2]}1/2 for j = 1, . . . , p, u ∈ U","inline":true},{"text":". 
We assume (","element":"span"},{"style":{"height":23.2},"width":152.68,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-13.png","element":"img","alt":"�θ(k)u )u∈U ","inline":true,"padRight":true},{"text":"satisfy the rates ","element":"span"},{"text":"above uniformly in ","element":"span"},{"style":{"height":13.2},"width":240.6,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-14.png","element":"img","alt":" u ∈ U. Then","inline":true}],[{"style":{"width":"88%"},"width":1654,"height":268,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-15.png","element":"img"}],[{"text":"and therefore, provided that (2","element":"span"},{"style":{"height":17.77},"width":381.29,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-16.png","element":"img","alt":"Cδn/c) < 1 − 1/ 4√c","inline":true},{"text":", uniformly in ","element":"span"},{"style":{"height":15.6},"width":601.02,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-17.png","element":"img","alt":" u ∈ U, ℓ�Ψu0 ⩽ �Ψu ⩽ L�Ψu0 for","inline":true},{"style":{"height":17.77},"width":307.52,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-18.png","element":"img","alt":"ℓ > 1/ 4√c and L","inline":true,"padRight":true},{"text":"uniformly bounded with probability 1 ","element":"span"},{"style":{"height":8.4},"width":64.68,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-19.png","element":"img","alt":" − o","inline":true},{"text":"(1). Then the same proof for the initial penalty loading choice applies to the iterate (","element":"span"},{"style":{"height":17.6},"width":992.85,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-20.png","element":"img","alt":"k + 1). ■","inline":true}],[{"text":"N.1. 
","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proofs for Lasso with Functional Response: Penalty Level.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-179","style":{"fontStyle":"italic"},"text":"I.1. ","element":"a"},{"text":"By the triangle inequality","element":"span"}],[{"style":{"width":"93%"},"width":1748,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-21.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12.8},"width":47.64,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-22.png","element":"img","alt":" Uϵ ","inline":true,"padRight":true},{"text":"is a minimal ","element":"span"},{"style":{"height":13.2},"width":189.1,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-23.png","element":"img","alt":" ϵ-net of U","inline":true},{"text":". We will set ","element":"span"},{"style":{"height":19.53},"width":515.38,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-24.png","element":"img","alt":" ϵ = 1/n so that |Uϵ| ⩽ ndu.","inline":true}],[{"text":"The proofs in this section rely on the following result due to ","element":"span"},{"href":"#id-190","referenceIndex":67,"text":"Jing et al. ","element":"a"},{"href":"#id-190","referenceIndex":67,"text":"(2003)","element":"a"},{"text":".","element":"span"}],[{"id":"id-191","style":{"height":17.6},"width":1569.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-25.png","element":"img","alt":"Lemma N.1 (Moderate deviations for self-normalized sums). Let Z1,. . 
., Zn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be independent, zero-mean random variables and ","element":"span"},{"style":{"height":21.45},"width":932.16,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-26.png","element":"img","alt":" µ ∈ (0, 1]. Let Sn,n = �ni=1 Zi, V 2n,n = �ni=1 Z2i ,","inline":true}],[{"style":{"width":"61%"},"width":1150,"height":183,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-27.png","element":"img"}],[{"style":{"height":15.02},"width":289.66,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-28.png","element":"img","alt":"and 0 < ℓn ⩽ n","inline":true}],[{"style":{"width":"59%"},"width":1112,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/84-29.png","element":"img"}],[{"text":"For each ","element":"span"},{"style":{"height":16.4},"width":587.59,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-0.png","element":"img","alt":" j = 1, . . . 
, p, and each u ∈ Uϵ","inline":true},{"text":", we will apply Lemma ","element":"span"},{"href":"#id-191","text":"N.1 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":18.22},"width":400.14,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-1.png","element":"img","alt":" Zi := fj(Xi)ζui, and","inline":true}],[{"id":"id-193","style":{"width":"99%"},"width":1867,"height":259,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-2.png","element":"img"}],[{"text":"provided that max","element":"span"},{"style":{"height":20.95},"width":1274.38,"height":52.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-3.png","element":"img","alt":"u,j{¯EP [|fj(X)ζu|3]1/3/¯EP [|fj(X)ζu|2]1/2}Φ−1(1−γ/2pNn) ⩽ δnn1/6","inline":true},{"text":", which holds by Condition WL since ","element":"span"},{"style":{"height":17.6},"width":152.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-4.png","element":"img","alt":" γ ⩾ 1/n","inline":true,"padRight":true},{"text":"(under this condition, there exists ","element":"span"},{"style":{"height":15.02},"width":152.62,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-5.png","element":"img","alt":" ℓn → ∞","inline":true,"padRight":true},{"text":"obeying the conditions of Lemma ","element":"span"},{"href":"#id-191","text":"N.1).","element":"a"}],[{"text":"Moreover, by the triangle inequality, we have","element":"span"}],[{"id":"id-192","style":{"width":"80%"},"width":1514,"height":329,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-6.png","element":"img"}],[{"text":"To control the first term in ","element":"span"},{"href":"#id-192","text":"(N.3)","element":"a"},{"text":", we note that by Condition WL, 
","element":"span"},{"style":{"height":17.02},"width":101.04,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-7.png","element":"img","alt":"�Ψu0jj","inline":true,"padRight":true},{"text":"is bounded away from zero with probability 1 ","element":"span"},{"style":{"height":8.4},"width":62.77,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-8.png","element":"img","alt":" − o","inline":true},{"text":"(1) uniformly over ","element":"span"},{"style":{"height":16.4},"width":428.95,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-9.png","element":"img","alt":" u ∈ U and j = 1, . . . , p","inline":true},{"text":". Thus we have uniformly over","element":"span"}],[{"style":{"width":"99%"},"width":1867,"height":141,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-10.png","element":"img"}],[{"text":"with the same probability. Moreover, we have","element":"span"}],[{"style":{"width":"99%"},"width":1867,"height":390,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-11.png","element":"img"}],[{"text":"By ","element":"span"},{"href":"#id-193","text":"(N.2)","element":"a"}],[{"style":{"width":"50%"},"width":952,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-12.png","element":"img"}],[{"text":"with probability 1 ","element":"span"},{"style":{"height":8.4},"width":64.64,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-13.png","element":"img","alt":" − o","inline":true},{"text":"(1), so that with the same probability","element":"span"}],[{"style":{"width":"74%"},"width":1398,"height":201,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-14.png","element":"img"}],[{"text":"where the last inequality follows by Condition WL(iii).","element":"span"}],[{"text":"The last term in ","element":"span"},{"href":"#id-192","text":"(N.3) 
","element":"a"},{"text":"is of the order ","element":"span"},{"style":{"height":20.33},"width":141.54,"height":50.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-15.png","element":"img","alt":" o(n−1/2","inline":true},{"text":") with probability 1 ","element":"span"},{"style":{"height":8.4},"width":62.53,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-16.png","element":"img","alt":" − o","inline":true},{"text":"(1), since by Condition WL,","element":"span"}],[{"style":{"width":"49%"},"width":923,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/85-17.png","element":"img"}],[{"text":"with probability 1 ","element":"span"},{"style":{"height":15.42},"width":101.07,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-0.png","element":"img","alt":" − ∆n","inline":true},{"text":", and since, by Condition WL, sup","element":"span"},{"style":{"height":20.74},"width":232.64,"height":51.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-1.png","element":"img","alt":"u∈U ∥�Ψ−1u0 ∥∞","inline":true,"padRight":true},{"text":"is uniformly bounded ","element":"span"},{"text":"with probability at least 1 ","element":"span"},{"style":{"height":17.6},"width":244.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-2.png","element":"img","alt":" − o(1) − ∆n.","inline":true}],[{"text":"The results above imply that ","element":"span"},{"href":"#id-192","text":"(N.3) ","element":"a"},{"text":"is bounded by ","element":"span"},{"style":{"height":17.77},"width":161.1,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-3.png","element":"img","alt":" o(1)/√n","inline":true,"padRight":true},{"text":"with probability 1 
","element":"span"},{"style":{"height":17.6},"width":269.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-4.png","element":"img","alt":" − o(1). Since","inline":true}],[{"style":{"width":"99%"},"width":1871,"height":285,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-115","style":{"fontStyle":"italic"},"text":"I.2. ","element":"a"},{"text":"We start with the last statement of the lemma, since it is the most difficult; the other statements follow by similar calculations. Consider the class of functions ","element":"span"},{"style":{"height":17.6},"width":712.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-6.png","element":"img","alt":" F = {Yu : u ∈ U}, F′ = {EP [Yu | X] :","inline":true},{"style":{"height":19.13},"width":1127.45,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-7.png","element":"img","alt":"u ∈ U}, and G = {ζ2u = (Yu − EP [Yu | X])2 : u ∈ U}. 
Let F","inline":true,"padRight":true},{"text":"be a measurable envelope for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"that ","element":"span"},{"text":"satisfies ","element":"span"},{"style":{"height":14.62},"width":159.94,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-8.png","element":"img","alt":" F ⩽ Bn.","inline":true}],[{"text":"Because ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"is a VC-class of functions with VC index ","element":"span"},{"style":{"height":15.02},"width":88.2,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-9.png","element":"img","alt":" C′du","inline":true},{"text":", by Lemma ","element":"span"},{"href":"#id-135","text":"K.1(","element":"a"},{"text":"1), we have","element":"span"}],[{"id":"id-196","style":{"width":"74%"},"width":1395,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-10.png","element":"img"}],[{"text":"To bound the covering number for ","element":"span"},{"style":{"height":13.2},"width":51.7,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-11.png","element":"img","alt":" F′ ","inline":true,"padRight":true},{"text":"we apply Lemma ","element":"span"},{"href":"#id-171","text":"K.2, ","element":"a"},{"text":"and since E[","element":"span"},{"style":{"height":17.6},"width":392.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-12.png","element":"img","alt":"F | X] ⩽ F, we have","inline":true}],[{"id":"id-194","style":{"width":"82%"},"width":1538,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-13.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":19.14},"width":467.6,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-14.png","element":"img","alt":" G ⊂ (F − F′)2, G 
= 4F 2 ","inline":true,"padRight":true},{"text":"is an envelope for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":", and the covering number for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"satisfies","element":"span"}],[{"id":"id-195","style":{"width":"96%"},"width":1808,"height":306,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-15.png","element":"img"}],[{"text":"where (i) and (ii) follow by Lemma ","element":"span"},{"href":"#id-135","text":"K.1(","element":"a"},{"text":"2), and (iii) follows from ","element":"span"},{"href":"#id-194","text":"(N.7)","element":"a"},{"text":".","element":"span"}],[{"text":"Hence, the entropy bound for the class ","element":"span"},{"style":{"height":22.02},"width":1042.36,"height":55.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-16.png","element":"img","alt":" M = ∪j∈[p]Mj, where Mj = {f2j (X)G}, j ∈ [p] and","inline":true,"padRight":true},{"text":"envelope ","element":"span"},{"style":{"height":19.05},"width":412.35,"height":47.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-17.png","element":"img","alt":" M = 4K2nF 2, satisfies","inline":true}],[{"style":{"width":"82%"},"width":1545,"height":342,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-18.png","element":"img"}],[{"text":"where (a) follows by Lemma ","element":"span"},{"href":"#id-135","text":"K.1(","element":"a"},{"text":"2) for a union of classes, (b) holds by Lemma ","element":"span"},{"href":"#id-135","text":"K.1(","element":"a"},{"text":"2) when one class has only a single function, (c) follows by ","element":"span"},{"href":"#id-195","text":"(N.8)","element":"a"},{"text":", and (d) follows from ","element":"span"},{"href":"#id-196","text":"(N.6) ","element":"a"},{"text":"and 
","element":"span"},{"style":{"height":13.6},"width":69.82,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-19.png","element":"img","alt":" ϵ ⩽","inline":true,"padRight":true},{"text":"1. Therefore, since sup","element":"span"},{"style":{"height":22.02},"width":451.45,"height":55.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-20.png","element":"img","alt":"u∈U maxj⩽p EP [f2j (X)ζ2u","inline":true},{"text":"] is bounded away from zero and from above, by Lemma ","element":"span"},{"href":"#id-143","text":"C.1 ","element":"a"},{"text":"we have ","element":"span"},{"text":"with probability 1 ","element":"span"},{"style":{"height":17.6},"width":347.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-21.png","element":"img","alt":" − O(1/ log n) that","inline":true}],[{"style":{"width":"89%"},"width":1682,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/86-22.png","element":"img"}],[{"text":"using the envelope ","element":"span"},{"style":{"height":19.05},"width":551.26,"height":47.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-0.png","element":"img","alt":" M = 4K2nB2n, v = C′, a = pn","inline":true,"padRight":true},{"text":"and a constant ","element":"span"},{"style":{"height":8},"width":38.5,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-1.png","element":"img","alt":" σ.","inline":true}],[{"text":"Consider the first term. 
By Lemma ","element":"span"},{"href":"#id-143","text":"C.1 ","element":"a"},{"text":"we have, with probability 1 ","element":"span"},{"style":{"height":17.6},"width":347.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-2.png","element":"img","alt":" − O(1/ log n) that","inline":true}],[{"style":{"width":"94%"},"width":1771,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-3.png","element":"img"}],[{"text":"using the envelope ","element":"span"},{"style":{"height":16},"width":584.26,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-4.png","element":"img","alt":" F = 2KnBn, v = C′, a = pn","inline":true},{"text":", the entropy bound in Lemma ","element":"span"},{"href":"#id-171","text":"K.2, ","element":"a"},{"text":"and ","element":"span"},{"style":{"height":15.13},"width":97.68,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-5.png","element":"img","alt":" σ2 ∝","inline":true},{"style":{"height":17.35},"width":404.68,"height":43.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-6.png","element":"img","alt":"Lnn−ν ⩽ F 2 for all n","inline":true,"padRight":true},{"text":"sufficiently large, because ","element":"span"},{"style":{"height":16.8},"width":299.82,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-7.png","element":"img","alt":" Lnn−ν ↘ 0 and","inline":true}],[{"style":{"width":"93%"},"width":1758,"height":182,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-8.png","element":"img"}],[{"text":"To bound the second term in the statement of the lemma, note that","element":"span"}],[{"style":{"width":"97%"},"width":1816,"height":278,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-9.png","element":"img"}],[{"text":"where the first inequality holds by Jensen’s inequality, and the second inequality holds by assumption. 
Since ","element":"span"},{"style":{"height":20.95},"width":631.76,"height":52.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-10.png","element":"img","alt":" c ⩽ maxj⩽p{EP [fj(X)2]}1/2 ⩽ C","inline":true},{"text":", the result follows by Lemma ","element":"span"},{"href":"#id-143","text":"C.1","element":"a"},{"text":", which yields, with probability 1 ","element":"span"},{"style":{"height":17.6},"width":252.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-11.png","element":"img","alt":" − O(1/ log n)","inline":true}],[{"style":{"width":"81%"},"width":1532,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-12.png","element":"img"}],[{"text":"where we used the choice ","element":"span"},{"style":{"height":19.05},"width":1366.18,"height":47.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-13.png","element":"img","alt":" C ⩽ σ = C′ ⩽ F = K2n, v = C, a = pn. ■","inline":true}],[{"text":"N.2. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proofs for Lasso with Functional Response: Linear Case.","element":"span"}],[{"href":"#id-177","style":{"height":16.4},"width":748.02,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-14.png","element":"img","alt":"Proof of Lemma I.3. Let �δu = �θu − θu","inline":true},{"text":"
. Throughout the proof, we assume that the events ","element":"span"},{"style":{"height":19.05},"width":91.58,"height":47.63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-15.png","element":"img","alt":" c2r ⩾","inline":true,"padRight":true},{"text":"sup","element":"span"},{"style":{"height":20.74},"width":1487.62,"height":51.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-16.png","element":"img","alt":"u∈U En[r2u], λ/n ⩾ c supu∈U ∥�Ψ−1u0 En[ζuf(X)]∥∞ and ℓ�Ψu0 ⩽ �Ψu ⩽ L�Ψu0 occur.","inline":true}],[{"text":"By the definition of ","element":"span"},{"style":{"height":15.6},"width":54.09,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-17.png","element":"img","alt":"�θu,","inline":true}],[{"style":{"width":"67%"},"width":1255,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-18.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":15.6},"width":542.32,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-19.png","element":"img","alt":" ℓ�Ψu0 ⩽ �Ψu ⩽ L�Ψu0, we have","inline":true}],[{"style":{"width":"71%"},"width":1347,"height":306,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/87-20.png","element":"img"}],[{"id":"id-197","style":{"width":"99%"},"width":1867,"height":445,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-0.png","element":"img"}],[{"text":"Let","element":"span"}],[{"style":{"width":"33%"},"width":626,"height":93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-1.png","element":"img"}],[{"text":"Therefore, if ","element":"span"},{"style":{"height":19.09},"width":878.47,"height":47.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-2.png","element":"img","alt":"�δu ̸∈ ∆˜c,u = {δ ∈ Rp : ∥δT cu∥1 ⩽ ˜c∥δTu∥1}","inline":true},{"text":", we have that 
","element":"span"},{"style":{"height":21.29},"width":429.17,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-3.png","element":"img","alt":"�L + 1c�∥�Ψu0�δuTu∥1 ⩽","inline":true}],[{"style":{"width":"63%"},"width":1180,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-4.png","element":"img"}],[{"text":"Otherwise, assume ","element":"span"},{"style":{"height":17.92},"width":178.48,"height":44.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-5.png","element":"img","alt":"�δu ∈ ∆˜c,u","inline":true},{"text":". In this case, by ","element":"span"},{"href":"#id-197","text":"(N.12)","element":"a"},{"text":", the definition of ","element":"span"},{"style":{"height":17.77},"width":551.6,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-6.png","element":"img","alt":" κ˜c, and ∥�δuTu∥1 ⩽ √s∥�δuTu∥,","inline":true,"padRight":true},{"text":"we have","element":"span"}],[{"id":"id-198","style":{"width":"98%"},"width":1853,"height":277,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-7.png","element":"img"}],[{"text":"To establish the ","element":"span"},{"style":{"height":15.02},"width":35.18,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-8.png","element":"img","alt":" ℓ1","inline":true},{"text":"-bound, first assume that ","element":"span"},{"style":{"height":17.92},"width":194.4,"height":44.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-9.png","element":"img","alt":"�δu ∈ ∆2˜c,u","inline":true},{"text":". 
In that case","element":"span"}],[{"style":{"width":"64%"},"width":1206,"height":142,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-10.png","element":"img"}],[{"text":"where we used that ","element":"span"},{"style":{"height":17.77},"width":388.92,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-11.png","element":"img","alt":" ∥�δuTu∥1 ⩽ √s∥�δuTu∥","inline":true},{"text":", the definition of the restricted eigenvalue, and the prediction rate derived in ","element":"span"},{"href":"#id-198","text":"(N.13)","element":"a"},{"text":".","element":"span"}],[{"text":"Otherwise, note that ","element":"span"},{"style":{"height":17.92},"width":194.39,"height":44.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-12.png","element":"img","alt":"�δu ̸∈ ∆2˜c,u","inline":true,"padRight":true},{"text":"implies that","element":"span"},{"style":{"height":21.29},"width":973.32,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-13.png","element":"img","alt":"�L + 1c�∥�Ψu0�δuTu∥1 ⩽ 12�ℓ − 1c�∥�Ψu0�δuT cu∥1 so that","inline":true,"padRight":true},{"href":"#id-197","text":"(N.12) ","element":"a"},{"text":"yields","element":"span"}],[{"style":{"width":"83%"},"width":1558,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-14.png","element":"img"}],[{"text":"where we used that max","element":"span"},{"style":{"height":19.14},"width":293.27,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-15.png","element":"img","alt":"t t(2cr − t) ⩽ c2r","inline":true},{"text":". Therefore","element":"span"}],[{"style":{"width":"94%"},"width":1764,"height":168,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-16.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-182","style":{"fontStyle":"italic"},"text":"I.4. 
","element":"a"},{"text":"Step 1. Let ","element":"span"},{"style":{"height":32.4},"width":722.42,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-17.png","element":"img","alt":" Lu = 4c0∥�Ψ−1u0 ∥∞�ncrλ +√sκ˜c ∥�Ψu0∥∞�.","inline":true,"padRight":true},{"text":"By Step 2 below and the definition of ","element":"span"},{"style":{"height":15.02},"width":216.76,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-18.png","element":"img","alt":" Lu we have","inline":true}],[{"id":"id-199","style":{"width":"58%"},"width":1098,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-19.png","element":"img"}],[{"text":"Consider any ","element":"span"},{"style":{"height":19.73},"width":896,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-20.png","element":"img","alt":" M ∈ M = {m ∈ N : m > 2φmax(m) supu∈U L2u}","inline":true},{"text":", and suppose ","element":"span"},{"style":{"height":14.62},"width":159.33,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/88-21.png","element":"img","alt":" �su > M.","inline":true}],[{"text":"Next, recall the sublinearity of the maximum sparse eigenvalue (for a proof, see Lemma 3 in ","element":"span"},{"href":"#id-48","referenceIndex":11,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-48","referenceIndex":11,"text":"(2013)","element":"a"},{"text":"); namely, for any integer ","element":"span"},{"style":{"height":14.8},"width":77.97,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-0.png","element":"img","alt":" k ⩾","inline":true,"padRight":true},{"text":"0 and any constant ","element":"span"},{"style":{"height":14.8},"width":288.5,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-1.png","element":"img","alt":" ℓ ⩾ 1 we 
have","inline":true},{"style":{"height":17.6},"width":641.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-2.png","element":"img","alt":"φmax(ℓk) ⩽ ⌈ℓ⌉φmax(k), where ⌈ℓ⌉","inline":true,"padRight":true},{"text":"denotes the ceiling of ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-3.png","element":"img","alt":" ℓ","inline":true},{"text":". Therefore","element":"span"}],[{"style":{"width":"44%"},"width":838,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-4.png","element":"img"}],[{"text":"Thus, since ","element":"span"},{"style":{"height":19.13},"width":985.55,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-5.png","element":"img","alt":" ⌈k⌉ ⩽ 2k for any k ⩾ 1 we have M ⩽ 2φmax(M)L2u ","inline":true,"padRight":true},{"text":"which violates the condition that ","element":"span"},{"style":{"height":13.6},"width":153.45,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-6.png","element":"img","alt":"M ∈ M","inline":true},{"text":". 
Therefore, we have ","element":"span"},{"style":{"height":14.62},"width":159.33,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-7.png","element":"img","alt":" �su ⩽ M.","inline":true}],[{"text":"In turn, applying ","element":"span"},{"href":"#id-199","text":"(N.14) ","element":"a"},{"text":"once more with ","element":"span"},{"style":{"height":19.13},"width":694,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-8.png","element":"img","alt":" �su ⩽ M we obtain �su ⩽ φmax(M)L2u.","inline":true,"padRight":true},{"text":"The result follows ","element":"span"},{"text":"by minimizing the bound over ","element":"span"},{"style":{"height":13.6},"width":164.83,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-9.png","element":"img","alt":" M ∈ M.","inline":true}],[{"style":{"width":"73%"},"width":1378,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-10.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":19.41},"width":1792.01,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-11.png","element":"img","alt":" Ru = (ru1, . . . , run)′, Yu = (Yu1, . . . , Yun)′, ¯ζu = (ζu1, . . . , ζun)′, and F = [f(X1); . . . 
; f(Xn)]′.","inline":true,"padRight":true},{"text":"We have from the optimality conditions that the Lasso estimator ","element":"span"},{"style":{"height":15.02},"width":202.58,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-12.png","element":"img","alt":"�θu satisfies","inline":true}],[{"style":{"width":"87%"},"width":1636,"height":608,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-13.png","element":"img"}],[{"text":"where we used that ","element":"span"},{"style":{"height":23.8},"width":430.54,"height":59.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-14.png","element":"img","alt":" ∥v∥ ⩽ ∥v∥1/20 ∥v∥∞ and","inline":true}],[{"style":{"width":"99%"},"width":1869,"height":764,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-15.png","element":"img"}],[{"text":"The result follows by noting that (","element":"span"},{"style":{"height":17.6},"width":536.61,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-16.png","element":"img","alt":"L + [1/c])/(1 − 1/[ℓc]) = c0ℓ","inline":true,"padRight":true},{"text":"by definition of ","element":"span"},{"style":{"height":10.62},"width":49.81,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/89-17.png","element":"img","alt":" c0.","inline":true}],[{"style":{"width":"1%"},"width":21,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-0.png","element":"img"}],[{"href":"#id-178","style":{"height":19.41},"width":1872.68,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-1.png","element":"img","alt":"Proof of Lemma I.5. Define mu := (E[Yu1 | X1], . . . , E[Yun | Xn])′, ¯ζu := (ζu1, . . . , ζun)′, and the","inline":true},{"style":{"height":17.6},"width":787.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-2.png","element":"img","alt":"n × p matrix F := [f(X1); . . 
. ; f(Xn)]′.","inline":true,"padRight":true},{"text":"For a set of indices ","element":"span"},{"style":{"height":17.6},"width":632.03,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-3.png","element":"img","alt":" S ⊂ {1, . . . , p} we define �PS =","inline":true},{"style":{"height":19.13},"width":456.94,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-4.png","element":"img","alt":"F[S](F[S]′F[S])−1F[S]′ ","inline":true,"padRight":true},{"text":"as the projection matrix onto the columns associated with the indices in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"where we interpret ","element":"span"},{"style":{"height":14.7},"width":50.02,"height":36.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-5.png","element":"img","alt":"�PS","inline":true,"padRight":true},{"text":"as a null operator if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"S ","element":"span"},{"text":"is empty.","element":"span"}],[{"text":"Since ","element":"span"},{"style":{"height":16.4},"width":459.12,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-6.png","element":"img","alt":" Yui = mui + ζui we have","inline":true}],[{"style":{"width":"34%"},"width":647,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-7.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"I ","element":"span"},{"text":"is the identity operator. 
Therefore","element":"span"}],[{"id":"id-200","style":{"width":"70%"},"width":1323,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-8.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":20.8},"width":870.15,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-9.png","element":"img","alt":" ∥F[ �Tu]/√n(F[ �Tu]′F[ �Tu]/n)−1∥ ⩽�1/φmin(˜su","inline":true},{"text":"), the last term in ","element":"span"},{"href":"#id-200","text":"(N.15) ","element":"a"},{"text":"satisfies","element":"span"}],[{"style":{"width":"43%"},"width":819,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-10.png","element":"img"}],[{"text":"By Lemma ","element":"span"},{"href":"#id-179","text":"I.1 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":17.6},"width":152.83,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-11.png","element":"img","alt":" γ = 1/n","inline":true},{"text":", we have that with probability 1 ","element":"span"},{"style":{"height":8.4},"width":64.64,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-12.png","element":"img","alt":" − o","inline":true},{"text":"(1), uniformly in ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-13.png","element":"img","alt":" u ∈ U","inline":true}],[{"style":{"width":"86%"},"width":1622,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-14.png","element":"img"}],[{"text":"The result follows.","element":"span"}],[{"text":"The last statement follows from noting that the mean square approximation error provides an upper bound to the best mean square approximation error based on the model 
","element":"span"},{"style":{"height":14.62},"width":45.5,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-15.png","element":"img","alt":"�Tu","inline":true,"padRight":true},{"text":"provided that the model includes the Lasso’s model, i.e., ","element":"span"},{"style":{"height":14.62},"width":150.79,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-16.png","element":"img","alt":"�Tu ⊆ �Tu","inline":true},{"text":". Indeed, we have","element":"span"}],[{"style":{"width":"84%"},"width":1579,"height":459,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-17.png","element":"img"}],[{"text":"where we invoked Lemma ","element":"span"},{"href":"#id-177","text":"I.3 ","element":"a"},{"text":"to bound ","element":"span"},{"style":{"height":18.37},"width":1104,"height":45.93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-18.png","element":"img","alt":" ∥f(X)′(�θu − θu)∥Pn,2. ■","inline":true}],[{"text":"N.3. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proofs for Lasso with Functional Response: Logistic Case.","element":"span"}],[{"href":"#id-167","style":{"height":17.6},"width":1140.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-19.png","element":"img","alt":"Proof of Lemma I.6. Let δu = �θu−θu and Su = −En[f(X)ζu","inline":true},{"text":"]. 
By the definition of ","element":"span"},{"style":{"height":17.6},"width":386.78,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-20.png","element":"img","alt":"�θu we have Mu(�θu)+","inline":true}],[{"id":"id-189","style":{"width":"99%"},"width":1864,"height":184,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-21.png","element":"img"}],[{"id":"id-201","text":"Moreover, by convexity of ","element":"span"},{"style":{"height":17.6},"width":92.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-22.png","element":"img","alt":" Mu(·","inline":true},{"text":") and Hölder’s inequality we have","element":"span"}],[{"style":{"width":"93%"},"width":1756,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/90-23.png","element":"img"}],[{"text":"because","element":"span"}],[{"id":"id-203","style":{"width":"99%"},"width":1868,"height":409,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-0.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-189","text":"(N.16) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-201","text":"(N.17) ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"width":"99%"},"width":1867,"height":314,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-1.png","element":"img"}],[{"text":"Suppose ","element":"span"},{"style":{"height":19.09},"width":921.11,"height":47.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-2.png","element":"img","alt":" δu ̸∈ ∆2˜c,u, namely ∥δu,T cu∥1 ⩾ 2˜c∥δu,Tu∥1. 
Thus,","inline":true}],[{"style":{"width":"91%"},"width":1712,"height":200,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-3.png","element":"img"}],[{"text":"The relation above implies that if ","element":"span"},{"style":{"height":17.92},"width":194.4,"height":44.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-4.png","element":"img","alt":" δu ̸∈ ∆2˜c,u","inline":true}],[{"id":"id-205","style":{"width":"99%"},"width":1868,"height":366,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-5.png","element":"img"}],[{"text":"we have that ","element":"span"},{"style":{"height":15.02},"width":201.49,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-6.png","element":"img","alt":" δu satisfies","inline":true}],[{"id":"id-204","style":{"width":"59%"},"width":1117,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-7.png","element":"img"}],[{"text":"For every ","element":"span"},{"style":{"height":27.62},"width":1628.04,"height":69.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-8.png","element":"img","alt":" u ∈ U, since Au = ∆2˜c,u ∪ {δ : ∥δ∥1 ⩽ 6c∥�Ψ−1u0 ∥∞ℓc−1 nλ∥ru/√wu∥Pn,2∥√wuf(X)′δ∥Pn,2}, it","inline":true,"padRight":true},{"text":"follows that ","element":"span"},{"style":{"height":15.42},"width":147.06,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-9.png","element":"img","alt":" δu ∈ Au","inline":true},{"text":", and we have","element":"span"}],[{"style":{"width":"74%"},"width":1398,"height":343,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-10.png","element":"img"}],[{"text":"where (1) follows by Lemma ","element":"span"},{"href":"#id-202","text":"N.2 ","element":"a"},{"text":"with 
","element":"span"},{"style":{"height":15.42},"width":52.73,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-11.png","element":"img","alt":" Au","inline":true},{"text":", (2) follows from ","element":"span"},{"href":"#id-203","text":"(N.18) ","element":"a"},{"text":"and ","element":"span"},{"style":{"height":17.6},"width":215.07,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-12.png","element":"img","alt":" |rui| ⩽ |˜rui|","inline":true},{"text":", (3) follows by ","element":"span"},{"style":{"height":18.3},"width":586.26,"height":45.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-13.png","element":"img","alt":"∥�Ψu0δu,Tu∥1 ⩽ ∥�Ψu0∥∞∥δu,Tu∥1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"href":"#id-204","text":"(N.22)","element":"a"},{"text":", (4) follows from simplifications and ","element":"span"},{"style":{"height":17.6},"width":345.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/91-14.png","element":"img","alt":" |rui| ⩽ |˜rui|. 
Since","inline":true,"padRight":true},{"text":"the inequality (","element":"span"},{"style":{"height":19.13},"width":263.36,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-0.png","element":"img","alt":"x2 ∧ ax) ⩽ bx","inline":true,"padRight":true},{"text":"holding for ","element":"span"},{"style":{"height":16},"width":664.42,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-1.png","element":"img","alt":" x > 0 and b < a < 0 implies x ⩽ b","inline":true},{"text":", the above system of inequalities, provided that for every ","element":"span"},{"style":{"height":12.8},"width":110.34,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-2.png","element":"img","alt":" u ∈ U","inline":true}],[{"style":{"width":"53%"},"width":994,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-3.png","element":"img"}],[{"text":"implies that","element":"span"}],[{"style":{"height":42.4},"width":1230.03,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-4.png","element":"img","alt":"∥√wuf(X)′δu∥Pn,2 ⩽ 3�(L + 1c)∥�Ψu0∥∞λ√sn¯κ2˜c+ 9˜c∥˜ru/√wu∥Pn,2","inline":true}],[{"text":"The second result follows from the definition of ¯","element":"span"},{"style":{"height":10.72},"width":59.29,"height":26.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-5.png","element":"img","alt":"κ2˜c","inline":true},{"text":", ","element":"span"},{"href":"#id-205","text":"(N.21) ","element":"a"},{"text":"and the bound on ","element":"span"},{"style":{"height":18.74},"width":350.89,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-6.png","element":"img","alt":" ∥√wuf(X)′δu∥Pn,2","inline":true,"padRight":true},{"text":"just derived, namely for every 
","element":"span"},{"style":{"height":13.2},"width":275.41,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-7.png","element":"img","alt":" u ∈ U we have","inline":true}],[{"style":{"width":"86%"},"width":1611,"height":216,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-183","style":{"fontStyle":"italic"},"text":"I.7. ","element":"a"},{"text":"The proofs of both bounds are similar to the proof of sparsity for the linear case (Lemma ","element":"span"},{"href":"#id-182","text":"I.4), ","element":"a"},{"text":"differing only in the definitions of ","element":"span"},{"style":{"height":14.62},"width":49.7,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-9.png","element":"img","alt":" Lu","inline":true,"padRight":true},{"text":"which are consequences of the pre-sparsity bounds established in Steps 2 and 3.","element":"span"}],[{"text":"Step 1. 
To establish the first bound, note that by Step 2 below, the triangle inequality, and the definition of","element":"span"}],[{"style":{"width":"89%"},"width":1670,"height":513,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-10.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":32},"width":421.97,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-11.png","element":"img","alt":" Lu = c0ψ(Au)�3∥�Ψu0∥∞","inline":true}],[{"id":"id-206","style":{"width":"58%"},"width":1098,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-12.png","element":"img"}],[{"text":"which has the same structure as ","element":"span"},{"href":"#id-199","text":"(N.14) ","element":"a"},{"text":"in Step 1 of the proof of Lemma ","element":"span"},{"href":"#id-182","text":"I.4.","element":"a"}],[{"text":"Consider any ","element":"span"},{"style":{"height":19.72},"width":941.69,"height":49.31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-13.png","element":"img","alt":" M ∈ M = {m ∈ N : m > 2φmax(m) supu∈U L2u}","inline":true},{"text":", and suppose ","element":"span"},{"style":{"height":16.4},"width":328.67,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-14.png","element":"img","alt":" �su > M. 
By the","inline":true,"padRight":true},{"text":"sublinearity of the maximum sparse eigenvalue (Lemma 3 in ","element":"span"},{"href":"#id-48","referenceIndex":11,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-48","referenceIndex":11,"text":"(2013)","element":"a"},{"text":"), for any integer ","element":"span"},{"style":{"height":14.8},"width":73.63,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-15.png","element":"img","alt":" k ⩾","inline":true,"padRight":true},{"text":"0 and constant ","element":"span"},{"style":{"height":17.6},"width":943.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-16.png","element":"img","alt":" ℓ ⩾ 1 we have φmax(ℓk) ⩽ ⌈ℓ⌉φmax(k), where ⌈ℓ⌉","inline":true,"padRight":true},{"text":"denotes the ceiling of ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-17.png","element":"img","alt":" ℓ","inline":true},{"text":". Therefore","element":"span"}],[{"style":{"width":"44%"},"width":838,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-18.png","element":"img"}],[{"text":"Thus, since ","element":"span"},{"style":{"height":19.13},"width":985.55,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-19.png","element":"img","alt":" ⌈k⌉ ⩽ 2k for any k ⩾ 1 we have M ⩽ 2φmax(M)L2u ","inline":true,"padRight":true},{"text":"which violates the condition that ","element":"span"},{"style":{"height":13.6},"width":153.45,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-20.png","element":"img","alt":"M ∈ M","inline":true},{"text":". Therefore, we have ","element":"span"},{"style":{"height":14.62},"width":146.24,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-21.png","element":"img","alt":" �su ⩽ M","inline":true},{"text":". 
In turn, applying ","element":"span"},{"href":"#id-206","text":"(N.23) ","element":"a"},{"text":"once more with ","element":"span"},{"style":{"height":15.02},"width":347.05,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-22.png","element":"img","alt":" �su ⩽ M we obtain","inline":true},{"style":{"height":19.14},"width":335.61,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-23.png","element":"img","alt":"�su ⩽ φmax(M)L2u.","inline":true,"padRight":true},{"text":"The result follows by minimizing the bound over ","element":"span"},{"style":{"height":13.6},"width":164.83,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/92-24.png","element":"img","alt":" M ∈ M.","inline":true}],[{"style":{"width":"87%"},"width":1632,"height":452,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-0.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":32},"width":391.24,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-1.png","element":"img","alt":" Lu = 2c0�3∥�Ψu0∥∞","inline":true}],[{"text":"and the proof follows similarly to Step 1 in the proof of Lemma ","element":"span"},{"href":"#id-182","text":"I.4.","element":"a"}],[{"style":{"width":"97%"},"width":1818,"height":224,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-2.png","element":"img"}],[{"text":"Let Λ","element":"span"},{"style":{"height":17.6},"width":1712.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-3.png","element":"img","alt":"ui := EP [Yui | Xi] and Su = −En[f(X)ζu] = −En[(Yu − Λu)f(X)]. Let �Tu = supp(�θu),","inline":true},{"style":{"height":17.6},"width":1872.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-4.png","element":"img","alt":"�su = ∥�θu∥0, δu = �θu − θu, and �Λui = exp(f(Xi)′�θu)/{1 + exp(f(Xi)′�θu)}. 
For any j ∈ �Tu we have","inline":true}],[{"style":{"width":"95%"},"width":1779,"height":455,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-5.png","element":"img"}],[{"text":"Step 3. In this step we show that if max","element":"span"},{"style":{"height":17.82},"width":734.3,"height":44.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-6.png","element":"img","alt":"i⩽n |f(Xi)′(�θu − θu) − ˜rui| ⩽ 1 we have","inline":true}],[{"id":"id-208","style":{"width":"79%"},"width":1495,"height":141,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-7.png","element":"img"}],[{"text":"Note that uniformly in ","element":"span"},{"style":{"height":12.8},"width":129.45,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-8.png","element":"img","alt":" u ∈ U","inline":true},{"text":", Lemma ","element":"span"},{"href":"#id-207","text":"N.5 ","element":"a"},{"text":"establishes that ","element":"span"},{"style":{"height":17.6},"width":665.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-9.png","element":"img","alt":" |�Λui − Λui| ⩽ wui2|f(X)′δu − ˜rui|","inline":true,"padRight":true},{"text":"since max","element":"span"},{"style":{"height":17.82},"width":412.54,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-10.png","element":"img","alt":"i⩽n |f(Xi)′δu − ˜rui| ⩽","inline":true,"padRight":true},{"text":"1 is assumed. Thus, combining this bound with the calculations performed in Step 2 we obtain","element":"span"}],[{"style":{"width":"71%"},"width":1333,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-11.png","element":"img"}],[{"text":"which implies ","element":"span"},{"href":"#id-208","text":"(N.25)","element":"a"},{"text":". 
","element":"span"}],[{"href":"#id-184","style":{"height":21.75},"width":1739.41,"height":54.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-12.png","element":"img","alt":"Proof of Lemma I.8. Let ˜δu = ˜θu − θu and ˜tu = ∥√wuf(X)′˜δu∥Pn,2 and Su = −En[f(X)ζu].","inline":true,"padRight":true},{"text":"By Lemma ","element":"span"},{"href":"#id-202","text":"N.2 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":17.6},"width":774.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-13.png","element":"img","alt":" Au = {δ ∈ Rp : ∥δ∥0 ⩽ ˜su + su}, we have","inline":true}],[{"style":{"width":"83%"},"width":1563,"height":245,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/93-14.png","element":"img"}],[{"text":"where the second inequality follows from calculations as in ","element":"span"},{"href":"#id-203","text":"(N.18) ","element":"a"},{"text":"and Hölder’s inequality, and the last inequality follows from","element":"span"}],[{"style":{"height":45.42},"width":1439.7,"height":113.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-0.png","element":"img","alt":"∥˜δu∥1 ⩽�˜su + su∥˜δu∥1 ⩽ √˜su + su�φmin(˜su + su)∥f(X)′˜δu∥Pn,2 ⩽ √˜su + su�φmin(˜su + su)","inline":true}],[{"text":"by the definition of ","element":"span"},{"style":{"height":29.51},"width":624.81,"height":73.78,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-1.png","element":"img","alt":" ψu(A) := minδ∈A∥√wuf(X)′δ∥Pn,2∥f(X)′δ∥Pn,2 .","inline":true}],[{"text":"Recall the assumed conditions ¯","element":"span"},{"style":{"height":17.6},"width":164.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-2.png","element":"img","alt":"qAu/6 >","inline":true}],[{"style":{"width":"84%"},"width":1581,"height":233,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-3.png","element":"img"}],[{"text":"so that 
","element":"span"},{"text":"˜","element":"span"},{"style":{"height":31.6},"width":586.57,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-4.png","element":"img","alt":"tu ⩽�0 ∨ {Mu(˜θu) − Mu(θu)}","inline":true,"padRight":true},{"text":"which implies the result. Otherwise, we have","element":"span"}],[{"style":{"width":"78%"},"width":1471,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-5.png","element":"img"}],[{"text":"since for positive numbers ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":", the inequality ","element":"span"},{"style":{"height":19.28},"width":762.98,"height":48.19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-6.png","element":"img","alt":" a2 ⩽ b + ac implies a ⩽√b + c, we have","inline":true}],[{"style":{"width":"93%"},"width":1746,"height":186,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-7.png","element":"img"}],[{"text":"N.4. ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Technical Lemmas: Logistic Case. ","element":"span"},{"text":"The proof of the following lower bound builds upon ideas developed in ","element":"span"},{"href":"#id-35","referenceIndex":10,"text":"Belloni and Chernozhukov ","element":"a"},{"href":"#id-35","referenceIndex":10,"text":"(2011) ","element":"a"},{"text":"for high-dimensional quantile regressions.","element":"span"}],[{"id":"id-202","style":{"fontWeight":"bold"},"text":"Lemma N.2 ","element":"span"},{"text":"(Minoration Lemma)","element":"span"},{"style":{"height":16.8},"width":826.08,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-8.png","element":"img","alt":". 
For any u ∈ U and δ ∈ Au ⊂ Rp, we have","inline":true}],[{"style":{"width":"71%"},"width":1342,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where","element":"span"}],[{"style":{"width":"32%"},"width":608,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-10.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Step 1. (Minoration). Consider the following non-negative convex function","element":"span"}],[{"style":{"width":"81%"},"width":1518,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-11.png","element":"img"}],[{"text":"Note that if ¯","element":"span"},{"style":{"height":17.6},"width":870.21,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-12.png","element":"img","alt":"qAu = 0 the statement is trivial since Fu(δ) ⩾","inline":true,"padRight":true},{"text":"0. 
Thus we can assume ¯","element":"span"},{"style":{"height":15.89},"width":158.46,"height":39.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-13.png","element":"img","alt":"qAu > 0.","inline":true}],[{"text":"Step 2 below shows that for any ","element":"span"},{"style":{"height":21.74},"width":1241.92,"height":54.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-14.png","element":"img","alt":" δ = t˜δ ∈ Rp where t ∈ R and ˜δ ∈ Au such that ∥√wuf(X)′δ∥Pn,2 ⩽","inline":true,"padRight":true},{"text":"¯","element":"span"},{"style":{"height":16.7},"width":231.92,"height":41.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-15.png","element":"img","alt":"qAu we have","inline":true}],[{"id":"id-209","style":{"width":"64%"},"width":1200,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-16.png","element":"img"}],[{"text":"Thus ","element":"span"},{"href":"#id-209","text":"(N.26) ","element":"a"},{"text":"covers the case that ","element":"span"},{"style":{"height":18.74},"width":697.61,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/94-17.png","element":"img","alt":" δ ∈ Au and ∥√wuf(X)′δ∥Pn,2 ⩽ ¯qAu.","inline":true}],[{"text":"In the case that ","element":"span"},{"style":{"height":18.74},"width":703.78,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-0.png","element":"img","alt":" δ ∈ Au and ∥√wuf(X)′δ∥Pn,2 > ¯qAu","inline":true},{"text":", by convexity","element":"span"},{"href":"#id-210","style":{"height":19.56},"width":518.42,"height":48.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-1.png","element":"img","alt":"34 of Fu and Fu(0) = 0 
we","inline":true,"padRight":true},{"text":"have","element":"span"}],[{"id":"id-211","style":{"width":"92%"},"width":1727,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-2.png","element":"img"}],[{"text":"where the last step follows by ","element":"span"},{"href":"#id-209","text":"(N.26) ","element":"a"},{"text":"since","element":"span"}],[{"style":{"width":"52%"},"width":985,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-3.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-209","text":"(N.26) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-211","text":"(N.27) ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"width":"59%"},"width":1122,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-4.png","element":"img"}],[{"text":"Step 2. (Proof of ","element":"span"},{"href":"#id-209","text":"(N.26)","element":"a"},{"text":") Let ˜","element":"span"},{"style":{"height":10.62},"width":51.29,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-5.png","element":"img","alt":"rui","inline":true,"padRight":true},{"text":"be such that Λ(","element":"span"},{"style":{"height":17.6},"width":887.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-6.png","element":"img","alt":"f(Xi)′θu + ˜rui) = Λ(f(Xi)′θu) + rui = EP [Yui |","inline":true},{"style":{"height":17.6},"width":1872.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-7.png","element":"img","alt":"Xi]. 
Defining gui(t) = log{1 + exp(f(Xi)′θu + ˜rui + tf(Xi)′δ)}, ˜gui(t) = log{1 + exp(f(Xi)′θu +","inline":true}],[{"id":"id-214","style":{"width":"99%"},"width":1867,"height":521,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-8.png","element":"img"}],[{"text":"Note that the function ","element":"span"},{"style":{"height":12},"width":52.42,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-9.png","element":"img","alt":" gui","inline":true,"padRight":true},{"text":"is three times differentiable and satisfies","element":"span"}],[{"style":{"width":"70%"},"width":1313,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-10.png","element":"img"}],[{"text":"where Λ","element":"span"},{"style":{"height":17.6},"width":1426.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-11.png","element":"img","alt":"ui(t) := exp(f(Xi)′θu + ˜rui + tf(Xi)′δ)/{1 + exp(f(Xi)′θu + ˜rui + tf(X)′δ)}","inline":true},{"text":". Thus we have ","element":"span"},{"style":{"height":18.09},"width":429.04,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-12.png","element":"img","alt":"|g′′′ui(t)| ⩽ |f(X)′δ|g′′ui(t","inline":true},{"text":"). 
Therefore, by Lemmas ","element":"span"},{"href":"#id-212","text":"N.3 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-213","text":"N.4 ","element":"a"},{"text":"stated after the conclusion of this ","element":"span"},{"text":"proof, we have","element":"span"}],[{"id":"id-215","style":{"width":"90%"},"width":1687,"height":152,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-13.png","element":"img"}],[{"text":"Moreover, letting Υ","element":"span"},{"style":{"height":17.6},"width":567.09,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-14.png","element":"img","alt":"ui(t) = ˜gui(t) − gui(t) we have","inline":true}],[{"style":{"width":"54%"},"width":1028,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-15.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"˜","element":"span"},{"text":"Λ","element":"span"},{"style":{"height":17.6},"width":1370.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-16.png","element":"img","alt":"ui(t) := exp(f(Xi)′θu + tf(Xi)′δ)/{1 + exp(f(Xi)′θu + tf(Xi)′δ)}. 
Thus","inline":true}],[{"id":"id-210","style":{"width":"99%"},"width":1868,"height":210,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/95-17.png","element":"img"}],[{"text":"Therefore, combining ","element":"span"},{"href":"#id-214","text":"(N.28) ","element":"a"},{"text":"with the bounds ","element":"span"},{"href":"#id-215","text":"(N.29) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-210","text":"(N.30) ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"M","element":"span"},{"style":{"height":21.63},"width":1470.99,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-0.png","element":"img","alt":"u(θu + δ) − Mu(θu) − ∂θMu(θu)′δ ⩾ (1/2)En[wu|f(X)′δ|²] − (1/6)En[wu|f(X)′δ|³]","inline":true},{"style":{"height":18.74},"width":664.79,"height":46.85,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-1.png","element":"img","alt":"−2∥˜ru/√wu∥Pn,2∥√wuf(X)′δ∥Pn,2,","inline":true}],[{"style":{"width":"100%"},"width":1875,"height":356,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-2.png","element":"img"}],[{"text":"since the scalar ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"cancels out. Thus, ","element":"span"},{"style":{"height":19.13},"width":651.52,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-3.png","element":"img","alt":" En[wu|f(X)′δ|³] ⩽ En[wu|f(X)′δ|²","inline":true},{"text":"]. 
Therefore we have","element":"span"}],[{"style":{"width":"91%"},"width":1720,"height":188,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-4.png","element":"img"}],[{"text":"which establishes that ","element":"span"},{"style":{"height":25.64},"width":1436.64,"height":64.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-5.png","element":"img","alt":" Fu(δ) := Mu(θu + δ) − Mu(θu) − ∂θMu(θu)′δ + 2∥˜ru/√wu∥Pn,2∥√wuf(X)′δ∥Pn,2","inline":true,"padRight":true},{"text":"is at least ","element":"span"},{"style":{"height":22.22},"width":1586.74,"height":55.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-6.png","element":"img","alt":"(1/3)En[wu|f(X)′δ|²] for any δ = t˜δ, t ∈ R, ˜δ ∈ Au and ∥√wuf(X)′δ∥Pn,2 ⩽ ¯qAu. ■","inline":true}],[{"id":"id-212","style":{"fontWeight":"bold"},"text":"Lemma N.3 ","element":"span"},{"text":"(Lemma 1 from ","element":"span"},{"href":"#id-45","referenceIndex":8,"text":"Bach ","element":"a"},{"href":"#id-45","referenceIndex":8,"text":"(2010)","element":"a"},{"text":")","element":"span"},{"style":{"height":16},"width":309.1,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-7.png","element":"img","alt":". Let g : R → R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a three times differentiable convex function such that for all ","element":"span"},{"style":{"height":17.6},"width":777.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-8.png","element":"img","alt":" t ∈ R, |g′′′(t)| ⩽ Mg′′(t) for some M ⩾ 0","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Then, for all ","element":"span"},{"style":{"height":14.8},"width":261.7,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-9.png","element":"img","alt":" t ⩾ 0 we have","inline":true}],[{"style":{"width":"82%"},"width":1551,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-10.png","element":"img"}],[{"id":"id-213","style":{"height":21.29},"width":1188.4,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-11.png","element":"img","alt":"Lemma N.4. For t ⩾ 0 we have exp(−t) + t − 1 ⩾ t²/2 − t³/6.","inline":true}],[{"href":"#id-213","style":{"height":16.4},"width":568.8,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-12.png","element":"img","alt":"Proof of Lemma N.4. For t ⩾","inline":true,"padRight":true},{"text":"0, consider the function ","element":"span"},{"style":{"height":19.13},"width":817.7,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-13.png","element":"img","alt":" f(t) = exp(−t) + t³/6 − t²/2 + t − 1. The","inline":true,"padRight":true},{"text":"statement is equivalent to ","element":"span"},{"style":{"height":17.6},"width":321.97,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-14.png","element":"img","alt":" f(t) ⩾ 0 for t ⩾","inline":true,"padRight":true},{"text":"0. Direct computation gives ","element":"span"},{"style":{"height":17.6},"width":654.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-15.png","element":"img","alt":" f(0) = 0, f′(0) = 0, and f′′(t) =","inline":true,"padRight":true},{"text":"exp(","element":"span"},{"style":{"height":17.6},"width":478.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-16.png","element":"img","alt":"−t) + t − 1 ⩾ 0 so that f","inline":true,"padRight":true},{"text":"is convex, where the inequality f′′(t) ⩾ 0 holds because exp(−t) ⩾ 1 − t for all t. 
Therefore ","element":"span"},{"style":{"height":17.6},"width":881.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-17.png","element":"img","alt":" f(t) ⩾ f(0) + tf′(0) = 0. ■","inline":true}],[{"id":"id-207","style":{"fontWeight":"bold"},"text":"Lemma N.5. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The logistic link function satisfies ","element":"span"},{"style":{"height":17.6},"width":941.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-18.png","element":"img","alt":" |Λ(t+t0)−Λ(t0)| ⩽ Λ′(t0){exp(|t|)−1}. If |t| ⩽ 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"we have ","element":"span"},{"text":"exp(","element":"span"},{"style":{"height":17.6},"width":265.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-19.png","element":"img","alt":"|t|) − 1 ⩽ 2|t|.","inline":true}],[{"style":{"height":27.65},"width":1642.29,"height":69.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-20.png","element":"img","alt":"Proof. Note that |Λ′′(s)| ⩽ Λ′(s) for all s ∈ R, so that −1 ⩽ (d/ds) log(Λ′(s)) = Λ′′(s)/Λ′(s) ⩽","inline":true,"padRight":true},{"text":"1. Suppose ","element":"span"},{"style":{"height":13.6},"width":66.58,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-21.png","element":"img","alt":"s ⩾","inline":true,"padRight":true},{"text":"0. Therefore","element":"span"}],[{"style":{"width":"38%"},"width":724,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-22.png","element":"img"}],[{"text":"In turn, this implies Λ","element":"span"},{"style":{"height":17.6},"width":981.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-23.png","element":"img","alt":"′(t0) exp(−s) ⩽ Λ′(s + t0) ⩽ Λ′(t0) exp(s). 
For t >","inline":true,"padRight":true},{"text":"0, integrating one more time from 0 to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"62%"},"width":1169,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-24.png","element":"img"}],[{"text":"Similarly, for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t < ","element":"span"},{"text":"0, integrating from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"to 0, we have","element":"span"}],[{"style":{"width":"62%"},"width":1169,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-25.png","element":"img"}],[{"text":"The first result follows by noting that 1 ","element":"span"},{"style":{"height":17.6},"width":536.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-26.png","element":"img","alt":" − exp(−|t|) ⩽ exp(|t|) − 1.","inline":true,"padRight":true},{"text":"The second follows from the convexity of exp: for |t| ⩽ 1, exp(|t|) − 1 ⩽ (e − 1)|t| ⩽ 2|t|. ","element":"span"},{"style":{"height":0},"width":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/96-27.png","element":"img","alt":"■","inline":true}]]},{"heading":"Appendix O. Simulation Experiment","paragraphs":[[{"id":"id-154","text":"In this section, we present results from a brief simulation experiment. The results compare the ","element":"span"},{"text":"performance of our proposed treatment effect estimator, which combines variable selection with estimating equations satisfying the key orthogonality condition given in equation (2) in the main text, with that of an estimator that also uses variable selection but is based on a “naive” estimating equation that does not satisfy the orthogonality condition. 
We find that inference based on the naive estimator can suffer from substantial size distortions and that the performance of this estimator depends strongly on features of the data generating process (DGP). We also find that tests based on the estimator constructed using our procedure have size close to the nominal level uniformly across all DGPs we consider, consistent with the theory developed in the paper.","element":"span"}],[{"text":"For simplicity, we consider the case where the treatment, ","element":"span"},{"style":{"height":15.02},"width":34.72,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-0.png","element":"img","alt":" di","inline":true},{"text":", is exogenous conditional on control variables ","element":"span"},{"style":{"height":10.62},"width":36.94,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-1.png","element":"img","alt":" xi","inline":true},{"text":". In this case, we can apply the results of the paper substituting ","element":"span"},{"style":{"height":15.02},"width":143.03,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-2.png","element":"img","alt":" di for zi","inline":true,"padRight":true},{"text":"in each instance where instruments ","element":"span"},{"style":{"height":10.62},"width":32.3,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-3.png","element":"img","alt":" zi","inline":true,"padRight":true},{"text":"are used, since ","element":"span"},{"style":{"height":15.02},"width":34.71,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-4.png","element":"img","alt":" di","inline":true,"padRight":true},{"text":"is conditionally exogenous and thus a valid instrument for itself. 
All of the simulation results are based on data generated as","element":"span"}],[{"style":{"width":"33%"},"width":624,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-5.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":711.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-6.png","element":"img","alt":" vi ∼ U(0, 1), ζi ∼ N(0, 1), vi and ζi","inline":true,"padRight":true},{"text":"are independent, ","element":"span"},{"style":{"height":17.6},"width":670.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-7.png","element":"img","alt":" p = dim(xi) = 250, the covariates","inline":true},{"style":{"height":21.18},"width":703.5,"height":52.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-8.png","element":"img","alt":"xi ∼ N(0, Σ) with Σkj = (0.5)^|j−k|","inline":true},{"text":", and the sample size ","element":"span"},{"style":{"height":16},"width":441.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-9.png","element":"img","alt":" n = 200. θ0 is a p ×","inline":true,"padRight":true},{"text":"1 vector with elements set as ","element":"span"},{"style":{"height":19.75},"width":733.27,"height":49.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-10.png","element":"img","alt":" θ0,j = (1/j)² for j = 1, ..., p. cd and cy","inline":true,"padRight":true},{"text":"are scalars that control the strength of the relationship between the controls, the outcome, and the treatment variable. 
We use several different combinations of ","element":"span"},{"style":{"height":17.42},"width":435.56,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-11.png","element":"img","alt":" cd and cy, setting cd =","inline":true}],[{"style":{"width":"98%"},"width":1853,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-12.png","element":"img"}],[{"text":"We report results for two different inference procedures in Figure 11. The right panel of the figure shows the size of 5% level t-tests for the average treatment effect, where the point estimate is formed using our proposed estimator based on model selection and orthogonal estimating equations, and the standard error is estimated using a plug-in estimator of the asymptotic variance. The left panel shows the size of 5% level t-tests for the average treatment effect estimated as","element":"span"}],[{"style":{"width":"35%"},"width":661,"height":123,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-13.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":18.22},"width":136.71,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-14.png","element":"img","alt":" ĝy(d, xi","inline":true},{"text":") is a post-model-selection estimator of E[","element":"span"},{"style":{"height":17.6},"width":328.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-15.png","element":"img","alt":"Y |D = d, X = xi","inline":true},{"text":"] and the standard error is estimated using a plug-in estimator of the asymptotic variance of 
","element":"span"},{"style":{"height":15.02},"width":117.9,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-16.png","element":"img","alt":"θ̂naive.","inline":true}],[{"style":{"width":"100%"},"width":1875,"height":347,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/97-17.png","element":"img"}],[{"text":"set equal to 95%. After running the Square-Root Lasso, we estimate regression coefficients by regressing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"onto only those variables that were estimated to have non-zero coefficients by the Square-Root Lasso. We then form estimates of E[","element":"span"},{"style":{"height":17.6},"width":318.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-0.png","element":"img","alt":"Y |D = 1, X = xi","inline":true},{"text":"] by plugging (1","element":"span"},{"style":{"height":18.09},"width":249.9,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-1.png","element":"img","alt":", x′i)′ into the","inline":true,"padRight":true},{"text":"estimated model for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., n ","element":"span"},{"text":"and form estimates of E[","element":"span"},{"style":{"height":17.6},"width":328.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-2.png","element":"img","alt":"Y |D = 0, X = xi","inline":true},{"text":"] by plugging (0","element":"span"},{"style":{"height":18.09},"width":90.83,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-3.png","element":"img","alt":", x′i)′","inline":true,"padRight":true},{"text":"into the estimated model for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 
1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ..., n","element":"span"},{"text":".","element":"span"}],[{"text":"For our proposed method, we also need an estimate of the propensity score. We obtain this estimate by using ","element":"span"},{"style":{"height":15.02},"width":71.11,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-4.png","element":"img","alt":" ℓ1−","inline":true},{"text":"penalized logistic regression with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"as the outcome and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"as the covariates, with penalty level set equal to ","element":"span"},{"style":{"height":19.13},"width":605.9,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-5.png","element":"img","alt":" .5√nΦ−1(1−1/2p)/n where Φ(·","inline":true},{"text":") is the standard normal distribution function; we fit this model using the MATLAB function “glmlasso”.","element":"span"},{"style":{"height":8.4},"width":33.93,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-6.png","element":"img","alt":"35 ","inline":true,"padRight":true},{"text":"We standardize the variables in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"and set penalty loadings equal to 1. 
After running the ","element":"span"},{"style":{"height":15.02},"width":71.11,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-7.png","element":"img","alt":" ℓ1−","inline":true},{"text":"penalized logistic regression, we estimate the propensity score by taking fitted values from the conventional logistic regression of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"D ","element":"span"},{"text":"onto only those variables that had non-zero estimated coefficients in the ","element":"span"},{"style":{"height":15.02},"width":71.11,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/98-8.png","element":"img","alt":" ℓ1−","inline":true},{"text":"penalized logistic regression.","element":"span"}],[{"text":"$41","element":"span"}]]},{"heading":"References","paragraphs":[[{"text":"Abadie, A. ","element":"span"},{"text":"(2002): “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 97, 284–292.","element":"span"}],[{"text":"——— (2003): “Semiparametric Instrumental Variable Estimation of Treatment Response Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 113, 231–263.","element":"span"}],[{"text":"Ai, C. and X. 
Chen ","element":"span"},{"text":"(2003): “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 71, 1795–1843.","element":"span"}],[{"text":"——— (2012): “The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 170, 442–457.","element":"span"}],[{"text":"Andrews, D. W. K. ","element":"span"},{"text":"(1994a): “Empirical process methods in econometrics,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Handbook of Econometrics","element":"span"},{"text":", 4, 2247–2294.","element":"span"}],[{"text":"Andrews, D. W. K. ","element":"span"},{"text":"(1994b): “Asymptotics for semiparametric econometric models via stochastic equicontinuity,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 62, 43–72.","element":"span"}],[{"text":"Angrist, J. D. and J.-S. Pischke ","element":"span"},{"text":"(2008): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mostly Harmless Econometrics: An Empiricist’s Companion","element":"span"},{"text":", Princeton University Press.","element":"span"}],[{"text":"Bach, F. ","element":"span"},{"text":"(2010): “Self-concordant analysis for logistic regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Electronic Journal of Statistics","element":"span"},{"text":", 4, 384–414.","element":"span"}],[{"text":"Belloni, A., D. Chen, V. Chernozhukov, and C. 
Hansen ","element":"span"},{"text":"(2012): “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 80, 2369–2429, arXiv, 2010.","element":"span"}],[{"style":{"height":17.6},"width":963.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/99-0.png","element":"img","alt":"Belloni, A. and V. Chernozhukov (2011): “ℓ1","inline":true},{"text":"-Penalized Quantile Regression for High Dimensional Sparse Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 39, 82–130.","element":"span"}],[{"text":"——— (2013): ","element":"span"},{"text":"“Least Squares After Model Selection in High-dimensional Sparse Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bernoulli","element":"span"},{"text":", 19, 521–547, arXiv, 2009.","element":"span"}],[{"text":"Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen ","element":"span"},{"text":"(2015): “Supplement to “Program Evaluation with High-Dimensional Data”,” Tech. rep., arXiv.","element":"span"}],[{"text":"Belloni, A., V. Chernozhukov, and C. Hansen ","element":"span"},{"text":"(2010): “LASSO Methods for Gaussian Instrumental Variables Models,” 2010 arXiv:[math.ST], http://arxiv.org/abs/1012.1297.","element":"span"}],[{"text":"——— (2013a): “Inference for High-Dimensional Sparse Econometric Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Economics and Econometrics. 10th World Congress of the Econometric Society. 
August 2010","element":"span"},{"text":", III, 245–295.","element":"span"}],[{"text":"——— (2014a): “Inference on Treatment Effects After Selection Amongst High-Dimensional Controls,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Review of Economic Studies","element":"span"},{"text":", 81, 608–650.","element":"span"}],[{"text":"Belloni, A., V. Chernozhukov, and K. Kato ","element":"span"},{"text":"(2013b): “Uniform Post Selection Inference for LAD Regression Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1304.0282","element":"span"},{"text":".","element":"span"}],[{"text":"Belloni, A., V. Chernozhukov, and L. Wang ","element":"span"},{"text":"(2011): “Square-Root-LASSO: Pivotal Recovery of Sparse Signals via Conic Programming,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Biometrika","element":"span"},{"text":", 98, 791–806, arXiv, 2010.","element":"span"}],[{"text":"Belloni, A., V. Chernozhukov, L. Wang, et al. ","element":"span"},{"text":"(2014b): “Pivotal estimation via square-root lasso in nonparametric regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 42, 757–788.","element":"span"}],[{"text":"Belloni, A., V. Chernozhukov, and Y. Wei ","element":"span"},{"text":"(2013c): “Honest Confidence Regions for Logistic Regression with a Large Number of Controls,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1304.3969","element":"span"},{"text":".","element":"span"}],[{"text":"Benjamin, D. J. ","element":"span"},{"text":"(2003): “Does 401(k) eligibility increase saving? Evidence from propensity score subclassification,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Public Economics","element":"span"},{"text":", 87, 1259–1290.","element":"span"}],[{"text":"Berry, S., J. Levinsohn, and A. 
Pakes ","element":"span"},{"text":"(1995): “Automobile Prices in Market Equilibrium,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 63, 841–890.","element":"span"}],[{"text":"Bickel, P. J. ","element":"span"},{"text":"(1982): “On adaptive estimation,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 647–671.","element":"span"}],[{"text":"Bickel, P. J. and D. A. Freedman ","element":"span"},{"text":"(1981): “Some asymptotic theory for the bootstrap,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 1196–1217.","element":"span"}],[{"text":"Bickel, P. J., Y. Ritov, and A. B. Tsybakov ","element":"span"},{"text":"(2009): “Simultaneous analysis of Lasso and Dantzig selector,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 37, 1705–1732.","element":"span"}],[{"text":"Candès, E. and T. Tao ","element":"span"},{"text":"(2007): “The Dantzig selector: statistical estimation when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"is much larger than ","element":"span"},{"style":{"height":16},"width":318.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/100-0.png","element":"img","alt":" n,” Ann. Statist.","inline":true},{"text":", 35, 2313–2351.","element":"span"}],[{"text":"Caner, M. and H. H. Zhang ","element":"span"},{"text":"(2014): “Adaptive elastic net for generalized methods of moments,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Business and Economic Statistics","element":"span"},{"text":", 32, 30–47.","element":"span"}],[{"text":"Cattaneo, M., M. Jansson, and W. 
Newey ","element":"span"},{"text":"(2010): “Alternative Asymptotics and the Partially Linear Model with Many Regressors,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Working Paper, http://econ-www.mit.edu/files/6204","element":"span"},{"text":".","element":"span"}],[{"text":"Cattaneo, M. D. ","element":"span"},{"text":"(2010): “Efficient semiparametric estimation of multi-valued treatment effects under ignorability,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 155, 138–154.","element":"span"}],[{"text":"Chamberlain, G. ","element":"span"},{"text":"(1992): “Efficiency Bounds for Semiparametric Regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 60, 567–596.","element":"span"}],[{"text":"Chamberlain, G. and G. W. Imbens ","element":"span"},{"text":"(2003): “Nonparametric applications of Bayesian inference,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Business & Economic Statistics","element":"span"},{"text":", 21, 12–18.","element":"span"}],[{"text":"Chen, X. ","element":"span"},{"text":"(2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Handbook of Econometrics","element":"span"},{"text":", 6, 5559–5632.","element":"span"}],[{"text":"Chen, X., O. Linton, and I. v. Keilegom ","element":"span"},{"text":"(2003): “Estimation of Semiparametric Models when the Criterion Function Is Not Smooth,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 71, 1591–1608.","element":"span"}],[{"text":"Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. 
Newey ","element":"span"},{"text":"(2016): “Double Machine Learning for Treatment and Causal Parameters,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":".","element":"span"}],[{"text":"Chernozhukov, V., D. Chetverikov, and K. Kato ","element":"span"},{"text":"(2012): “Gaussian approximation of suprema of empirical processes,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":".","element":"span"}],[{"style":{"height":17.45},"width":892.78,"height":43.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/100-1.png","element":"img","alt":"Chernozhukov, V. and I. Fernández-Val","inline":true,"padRight":true},{"text":"(2005): “Subsampling inference on quantile regression processes,” ","element":"span"},{"style":{"height":16.4},"width":159.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/100-2.png","element":"img","alt":" Sankhyā","inline":true},{"text":", 67, 253–276.","element":"span"}],[{"style":{"height":17.45},"width":1153.48,"height":43.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/100-3.png","element":"img","alt":"Chernozhukov, V., I. Fernández-Val, and B. Melly","inline":true,"padRight":true},{"text":"(2013): “Inference on counterfactual distributions,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 81, 2205–2268.","element":"span"}],[{"text":"Chernozhukov, V. and C. 
Hansen ","element":"span"},{"text":"(2004): “The impact of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Review of Economics and Statistics","element":"span"},{"text":", 86, 735–751.","element":"span"}],[{"style":{"width":"99%"},"width":1870,"height":166,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/100-4.png","element":"img"}],[{"text":"Chernozhukov, V., C. Hansen, and M. Spindler ","element":"span"},{"text":"(2015a): ","element":"span"},{"text":"“Post-Selection and Post-Regularization Inference in Linear Models with Very Many Controls and Instruments,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"American Economic Review: Papers and Proceedings","element":"span"},{"text":", 105, 486–490.","element":"span"}],[{"text":"——— (2015b): “Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annual Review of Economics","element":"span"},{"text":", 7, 649–688.","element":"span"}],[{"text":"Chesher, A. ","element":"span"},{"text":"(2003): “Identification in nonseparable models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 71, 1405–1441.","element":"span"}],[{"text":"Dudley, R. M. ","element":"span"},{"text":"(1999): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Uniform central limit theorems","element":"span"},{"text":", vol. 63 of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Cambridge Studies in Advanced Mathematics","element":"span"},{"text":", Cambridge: Cambridge University Press.","element":"span"}],[{"text":"Engen, E. M. and W. G. 
Gale ","element":"span"},{"text":"(2000): “The Effects of 401(k) Plans on Household Wealth: Differences Across Earnings Groups,” Working Paper 8032, National Bureau of Economic Research.","element":"span"}],[{"text":"Engen, E. M., W. G. Gale, and J. K. Scholz ","element":"span"},{"text":"(1996): “The Illusory Effects of Saving Incentives on Saving,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Economic Perspectives","element":"span"},{"text":", 10, 113–138.","element":"span"}],[{"text":"Escanciano, J. C. and L. Zhu ","element":"span"},{"text":"(2013): “Set inferences and sensitivity analysis in semiparametric conditionally identified models,” CeMMAP working papers CWP55/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.","element":"span"}],[{"text":"Fan, J. and R. Li ","element":"span"},{"text":"(2001): “Variable selection via nonconcave penalized likelihood and its oracle properties,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 96, 1348–1360.","element":"span"}],[{"text":"Farrell, M. ","element":"span"},{"text":"(2015): “Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 174, 1–23.","element":"span"}],[{"text":"Frank, I. E. and J. H. Friedman ","element":"span"},{"text":"(1993): “A Statistical View of Some Chemometrics Regression Tools,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Technometrics","element":"span"},{"text":", 35, 109–135.","element":"span"}],[{"style":{"height":16.65},"width":601.06,"height":41.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/101-0.png","element":"img","alt":"Frölich, M. and B. 
Melly","inline":true,"padRight":true},{"text":"(2013): “Identification of treatment effects on the treated with one-sided non-compliance,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Reviews","element":"span"},{"text":", 32, 384–414.","element":"span"}],[{"text":"Ghosal, S., A. Sen, and A. W. van der Vaart ","element":"span"},{"text":"(2000): “Testing Monotonicity of Regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Statist.","element":"span"},{"text":", 28, 1054–1082.","element":"span"}],[{"style":{"height":17.45},"width":448.78,"height":43.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/101-1.png","element":"img","alt":"Giné, E. and J. Zinn","inline":true,"padRight":true},{"text":"(1984): “Some limit theorems for empirical processes,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Probab.","element":"span"},{"text":", 12, 929–998, with discussion.","element":"span"}],[{"text":"Hahn, J. ","element":"span"},{"text":"(1997): “Bayesian bootstrap of the quantile regression estimator: a large sample study,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Internat. Econom. Rev.","element":"span"},{"text":", 38, 795–808.","element":"span"}],[{"text":"——— (1998): “On the role of the propensity score in efficient semiparametric estimation of average treatment effects,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 315–331.","element":"span"}],[{"text":"Hansen, B. E. ","element":"span"},{"text":"(1996): “Inference when a nuisance parameter is not identified under the null hypothesis,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 64, 413–430.","element":"span"}],[{"text":"Hansen, L. P. 
","element":"span"},{"text":"(1982): “Large sample properties of generalized method of moments estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 50, 1029–1054.","element":"span"}],[{"text":"Hansen, L. P. and K. J. Singleton ","element":"span"},{"text":"(1982): “Generalized instrumental variables estimation of nonlinear rational expectations models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 50, 1269–1286.","element":"span"}],[{"text":"Heckman, J. and E. J. Vytlacil ","element":"span"},{"text":"(1999): “Local instrumental variables and latent variable models for identifying and bounding treatment effects,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proc. Natl. Acad. Sci. USA","element":"span"},{"text":", 96, 4730–4734 (electronic).","element":"span"}],[{"text":"Heckman, J. J. and E. Vytlacil ","element":"span"},{"text":"(2005): “Structural equations, treatment effects, and econometric policy evaluation,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 73, 669–738.","element":"span"}],[{"text":"Hong, H. and D. Nekipelov ","element":"span"},{"text":"(2010): “Semiparametric efficiency in nonlinear LATE models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Quantitative Economics","element":"span"},{"text":", 1, 279–304.","element":"span"}],[{"text":"Hong, H. and O. Scaillet ","element":"span"},{"text":"(2006): “A fast subsampling method for nonlinear dynamic models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Econometrics","element":"span"},{"text":", 133, 557–578.","element":"span"}],[{"text":"Huang, J., J. L. Horowitz, and S. 
Ma ","element":"span"},{"text":"(2008): “Asymptotic properties of bridge estimators in sparse high-dimensional regression models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 36, 587–613.","element":"span"}],[{"text":"Huang, J., J. L. Horowitz, and F. Wei ","element":"span"},{"text":"(2010): “Variable selection in nonparametric additive models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Statist.","element":"span"},{"text":", 38, 2282–2313.","element":"span"}],[{"text":"Imbens, G. W. and J. D. Angrist ","element":"span"},{"text":"(1994): “Identification and Estimation of Local Average Treatment Effects,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 62, 467–475.","element":"span"}],[{"text":"Imbens, G. W. and W. K. Newey ","element":"span"},{"text":"(2009): “Identification and estimation of triangular simultaneous equations models without additivity,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 77, 1481–1512.","element":"span"}],[{"text":"Imbens, G. W. and D. B. Rubin ","element":"span"},{"text":"(2015): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction","element":"span"},{"text":", Cambridge University Press.","element":"span"}],[{"text":"Jing, B.-Y., Q.-M. Shao, and Q. Wang ","element":"span"},{"text":"(2003): “Self-normalized Cramér-type large deviations for independent random variables,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Probab.","element":"span"},{"text":", 31, 2167–2215.","element":"span"}],[{"text":"Kato, K. ","element":"span"},{"text":"(2011): “Group Lasso for high dimensional sparse quantile regression models,” Preprint, ArXiv.","element":"span"}],[{"text":"Kline, P. and A. 
Santos ","element":"span"},{"text":"(2012): “A Score Based Approach to Wild Bootstrap Inference,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometric Methods","element":"span"},{"text":", 1, 23–41.","element":"span"}],[{"text":"Koenker, R. ","element":"span"},{"text":"(1988): “Asymptotic Theory and Econometric Practice,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Applied Econometrics","element":"span"},{"text":", 3, 139–147.","element":"span"}],[{"text":"——— (2005): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Quantile Regression","element":"span"},{"text":", Cambridge University Press.","element":"span"}],[{"text":"Kosorok, M. R. ","element":"span"},{"text":"(2008): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to Empirical Processes and Semiparametric Inference","element":"span"},{"text":", Series in Statistics, Berlin: Springer.","element":"span"}],[{"style":{"height":16.66},"width":663.04,"height":41.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/102-0.png","element":"img","alt":"Leeb, H. and B. M. Pötscher","inline":true,"padRight":true},{"text":"(2008a): “Can one estimate the unconditional distribution of post-model-selection estimators?” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Theory","element":"span"},{"text":", 24, 338–376.","element":"span"}],[{"text":"——— (2008b): “Recent developments in model selection and related areas,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Theory","element":"span"},{"text":", 24, 319–322.","element":"span"}],[{"text":"Linton, O. ","element":"span"},{"text":"(1996): “Edgeworth approximation for MINPIN estimators in semiparametric regression models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Theory","element":"span"},{"text":", 12, 30–60.","element":"span"}],[{"text":"Mammen, E. 
","element":"span"},{"text":"(1993): “Bootstrap and wild bootstrap for high dimensional linear models,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 255–285.","element":"span"}],[{"text":"Meinshausen, N. and B. Yu ","element":"span"},{"text":"(2009): “Lasso-type recovery of sparse representations for high-dimensional data,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 37, 2246–2270.","element":"span"}],[{"text":"Newey, W. K. ","element":"span"},{"text":"(1990): “Semiparametric efficiency bounds,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Applied Econometrics","element":"span"},{"text":", 5, 99–135.","element":"span"}],[{"text":"——— (1994): “The asymptotic variance of semiparametric estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 62, 1349–1382.","element":"span"}],[{"text":"——— (1997): “Convergence Rates and Asymptotic Normality for Series Estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Econometrics","element":"span"},{"text":", 79, 147–168.","element":"span"}],[{"style":{"height":17.6},"width":510.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/102-1.png","element":"img","alt":"Neyman, J. (1979): “C(α","inline":true},{"text":") tests and their use,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Sankhya","element":"span"},{"text":", 41, 1–21.","element":"span"}],[{"text":"Ogburn, E. L., A. Rotnitzky, and J. M. Robins ","element":"span"},{"text":"(2015): “Doubly robust estimation of the local average treatment effect curve,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the Royal Statistical Society: Series B","element":"span"},{"text":", 77, 373–396.","element":"span"}],[{"text":"Poterba, J. M., S. F. Venti, and D. A. 
Wise ","element":"span"},{"text":"(1994): “401(k) Plans and Tax-Deferred Savings,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Studies in the Economics of Aging","element":"span"},{"text":", ed. by D. A. Wise, Chicago, IL: University of Chicago Press.","element":"span"}],[{"text":"——— (1995): “Do 401(k) Contributions Crowd Out Other Personal Saving?” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Public Economics","element":"span"},{"text":", 58, 1–32.","element":"span"}],[{"text":"——— (1996): “Personal Retirement Saving Programs and Asset Accumulation: Reconciling the Evidence,” Working Paper 5599, National Bureau of Economic Research.","element":"span"}],[{"text":"——— (2001): “The Transition to Personal Accounts and Increasing Retirement Wealth: Macro and Micro Evidence,” Working Paper 8610, National Bureau of Economic Research.","element":"span"}],[{"style":{"height":16.65},"width":295.66,"height":41.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/103-0.png","element":"img","alt":"Pötscher, B.","inline":true,"padRight":true},{"text":"(2009): “Confidence Sets Based on Sparse Estimators Are Necessarily Large,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Sankhya","element":"span"},{"text":", 71-A, 1–18.","element":"span"}],[{"text":"Robins, J. M. and A. Rotnitzky ","element":"span"},{"text":"(1995): “Semiparametric efficiency in multivariate regression models with missing data,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Amer. Statist. Assoc.","element":"span"},{"text":", 90, 122–129.","element":"span"}],[{"text":"Robinson, P. M. ","element":"span"},{"text":"(1988): “Root-","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":"-consistent semiparametric regression,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 56, 931–954.","element":"span"}],[{"text":"Romano, J. P. and A. 
M. Shaikh ","element":"span"},{"text":"(2012): “On the uniform asymptotic validity of subsampling and the bootstrap,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 40, 2798–2822.","element":"span"}],[{"text":"Rothe, C. and S. Firpo ","element":"span"},{"text":"(2013): “Semiparametric Estimation and Inference Using Doubly Robust Moment Conditions,” Tech. rep., NYU preprint.","element":"span"}],[{"text":"Sherman, R. ","element":"span"},{"text":"(1994): “Maximal inequalities for degenerate ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":"-processes with applications to optimization estimators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ann. Statist.","element":"span"},{"text":", 22, 439–459.","element":"span"}],[{"text":"Spindler, M., V. Chernozhukov, and C. Hansen ","element":"span"},{"text":"(2016): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"hdm: High-Dimensional Metrics","element":"span"},{"text":", R package version 0.1.0, http://CRAN.R-project.org/package=hdm.","element":"span"}],[{"text":"Tibshirani, R. ","element":"span"},{"text":"(1996): “Regression shrinkage and selection via the Lasso,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. Roy. Statist. Soc. Ser. B","element":"span"},{"text":", 58, 267–288.","element":"span"}],[{"text":"Tsybakov, A. B. ","element":"span"},{"text":"(2009): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to Nonparametric Estimation","element":"span"},{"text":", Springer.","element":"span"}],[{"text":"van de Geer, S. A. ","element":"span"},{"text":"(2008): “High-dimensional generalized linear models and the lasso,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Annals of Statistics","element":"span"},{"text":", 36, 614–645.","element":"span"}],[{"text":"van der Vaart, A. W. 
","element":"span"},{"text":"(1991): “On differentiable functionals,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Annals of Statistics","element":"span"},{"text":", 178–204.","element":"span"}],[{"text":"——— (1998): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Asymptotic Statistics","element":"span"},{"text":", Cambridge University Press.","element":"span"}],[{"text":"van der Vaart, A. W. and J. A. Wellner ","element":"span"},{"text":"(1996): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Weak Convergence and Empirical Processes","element":"span"},{"text":", Springer Series in Statistics.","element":"span"}],[{"text":"Vytlacil, E. J. ","element":"span"},{"text":"(2002): “Independence, Monotonicity, and Latent Index Models: An Equivalence Result,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometrica","element":"span"},{"text":", 70, 331–341.","element":"span"}],[{"text":"Wasserman, L. ","element":"span"},{"text":"(2006): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"All of Nonparametric Statistics","element":"span"},{"text":", Springer, New York.","element":"span"}],[{"text":"Wooldridge, J. M. ","element":"span"},{"text":"(2010): ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Econometric Analysis of Cross Section and Panel Data","element":"span"},{"text":", Cambridge, Massachusetts: The MIT Press, second ed.","element":"span"}],[{"text":"Zou, H. 
","element":"span"},{"text":"(2006): “The Adaptive Lasso And Its Oracle Properties,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the American Statistical Association","element":"span"},{"text":", 101, 1418–1429.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Table 1: Estimates and standard errors of average effects","element":"figcaption","subtype":"caption"}],[{"style":{"width":"98%"},"width":1846,"height":1030,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/104-0.png","element":"img"}],[{"style":{"width":"91%"},"width":1719,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/105-0.png","element":"img"}],[{"text":"Figure 1. ","element":"figcaption","subtype":"caption"},{"text":"QTE and QTE-T estimates of the effect of 401(k) eligibility on net financial assets.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"91%"},"width":1719,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/106-0.png","element":"img"}],[{"text":"Figure 2. ","element":"figcaption","subtype":"caption"},{"text":"LQTE and LQTE-T estimates of the effect of 401(k) participation on net financial assets.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"97%"},"width":1824,"height":1323,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/107-0.png","element":"img"}],[{"style":{"width":"97%"},"width":1824,"height":1323,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/108-0.png","element":"img"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/109-0.png","element":"img"}],[{"text":"Figure 3. 
","element":"figcaption","subtype":"caption"},{"text":"QTE and QTE-T estimates based on the Indicators specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/110-0.png","element":"img"}],[{"text":"Figure 4. ","element":"figcaption","subtype":"caption"},{"text":"QTE and QTE-T estimates based on the Quadratic Spline specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/111-0.png","element":"img"}],[{"text":"Figure 5. ","element":"figcaption","subtype":"caption"},{"text":"QTE and QTE-T estimates based on the Quadratic Spline Plus Interaction specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/112-0.png","element":"img"}],[{"text":"Figure 6. ","element":"figcaption","subtype":"caption"},{"text":"QTE and QTE-T estimates based on the Quadratic Spline Plus Many Interaction specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/113-0.png","element":"img"}],[{"text":"Figure 7. ","element":"figcaption","subtype":"caption"},{"text":"LQTE and LQTE-T estimates based on the Indicators specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/114-0.png","element":"img"}],[{"text":"Figure 8. 
","element":"figcaption","subtype":"caption"},{"text":"LQTE and LQTE-T estimates based on the Quadratic Spline specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/115-0.png","element":"img"}],[{"text":"Figure 9. ","element":"figcaption","subtype":"caption"},{"text":"LQTE and LQTE-T estimates based on the Quadratic Spline Plus Interaction specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"92%"},"width":1723,"height":2295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/116-0.png","element":"img"}],[{"text":"Figure 10. ","element":"figcaption","subtype":"caption"},{"text":"LQTE and LQTE-T estimates based on the Quadratic Spline Plus Many Interaction specification.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"90%"},"width":1694,"height":668,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1311.2645/images/117-0.png","element":"img"}],[{"text":"Figure 11. ","element":"figcaption","subtype":"caption"},{"text":"Rejection frequencies of 5% level tests for average treatment effect estimators following model selection. The left panel shows size of a test based on a “naive” estimator (Naive rp(0.05)), and the right panel shows size of a test based on our proposed procedure (Proposed rp(0.05)).","element":"figcaption","subtype":"caption"}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]