35:[["$","audio",null,{"id":"tts"}],["$","$L3a",null,{"paperID":"1605.02541","publisher":"arxiv","paperJSON":{"title":"Mean Absolute Percentage Error for regression models","paperID":"1605.02541","avgLineHeight":17.98,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"We study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We prove the existence of an optimal MAPE model and we show the universal consistency of Empirical Risk Minimization based on the MAPE. We also show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression, and we apply this weighting strategy to kernel regression. The behavior of the MAPE kernel regression is illustrated on simulated data.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Keywords: ","element":"span"},{"text":"Mean Absolute Percentage Error; Empirical Risk Minimization;","element":"span"}],[{"text":"Consistency; Optimization; Kernel Regression.","element":"span"}]]},{"heading":"1. Introduction","paragraphs":[[{"text":"Classical regression models are obtained by choosing a model that minimizes an empirical estimation of the Mean Square Error (MSE). Other quality measures are used, in general for robustness reasons. This is the case of the Huber loss","element":"span"}],[{"style":{"width":"99%"},"width":1374,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/0-0.png","element":"img"}],[{"text":"for instance. Another example of regression quality measure is given by the Mean Absolute Percentage Error (MAPE). 
If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"denotes the vector of explanatory variables (the input to the regression model), ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"denotes the target variable and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is a regression model, the MAPE of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is obtained by averaging the ratio ","element":"span"},{"style":{"height":24.43},"width":120.06,"height":61.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-0.png","element":"img","alt":"|g(x)−y|/|y|","inline":true}],[{"style":{"width":"12%"},"width":300,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-1.png","element":"img"}],[{"text":"The MAPE is often used in practice because of its very intuitive interpretation in terms of relative error. The use of the MAPE is relevant in finance, for instance, as gains and losses are often measured in relative values. It is also useful to calibrate prices of products, since customers are sometimes more sensitive to","element":"span"}],[{"style":{"width":"35%"},"width":862,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-2.png","element":"img"}],[{"text":"In real-world applications, the MAPE is frequently used when the quantity to predict is known to remain well above zero. It was used for instance as the quality measure in an electricity consumption forecasting contest organized by GdF ecometering on datascience.net","element":"span"},{"href":"#id-0","style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-3.png","element":"img","alt":"1","inline":true},{"text":". 
More generally, it has been argued that","element":"span"}],[{"style":{"width":"58%"},"width":1435,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-4.png","element":"img"}],[{"text":"where enough data are available, see e.g. ","element":"span"},{"href":"#id-1","text":"[2]","element":"a"},{"text":".","element":"span"}],[{"text":"We study in this paper the consequences of using the MAPE as the quality measure for regression models. Section ","element":"span"},{"text":"2 ","element":"span"},{"text":"introduces our notations and the general context. It recalls the definition of the MAPE. Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"is dedicated to a first","element":"span"}],[{"style":{"width":"58%"},"width":1435,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-5.png","element":"img"}],[{"text":"optimal regression model with respect to the MSE is given by the regression function (i.e., the conditional expectation of the target variable knowing the explanatory variables). Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"shows that an optimal model can also be defined for the MAPE. Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"studies the consequences of replacing MSE/MAE by the","element":"span"}],[{"style":{"width":"58%"},"width":1435,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-6.png","element":"img"}],[{"text":"dimension. We show in particular that MAE based measures can be used to upper bound MAPE ones. Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"proves a universal consistency result for Empirical Risk Minimization applied to the MAPE, using results from Section ","element":"span"},{"text":"4. ","element":"span"},{"text":"Finally, Section ","element":"span"},{"text":"6 ","element":"span"},{"text":"shows how to perform MAPE regression in practice. 
It adapts","element":"span"}],[{"id":"id-0","style":{"width":"58%"},"width":1435,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/1-7.png","element":"img"}],[{"text":"obtained model on simulated data.","element":"span"}]]},{"heading":"2. General setting and notations","paragraphs":[[{"text":"We use in this paper a standard regression setting in which the data are fully described by a random pair ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"= (","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, Y ","element":"span"},{"text":") with values in ","element":"span"},{"style":{"height":13.39},"width":127.23,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-0.png","element":"img","alt":" Rd × R","inline":true},{"text":". We are interested in finding a good model for the pair, that is a (measurable) function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"from ","element":"span"},{"style":{"height":17.38},"width":396.3,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-1.png","element":"img","alt":" Rd to R such that g(X","inline":true},{"text":") is “close to” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":". 
In the classical regression setting, the closeness of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"is measured via the ","element":"span"},{"style":{"height":13.19},"width":43.12,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-2.png","element":"img","alt":" L2","inline":true,"padRight":true},{"text":"risk, also called the mean","element":"span"}],[{"text":"squared error (MSE), defined by","element":"span"}],[{"style":{"width":"72%"},"width":992,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-3.png","element":"img"}],[{"text":"In this definition, the expectation is computed with respect to the random pair (","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, Y ","element":"span"},{"text":") and might be denoted ","element":"span"},{"style":{"height":18.17},"width":312.88,"height":45.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-4.png","element":"img","alt":" E_{X,Y}(g(X) − Y)^2","inline":true,"padRight":true},{"text":"to make this point explicit.","element":"span"}],[{"style":{"width":"58%"},"width":1435,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-5.png","element":"img"}],[{"text":"settings.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"denote the regression function of the problem, that is the function","element":"span"}],[{"text":"from ","element":"span"},{"style":{"height":16.58},"width":297.54,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-6.png","element":"img","alt":" Rd to R given 
by","inline":true}],[{"style":{"width":"62%"},"width":866,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-7.png","element":"img"}],[{"text":"It is well known (see e.g. [","element":"span"},{"href":"#id-2","text":"3","element":"a"},{"text":"]) that the regression function is the best model in the case of the mean squared error in the sense that ","element":"span"},{"style":{"height":15.6},"width":95.18,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-8.png","element":"img","alt":" L2(m","inline":true},{"text":") minimizes ","element":"span"},{"style":{"height":15.6},"width":178.14,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-9.png","element":"img","alt":" L2(g) over","inline":true,"padRight":true},{"text":"the set of all measurable functions from ","element":"span"},{"style":{"height":13.38},"width":149.14,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-10.png","element":"img","alt":" Rd to R.","inline":true}],[{"text":"More generally, the quality of a model is measured via a ","element":"span"},{"style":{"fontWeight":"bold"},"text":"loss function","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l","element":"span"},{"text":", from ","element":"span"},{"style":{"height":13.38},"width":160.58,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-11.png","element":"img","alt":" R2 to R+","inline":true},{"text":". 
The point-wise loss of the model ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", Y ","element":"span"},{"text":") and the ","element":"span"},{"style":{"fontWeight":"bold"},"text":"risk ","element":"span"},{"text":"of","element":"span"}],[{"text":"the model is","element":"span"}],[{"style":{"width":"58%"},"width":1435,"height":135,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-12.png","element":"img"}],[{"text":"leads to the ","element":"span"},{"style":{"height":13.19},"width":105.6,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-13.png","element":"img","alt":" LMSE","inline":true,"padRight":true},{"text":"risk defined above as ","element":"span"},{"style":{"height":16},"width":330.26,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-14.png","element":"img","alt":" Ll2(g) = LMSE(g).","inline":true}],[{"text":"The ","element":"span"},{"style":{"fontWeight":"bold"},"text":"optimal risk ","element":"span"},{"text":"is the infimum of ","element":"span"},{"style":{"height":13.19},"width":37.12,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-15.png","element":"img","alt":" Ll","inline":true,"padRight":true},{"text":"over measurable functions, that is","element":"span"}],[{"style":{"width":"63%"},"width":870,"height":66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/2-16.png","element":"img"}],[{"text":"where 
","element":"span"},{"style":{"height":17.39},"width":157.74,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-0.png","element":"img","alt":" M(Rd, R","inline":true},{"text":") denotes the set of measurable functions from ","element":"span"},{"style":{"height":13.39},"width":45.78,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-1.png","element":"img","alt":" Rd","inline":true,"padRight":true},{"text":"to ","element":"span"},{"text":"R","element":"span"},{"text":". As","element":"span"}],[{"text":"recalled above we have","element":"span"}],[{"style":{"width":"82%"},"width":1142,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-2.png","element":"img"}],[{"text":"As explained in the introduction, there are practical situations in which the ","element":"span"},{"style":{"height":13.19},"width":43.12,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-3.png","element":"img","alt":"L2","inline":true,"padRight":true},{"text":"risk is not a good way of measuring the closeness of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":") to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":". We focus in this paper on the case of the mean absolute percentage error (MAPE) as an alternative to the MSE. 
Let us recall that the loss function associated to the","element":"span"}],[{"text":"MAPE is given by","element":"span"}],[{"style":{"width":"64%"},"width":881,"height":92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-4.png","element":"img"}],[{"text":"with the conventions that for all ","element":"span"},{"style":{"height":17.1},"width":265.88,"height":42.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-5.png","element":"img","alt":" a ̸= 0, a0 = ∞","inline":true,"padRight":true},{"text":"and that ","element":"span"},{"style":{"height":19.37},"width":309.73,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-6.png","element":"img","alt":" 00 = 1. Then the","inline":true}],[{"text":"MAPE-risk of model ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"is","element":"span"}],[{"style":{"width":"78%"},"width":1076,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-7.png","element":"img"}],[{"text":"Notice that according to Fubini’s theorem, ","element":"span"},{"style":{"height":15.6},"width":279.91,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-8.png","element":"img","alt":" LMAP E(g) < ∞","inline":true,"padRight":true},{"text":"implies in particular that ","element":"span"},{"style":{"height":16},"width":260.42,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-9.png","element":"img","alt":" E(|g(X)|) < ∞","inline":true,"padRight":true},{"text":"and thus that interesting models belong to ","element":"span"},{"style":{"height":17.39},"width":257.24,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-10.png","element":"img","alt":" L1(PX), 
where","inline":true},{"style":{"height":13.19},"width":51.36,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-11.png","element":"img","alt":"PX","inline":true,"padRight":true},{"text":"is the probability measure on ","element":"span"},{"style":{"height":16.58},"width":311.47,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-12.png","element":"img","alt":" Rd induced by X.","inline":true}],[{"text":"We will also use in this paper the mean absolute error (MAE). It is based on the absolute error loss, ","element":"span"},{"style":{"height":16},"width":912.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-13.png","element":"img","alt":" lMAE = l1 defined by lMAE(p, y) = |p − y|. As other","inline":true}],[{"text":"risks, the MAE-risk is given by","element":"span"}],[{"style":{"width":"74%"},"width":1028,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-14.png","element":"img"}]]},{"heading":"3. Existence of the MAPE-regression function","paragraphs":[[{"text":"A natural theoretical question associated with the MAPE is whether an optimal model exists. 
More precisely, is there a function ","element":"span"},{"style":{"height":9.19},"width":140.89,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-15.png","element":"img","alt":" mMAP E","inline":true,"padRight":true},{"text":"such that for all models","element":"span"}],[{"style":{"width":"44%"},"width":613,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-16.png","element":"img"}],[{"text":"Obviously, we have","element":"span"}],[{"style":{"width":"55%"},"width":761,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/3-17.png","element":"img"}],[{"text":"A natural strategy to study the existence of ","element":"span"},{"style":{"height":9.19},"width":140.89,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-0.png","element":"img","alt":" mMAP E","inline":true,"padRight":true},{"text":"is therefore to consider a point-wise approximation, i.e. to minimize the conditional expectation introduced above for each value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":". 
In other words, we want to solve, if possible, the","element":"span"}],[{"text":"optimization problem","element":"span"}],[{"id":"id-5","style":{"width":"77%"},"width":1071,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-1.png","element":"img"}],[{"text":"for all values of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"58%"},"width":1435,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-2.png","element":"img"}],[{"text":"introduce necessary and sufficient conditions for the problem to involve finite values, then we show that under those conditions, it has at least one global solution for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and finally we introduce a simple rule to select one of the solutions.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3.1. Finite values for the point-wise problem","element":"span"}],[{"text":"To simplify the analysis, let us introduce a real-valued random variable ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"}],[{"text":"and study the optimization problem","element":"span"}],[{"style":{"width":"61%"},"width":849,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-3.png","element":"img"}],[{"text":"Depending on the distribution of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"and on the value of ","element":"span"},{"style":{"height":28.8},"width":426.86,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-4.png","element":"img","alt":" m, J(m) = E(|m−T|/|T|) is","inline":true,"padRight":true},{"text":"not always a finite value, except for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"= 0. 
In this latter case, for any random variable ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J","element":"span"},{"text":"(0) = 1 using the above convention.","element":"span"}],[{"text":"Let us consider an example demonstrating problems that might arise for ","element":"span"},{"style":{"height":15.2},"width":245.02,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-5.png","element":"img","alt":"m ̸= 0. Let T","inline":true,"padRight":true},{"text":"be distributed according to the uniform distribution on [","element":"span"},{"style":{"height":14},"width":62.32,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-6.png","element":"img","alt":"−1,","inline":true,"padRight":true},{"text":"1].","element":"span"}],[{"text":"Then","element":"span"}],[{"style":{"width":"30%"},"width":424,"height":92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-7.png","element":"img"}],[{"text":"If ","element":"span"},{"style":{"height":16},"width":114.62,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-8.png","element":"img","alt":" m ∈]0,","inline":true,"padRight":true},{"text":"1], we have","element":"span"}],[{"style":{"width":"85%"},"width":1177,"height":351,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/4-9.png","element":"img"}],[{"text":"This example shows that when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is likely to take values close to 0, then ","element":"span"},{"style":{"height":15.6},"width":183.8,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-0.png","element":"img","alt":" J(m) = 
∞","inline":true}],[{"style":{"width":"58%"},"width":1435,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-1.png","element":"img"}],[{"text":"when ","element":"span"},{"style":{"height":22.17},"width":41.57,"height":55.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-2.png","element":"img","alt":"1/|T| ","inline":true,"padRight":true},{"text":"has a finite expectation, that is when the probability that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"is smaller ","element":"span"},{"text":"than ","element":"span"},{"style":{"height":0},"width":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-3.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"decreases sufficiently quickly when ","element":"span"},{"style":{"height":0},"width":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-4.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"goes to zero.","element":"span"}],[{"text":"More formally, we have the following proposition.","element":"span"}],[{"id":"id-3","style":{"height":16},"width":658.18,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-5.png","element":"img","alt":"Proposition 1. 
J(m) < ∞ for all m","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"if and only if","element":"span"}],[{"style":{"width":"58%"},"width":1433,"height":319,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"If any of those conditions is not fulfilled, then ","element":"span"},{"style":{"height":16},"width":439.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-7.png","element":"img","alt":" J(m) = ∞ for all m ̸= 0.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"We have","element":"span"}],[{"style":{"width":"87%"},"width":1207,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-8.png","element":"img"}],[{"text":"If ","element":"span"},{"text":"P","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= 0) ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"0 then for all ","element":"span"},{"style":{"height":16},"width":325.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-9.png","element":"img","alt":" m ̸= 0, J(m) = ∞","inline":true},{"text":". Let us therefore consider the case ","element":"span"},{"text":"P","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= 0) = 0. 
We assume ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m > ","element":"span"},{"text":"0, the case ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m < ","element":"span"},{"text":"0 is completely identical.","element":"span"}],[{"text":"We have","element":"span"}],[{"style":{"width":"101%"},"width":1393,"height":208,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-10.png","element":"img"}],[{"text":"A simple upper bounding gives","element":"span"}],[{"style":{"width":"38%"},"width":536,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-11.png","element":"img"}],[{"text":"and symmetrically","element":"span"}],[{"style":{"width":"45%"},"width":623,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-12.png","element":"img"}],[{"text":"This shows that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":") is the sum of finite terms and of ","element":"span"},{"style":{"height":28.8},"width":412.68,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-13.png","element":"img","alt":" mE� IT ∈]0,m]−IT ∈[−m,0[T �.","inline":true,"padRight":true},{"text":"Because of the symmetry of the problem, we can focus on ","element":"span"},{"style":{"height":28.8},"width":205.49,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-14.png","element":"img","alt":" E� IT ∈]0,m]T �","inline":true},{"text":". 
It is also obvious that ","element":"span"},{"style":{"height":28.8},"width":205.49,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-15.png","element":"img","alt":" E� IT ∈]0,m]T �","inline":true},{"text":"is finite if and only if ","element":"span"},{"style":{"height":28.8},"width":347.3,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/5-16.png","element":"img","alt":" E� IT ∈]0,1]T �is finite.","inline":true}],[{"style":{"width":"58%"},"width":1437,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-0.png","element":"img"}],[{"text":"more operational conditions in the rest of the proof.","element":"span"}],[{"text":"Let us therefore introduce the following functions:","element":"span"}],[{"style":{"width":"84%"},"width":1164,"height":434,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-1.png","element":"img"}],[{"text":"We have obviously for all ","element":"span"},{"style":{"height":19.37},"width":484.86,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-2.png","element":"img","alt":" x ∈]0, 1], g−(x) ≤ 1x ≤ g+(x","inline":true},{"text":"). In addition","element":"span"}],[{"style":{"width":"93%"},"width":1291,"height":245,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-3.png","element":"img"}],[{"text":"According to the monotone convergence theorem,","element":"span"}],[{"style":{"width":"99%"},"width":1376,"height":143,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-4.png","element":"img"}],[{"style":{"height":16},"width":133.9,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-5.png","element":"img","alt":"E(g−(T","inline":true},{"text":")) are finite, or both are infinite. 
In addition, we have","element":"span"}],[{"style":{"width":"48%"},"width":667,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-6.png","element":"img"}],[{"text":"therefore ","element":"span"},{"style":{"height":28.8},"width":194.58,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-7.png","element":"img","alt":" E� IT ∈]0,1]T �","inline":true},{"text":"is finite if and only if ","element":"span"},{"style":{"height":15.6},"width":133.28,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-8.png","element":"img","alt":" E(g−(T","inline":true},{"text":")) is finite. So a sufficient and necessary condition for ","element":"span"},{"style":{"height":28.8},"width":194.58,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-9.png","element":"img","alt":" E� IT ∈]0,1]T �","inline":true},{"text":"to be finite is","element":"span"}],[{"style":{"width":"39%"},"width":544,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-10.png","element":"img"}],[{"text":"A symmetric derivation shows that ","element":"span"},{"style":{"height":28.8},"width":247.16,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-11.png","element":"img","alt":" E�−IT ∈]−1,0]T �","inline":true},{"text":"is finite if and only if","element":"span"}],[{"style":{"width":"71%"},"width":989,"height":172,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/6-12.png","element":"img"}],[{"text":"The conditions of Proposition ","element":"span"},{"href":"#id-3","text":"1 ","element":"a"},{"text":"can be used to characterize whether ","element":"span"},{"style":{"height":16},"width":106.79,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-0.png","element":"img","alt":" P(T 
∈","inline":true,"padRight":true},{"text":"]0","element":"span"},{"style":{"height":5.2},"width":37.73,"height":13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-1.png","element":"img","alt":", ϵ","inline":true},{"text":"]) decreases sufficiently quickly to ensure that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"is not (almost) identically","element":"span"}],[{"text":"equal to +","element":"span"},{"style":{"height":7.2},"width":40,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-2.png","element":"img","alt":"∞","inline":true},{"text":". For instance, if ","element":"span"},{"style":{"height":16},"width":368.77,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-3.png","element":"img","alt":" P(T ∈]0, ϵ]) = ϵ, then","inline":true}],[{"style":{"width":"39%"},"width":541,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-4.png","element":"img"}],[{"text":"and the sum diverges, leading to ","element":"span"},{"style":{"height":16},"width":189.7,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-5.png","element":"img","alt":" J(m) = ∞","inline":true,"padRight":true},{"text":"(for ","element":"span"},{"style":{"height":16},"width":497.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-6.png","element":"img","alt":" m ̸= 0). 
On the contrary, if","inline":true}],[{"style":{"height":17.39},"width":386.65,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-7.png","element":"img","alt":"P(T ∈]0, ϵ]) = ϵ^2, then","inline":true}],[{"style":{"width":"44%"},"width":612,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-8.png","element":"img"}],[{"text":"and thus the sum converges, leading to ","element":"span"},{"style":{"height":16},"width":186.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-9.png","element":"img","alt":" J(m) < ∞","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"(provided similar","element":"span"}],[{"style":{"width":"33%"},"width":811,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-10.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"3.2. Existence of a solution for the point-wise problem","element":"span"}],[{"text":"If the conditions of Proposition ","element":"span"},{"href":"#id-3","text":"1 ","element":"a"},{"text":"are not fulfilled, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":") is infinite except at ","element":"span"},{"style":{"height":16},"width":1375.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-11.png","element":"img","alt":"m = 0 and therefore arg min_{m∈R} J(m) = 0. When they are fulfilled, we have to","inline":true,"padRight":true},{"text":"show that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":") has at least one global minimum. 
This is done in the following","element":"span"}],[{"style":{"width":"10%"},"width":266,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-12.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proposition 2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Under the conditions of Proposition ","element":"span"},{"href":"#id-3","style":{"fontStyle":"italic"},"text":"1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is convex and has at least one global minimum.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"We first note that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"is convex. Indeed, for all ","element":"span"},{"style":{"height":24.43},"width":334.75,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-13.png","element":"img","alt":" t ≠ 0, m ↦ |m−t|/|t|","inline":true,"padRight":true},{"text":"is obviously convex. Then the linearity of the expectation allows us to conclude","element":"span"}],[{"style":{"width":"49%"},"width":1216,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-14.png","element":"img"}],[{"text":"As ","element":"span"},{"style":{"height":16},"width":1025.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-15.png","element":"img","alt":" P(T = 0) = 0, there is [a, b], a < b such that P(T ∈ [a, b]) >","inline":true,"padRight":true},{"text":"0 with either ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a > ","element":"span"},{"text":"0 or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b < ","element":"span"},{"text":"0. Let us assume ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a > ","element":"span"},{"text":"0, the other case being symmetric. 
Then for","element":"span"}],[{"style":{"width":"66%"},"width":921,"height":173,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-16.png","element":"img"}],[{"text":"Then","element":"span"}],[{"style":{"width":"38%"},"width":529,"height":192,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/7-17.png","element":"img"}],[{"text":"and therefore lim","element":"span"},{"style":{"height":16},"width":351.81,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-0.png","element":"img","alt":"m→+∞ J(m) = +∞.","inline":true}],[{"text":"Similarly, if ","element":"span"},{"style":{"height":16},"width":499.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-1.png","element":"img","alt":" m < 0 < a, then for t ∈ [a, b]","inline":true}],[{"style":{"width":"33%"},"width":467,"height":95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-2.png","element":"img"}],[{"text":"and then","element":"span"}],[{"style":{"width":"38%"},"width":529,"height":75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-3.png","element":"img"}],[{"text":"and therefore lim","element":"span"},{"style":{"height":16},"width":352.26,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-4.png","element":"img","alt":"m→−∞ J(m) = +∞.","inline":true}],[{"text":"Therefore, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"is a coercive function and has at least one local minimum, which is global by convexity.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"3.3. Choosing the minimum","element":"span"}],[{"text":"However, the minimum is not necessarily unique, as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"is not strictly convex. 
In general, the set of global minima will be a bounded interval of ","element":"span"},{"text":"R","element":"span"},{"text":". In this case, and by convention, we consider the mean value of the interval as the optimal solution.","element":"span"}],[{"text":"As an example of such behavior, we can consider the case where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is a random variable on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":", such that ","element":"span"},{"text":"P","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= 1) = 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"3, ","element":"span"},{"text":"P","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= 2) = 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"4 and","element":"span"}],[{"text":"P","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= 3) = 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"3. Then the expected loss is","element":"span"}],[{"style":{"width":"68%"},"width":939,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-5.png","element":"img"}],[{"text":"and the figure ","element":"span"},{"href":"#id-4","text":"1 ","element":"a"},{"text":"illustrates that there is an infinity of solutions. 
Indeed when","element":"span"}],[{"style":{"height":16},"width":372.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-6.png","element":"img","alt":"m ∈ [1, 2], J becomes","inline":true}],[{"style":{"width":"49%"},"width":1212,"height":339,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-7.png","element":"img"}],[{"text":"More generally, for any random variable ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":", we have defined a unique value ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":", which is a global minimum of ","element":"span"},{"style":{"height":19.96},"width":326.28,"height":49.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-8.png","element":"img","alt":" J(m) = E(|m−T|/|T|)","inline":true},{"text":". Moving back to our problem, it ensures that the MAPE-regression function ","element":"span"},{"style":{"height":9.19},"width":140.89,"height":22.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-9.png","element":"img","alt":" mMAPE","inline":true,"padRight":true},{"text":"introduced in ","element":"span"},{"href":"#id-5","text":"8 ","element":"a"},{"text":"is well defined and takes finite values on ","element":"span"},{"style":{"height":13.38},"width":45.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-10.png","element":"img","alt":" Rd","inline":true},{"text":". 
As ","element":"span"},{"style":{"height":9.19},"width":140.9,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-11.png","element":"img","alt":" mMAPE","inline":true,"padRight":true},{"text":"is point-wise optimal, it is","element":"span"}],[{"style":{"width":"18%"},"width":441,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/8-12.png","element":"img"}],[{"style":{"width":"65%"},"width":898,"height":815,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-0.png","element":"img"}],[{"id":"id-4","text":"Figure 1: Counterexample with an infinite number of solutions.","element":"figcaption","subtype":"caption"}]]},{"heading":"4. Effects of the MAPE on complexity control","paragraphs":[[{"text":"One of the most standard learning strategies is the Empirical Risk Minimization (ERM) principle. We assume that we are given a training set ","element":"span"},{"style":{"height":16.79},"width":352.5,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-1.png","element":"img","alt":" Dn = (Zi)1≤i≤N =","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":16.79},"width":213.18,"height":41.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-2.png","element":"img","alt":"Xi, Yi)1≤i≤n","inline":true,"padRight":true},{"text":"which consists in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"i.i.d. 
copies of the random pair ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"= (","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, Y ","element":"span"},{"text":").","element":"span"}],[{"style":{"width":"58%"},"width":1444,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-3.png","element":"img"}],[{"text":"from ","element":"span"},{"style":{"height":13.38},"width":138.35,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-4.png","element":"img","alt":" Rd to R","inline":true},{"text":". Given a loss function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l","element":"span"},{"text":", we denote ","element":"span"},{"style":{"height":18.92},"width":348.5,"height":47.29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-5.png","element":"img","alt":" L∗l,G = infg∈G Ll(g).","inline":true}],[{"text":"The empirical estimate of ","element":"span"},{"style":{"height":16},"width":73.85,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-6.png","element":"img","alt":" Ll(g","inline":true},{"text":") (called the ","element":"span"},{"style":{"fontWeight":"bold"},"text":"empirical risk","element":"span"},{"text":") is given by","element":"span"}],[{"style":{"width":"68%"},"width":947,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-7.png","element":"img"}],[{"text":"Then the ERM principle consists in choosing in the class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"the model that","element":"span"}],[{"text":"minimizes the empirical risk, that is","element":"span"}],[{"style":{"width":"67%"},"width":927,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-8.png","element":"img"}],[{"text":"The main theoretical question associated to the ERM principle is how to control 
","element":"span"},{"style":{"height":16.39},"width":173.31,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-9.png","element":"img","alt":"Ll(�gl,Dn,G","inline":true},{"text":") in such a way that it converges to ","element":"span"},{"style":{"height":17.9},"width":71.82,"height":44.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/9-10.png","element":"img","alt":" L∗l,G","inline":true},{"text":". An extension of this question","element":"span"}],[{"text":"is whether ","element":"span"},{"style":{"height":15.5},"width":43.12,"height":38.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-0.png","element":"img","alt":" L∗l ","inline":true,"padRight":true},{"text":"can be reached if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"is allowed to depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":": the ERM is said to","element":"span"}],[{"style":{"width":"59%"},"width":1446,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-1.png","element":"img"}],[{"text":"for any distribution of (","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, Y ","element":"span"},{"text":") (see Section ","element":"span"},{"text":"5)","element":"span"},{"text":".","element":"span"}],[{"text":"It is well known (see e.g. [","element":"span"},{"href":"#id-2","text":"3","element":"a"},{"text":"] chapter 9) that ERM consistency is related to uniform laws of large numbers (ULLN). 
In particular, we need to control","element":"span"}],[{"text":"quantities of the following form","element":"span"}],[{"style":{"width":"77%"},"width":1072,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-2.png","element":"img"}],[{"text":"This can be done via covering numbers or via the Vapnik-Chervonenkis dimension (VC-dim) of certain classes of functions derived from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G","element":"span"},{"text":". One might think that general results about arbitrary loss functions can be used to handle the case","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-3.png","element":"img"}],[{"text":"Lipschitz property of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l ","element":"span"},{"text":"(see Lemma 17.6 in [","element":"span"},{"href":"#id-6","text":"4","element":"a"},{"text":"], for instance) that is not fulfilled by the MAPE.","element":"span"}],[{"text":"The objective of this section is to analyze the effects over covering numbers (Section ","element":"span"},{"href":"#id-7","text":"4.2) ","element":"a"},{"text":"and VC-dimension (Section ","element":"span"},{"href":"#id-8","text":"4.3) ","element":"a"},{"text":"of using the MAPE as the loss","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-4.png","element":"img"}],[{"text":"those analyses (Section ","element":"span"},{"href":"#id-9","text":"4.4)","element":"a"},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.1. 
Classes of functions","element":"span"}],[{"text":"Given a class of models, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G","element":"span"},{"text":", and a loss function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l","element":"span"},{"text":", we introduce derived classes,","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"G, l","element":"span"},{"text":") given by","element":"span"}],[{"style":{"width":"86%"},"width":1196,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-5.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":16.98},"width":314.94,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-6.png","element":"img","alt":" H+(G, l) given by","inline":true}],[{"style":{"width":"95%"},"width":1308,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-7.png","element":"img"}],[{"id":"id-7","style":{"fontStyle":"italic"},"text":"4.2. Covering numbers","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.2.1. Notations and definitions","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"be a class of positive functions from an arbitrary set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"to ","element":"span"},{"style":{"height":12.99},"width":53.78,"height":32.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-8.png","element":"img","alt":" R+","inline":true},{"text":". 
The","element":"span"}],[{"text":"supremum norm on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"is given by","element":"span"}],[{"style":{"width":"23%"},"width":321,"height":67,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/10-9.png","element":"img"}],[{"text":"We also define ","element":"span"},{"style":{"height":18.3},"width":386.84,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-0.png","element":"img","alt":" ∥F∥∞ = supf∈F ∥f∥∞","inline":true},{"text":". We have obviously","element":"span"}],[{"style":{"width":"58%"},"width":1444,"height":152,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-1.png","element":"img"}],[{"text":"only in ","element":"span"},{"style":{"height":12.98},"width":53.78,"height":32.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-2.png","element":"img","alt":" R+","inline":true},{"text":"), hence the absolute value.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":7.2},"width":23,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-3.png","element":"img","alt":" κ","inline":true,"padRight":true},{"text":"be a dissimilarity on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"},{"text":", that is a positive and symmetric function from ","element":"span"},{"style":{"height":13.39},"width":47.16,"height":33.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-4.png","element":"img","alt":" F 2","inline":true,"padRight":true},{"text":"to ","element":"span"},{"style":{"height":12.99},"width":53.78,"height":32.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-5.png","element":"img","alt":" R+","inline":true,"padRight":true},{"text":"that measures how two functions from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"are 
dissimilar (in particular ","element":"span"},{"style":{"height":15.6},"width":357.92,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-6.png","element":"img","alt":" κ(f, f) = 0). Then κ","inline":true,"padRight":true},{"text":"can be used to characterize the complexity of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"}],[{"style":{"width":"34%"},"width":843,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-7.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Definition 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a class of positive functions from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"style":{"fontStyle":"italic"},"text":"to ","element":"span"},{"style":{"height":12.98},"width":53.78,"height":32.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-8.png","element":"img","alt":" R+","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":7.2},"width":23,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-9.png","element":"img","alt":" κ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"a dissimilarity on ","element":"span"},{"style":{"height":14},"width":334.23,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-10.png","element":"img","alt":" F. 
For ϵ > 0 and p","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"a positive integer, a size ","element":"span"},{"style":{"height":14},"width":328.59,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-11.png","element":"img","alt":" p ϵ-cover of F with","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"respect to ","element":"span"},{"style":{"height":7.2},"width":23,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-12.png","element":"img","alt":" κ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a finite collection ","element":"span"},{"style":{"height":15.59},"width":162.45,"height":38.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-13.png","element":"img","alt":" f1, . . . , fp","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"of elements of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"such that for all","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"height":11.6},"width":68.64,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-14.png","element":"img","alt":" ∈ F","inline":true}],[{"style":{"width":"21%"},"width":301,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-15.png","element":"img"}],[{"text":"Then the ","element":"span"},{"style":{"height":7.2},"width":56.22,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-16.png","element":"img","alt":" κ ϵ","inline":true},{"text":"-covering number of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"is defined as follows.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 2. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a class of positive functions from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"style":{"fontStyle":"italic"},"text":"to ","element":"span"},{"style":{"height":16.19},"width":109.41,"height":40.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-17.png","element":"img","alt":" R+, κ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a dissimilarity on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":9.6},"width":59.58,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-18.png","element":"img","alt":" ϵ >","inline":true,"padRight":true},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":". Then the ","element":"span"},{"style":{"height":7.2},"width":57.94,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-19.png","element":"img","alt":" κ ϵ","inline":true},{"style":{"fontStyle":"italic"},"text":"-covering number of ","element":"span"},{"style":{"height":16.4},"width":214.52,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-20.png","element":"img","alt":" F, N(ϵ, F, κ","inline":true},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", is the size of the smallest ","element":"span"},{"style":{"height":7.2},"width":58.3,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-21.png","element":"img","alt":" κ ϵ","inline":true},{"style":{"fontStyle":"italic"},"text":"-cover of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
If such a cover does not exist, the","element":"span"}],[{"style":{"width":"18%"},"width":458,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-22.png","element":"img"}],[{"text":"The behavior of ","element":"span"},{"style":{"height":16.4},"width":155.35,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-23.png","element":"img","alt":" N(ϵ, F, κ","inline":true},{"text":") with respect to ","element":"span"},{"style":{"height":0},"width":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-24.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"characterizes the complexity of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"text":"as seen through ","element":"span"},{"style":{"height":7.2},"width":23,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-25.png","element":"img","alt":" κ","inline":true},{"text":". If the growth when ","element":"span"},{"style":{"height":8.8},"width":67.27,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-26.png","element":"img","alt":" ϵ →","inline":true,"padRight":true},{"text":"0 is slow enough (for a suitable choice of ","element":"span"},{"style":{"height":7.2},"width":23,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-27.png","element":"img","alt":" κ","inline":true},{"text":"), then some uniform law of large numbers applies (see Lemma ","element":"span"},{"href":"#id-10","text":"1)","element":"a"},{"text":".","element":"span"}],[{"id":"id-13","style":{"fontStyle":"italic"},"text":"4.2.2. 
Supremum covering numbers","element":"span"}],[{"text":"Supremum covering numbers are based on the supremum norm, that is","element":"span"}],[{"style":{"width":"41%"},"width":571,"height":66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-28.png","element":"img"}],[{"text":"For classical loss functions, the supremum norm is generally ill-defined on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"G, l","element":"span"},{"text":"). For instance let ","element":"span"},{"style":{"height":13.19},"width":169.74,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-29.png","element":"img","alt":" h1 and h2","inline":true,"padRight":true},{"text":"be two functions from ","element":"span"},{"style":{"height":16},"width":128.59,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-30.png","element":"img","alt":" H(G, l2","inline":true},{"text":"), generated by ","element":"span"},{"style":{"height":14},"width":113.71,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/11-31.png","element":"img","alt":" g1 and","inline":true}],[{"style":{"height":17.39},"width":713.38,"height":43.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-0.png","element":"img","alt":"g2 (that is hi(x, y) = (gi(x) − y)2). Then","inline":true}],[{"style":{"width":"58%"},"width":1443,"height":240,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-1.png","element":"img"}],[{"text":"and a value of ","element":"span"},{"style":{"height":16},"width":431.45,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-2.png","element":"img","alt":" x such that g1(x) ̸= g2(x","inline":true},{"text":"). 
Then sup","element":"span"},{"style":{"height":18.3},"width":467.2,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-3.png","element":"img","alt":"y |h1(x, y) − h2(x, y)| = ∞.","inline":true}],[{"text":"A similar situation arises for the MAPE. Indeed, let ","element":"span"},{"style":{"height":13.19},"width":160.83,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-4.png","element":"img","alt":" h1 and h2","inline":true,"padRight":true},{"text":"be two functions from ","element":"span"},{"style":{"height":13.19},"width":139.02,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-5.png","element":"img","alt":" HMAP E","inline":true},{"text":", generated by ","element":"span"},{"style":{"height":24.43},"width":869.26,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-6.png","element":"img","alt":" g1 and g2 in G (that is hi(x, y) = |gi(x)−y||y| ). Then","inline":true}],[{"style":{"width":"64%"},"width":885,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-7.png","element":"img"}],[{"text":"Thus unless ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"is very restricted there is always ","element":"span"},{"style":{"height":15.6},"width":556.96,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-8.png","element":"img","alt":" x, g1 and g2 such that g1(x) ̸= 0","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":1023.25,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-9.png","element":"img","alt":" |g2(x)| ̸= |g1(x)|. 
Then for y > 0, ||g1(x) − y| − |g2(x) − y||","inline":true,"padRight":true},{"text":"has the general form ","element":"span"},{"style":{"height":14.4},"width":291.71,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-10.png","element":"img","alt":" α + βy with α >","inline":true,"padRight":true},{"text":"0 and thus lim","element":"span"},{"style":{"height":24.43},"width":556.45,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-11.png","element":"img","alt":"y→0+||g1(x)−y|−|g2(x)−y|||y| = +∞.","inline":true}],[{"style":{"width":"58%"},"width":1444,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-12.png","element":"img"}],[{"text":"definition to a subset of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z","element":"span"},{"text":". This corresponds in practice to support assumptions on the data (","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, Y ","element":"span"},{"text":"). Hypotheses on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"are also needed in general. In this latter case, one generally assumes ","element":"span"},{"style":{"height":16},"width":198.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-13.png","element":"img","alt":" ∥G∥∞ < ∞","inline":true},{"text":". 
In the former case, assumptions depend on the nature of the loss function.","element":"span"}],[{"text":"For instance, in the case of the MSE, it is natural to assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"is ","element":"span"},{"style":{"fontWeight":"bold"},"text":"upper","element":"span"}],[{"style":{"height":14},"width":288.26,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-14.png","element":"img","alt":"bounded by YU","inline":true,"padRight":true},{"text":"with probability one. If (","element":"span"},{"style":{"height":17.39},"width":482.86,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-15.png","element":"img","alt":"x, y) ∈ Rd × [−YU, YU] then","inline":true}],[{"style":{"width":"48%"},"width":1190,"height":150,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-16.png","element":"img"}],[{"text":"In the case of the MAPE, a natural hypothesis is that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"is ","element":"span"},{"style":{"fontWeight":"bold"},"text":"lower bounded","element":"span"}],[{"text":"by ","element":"span"},{"style":{"height":13.19},"width":45.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-17.png","element":"img","alt":" YL","inline":true,"padRight":true},{"text":"(almost surely). 
If (","element":"span"},{"style":{"height":17.38},"width":731.8,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-18.png","element":"img","alt":"x, y) ∈ Rd × (] − ∞, −YL] ∪ [YL, ∞[), then","inline":true}],[{"style":{"width":"43%"},"width":599,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-19.png","element":"img"}],[{"text":"and therefore the supremum norm is well defined.","element":"span"}],[{"text":"The case of the MAE is slightly different. Indeed, when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"is fixed, then for sufficiently large positive values of ","element":"span"},{"style":{"height":16},"width":787.19,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/12-20.png","element":"img","alt":" y, ||g1(x) − y| − |g2(x) − y|| = |g1(x) − g2(x)|.","inline":true,"padRight":true},{"text":"Similarly, for sufficiently large negative values of ","element":"span"},{"style":{"height":16},"width":529.74,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-0.png","element":"img","alt":" y, ||g1(x) − y| − |g2(x) − y|| =","inline":true}],[{"style":{"width":"59%"},"width":1450,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-1.png","element":"img"}],[{"style":{"height":16},"width":198.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-2.png","element":"img","alt":"∥G∥∞ < ∞","inline":true},{"text":". In addition, we have the following proposition.","element":"span"}],[{"id":"id-12","style":{"fontWeight":"bold"},"text":"Proposition 3. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be an arbitrary class of models with ","element":"span"},{"style":{"height":16},"width":167.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-3.png","element":"img","alt":" ∥G∥ < ∞","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and let","element":"span"}],[{"style":{"height":17.38},"width":308.04,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-4.png","element":"img","alt":"YL > 0. Let ∥.∥YL∞ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the supremum norm on ","element":"span"},{"href":"#id-11","style":{"height":17.38},"width":439.5,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-5.png","element":"img","alt":" H(G, lMAP E) defined by2","inline":true}],[{"style":{"width":"49%"},"width":677,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":14},"width":191.66,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-7.png","element":"img","alt":" ϵ > 0, then","inline":true}],[{"style":{"width":"71%"},"width":980,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":9.6},"width":66.35,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-9.png","element":"img","alt":" ϵ >","inline":true,"padRight":true},{"text":"0 and let ","element":"span"},{"style":{"height":15.32},"width":169.35,"height":38.29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-10.png","element":"img","alt":" h′1, . . . , h′k","inline":true,"padRight":true},{"text":"be a minimal ","element":"span"},{"style":{"height":13.19},"width":61.31,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-11.png","element":"img","alt":" ϵYL","inline":true,"padRight":true},{"text":"cover of ","element":"span"},{"style":{"height":16},"width":194.43,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-12.png","element":"img","alt":" H(G, lMAE","inline":true},{"text":") ","element":"span"},{"text":"(thus ","element":"span"},{"style":{"height":16.4},"width":527.45,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-13.png","element":"img","alt":" k = N(ϵYL, H(G, lMAE), ∥.∥∞","inline":true},{"text":")). Let ","element":"span"},{"style":{"height":10},"width":161.45,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-14.png","element":"img","alt":" g1, . . . , gk","inline":true,"padRight":true},{"text":"be the functions from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"associated to ","element":"span"},{"style":{"height":15.31},"width":169.35,"height":38.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-15.png","element":"img","alt":" h′1, . . . 
, h′k","inline":true,"padRight":true},{"text":"and let ","element":"span"},{"style":{"height":14},"width":169.36,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-16.png","element":"img","alt":" h1, . . . , hk","inline":true,"padRight":true},{"text":"be the corresponding functions in","element":"span"}],[{"style":{"width":"45%"},"width":1122,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-17.png","element":"img"}],[{"text":"Indeed let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"be an arbitrary element of ","element":"span"},{"style":{"height":15.6},"width":218.38,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-18.png","element":"img","alt":" H(G, lMAP E","inline":true},{"text":") associated ","element":"span"},{"style":{"height":14},"width":236.96,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-19.png","element":"img","alt":" g and let h′ be","inline":true,"padRight":true},{"text":"the corresponding function in ","element":"span"},{"style":{"height":15.6},"width":193.81,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-20.png","element":"img","alt":" H(G, lMAE","inline":true},{"text":"). 
Then for some ","element":"span"},{"style":{"height":18.55},"width":351.97,"height":46.37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-21.png","element":"img","alt":" j, ∥h′−h′j∥∞ ≤ ϵYL.","inline":true}],[{"text":"We have then","element":"span"}],[{"style":{"width":"78%"},"width":1078,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-22.png","element":"img"}],[{"text":"For all ","element":"span"},{"style":{"height":22.17},"width":773.93,"height":55.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-23.png","element":"img","alt":" y ∈] − ∞, −YL] ∪ [YL, ∞[, 1/|y| ≤ 1/YL and thus","inline":true}],[{"style":{"width":"78%"},"width":1078,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-24.png","element":"img"}],[{"text":"Then","element":"span"}],[{"id":"id-11","style":{"width":"104%"},"width":1437,"height":335,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/13-25.png","element":"img"}],[{"text":"and thus","element":"span"}],[{"style":{"width":"19%"},"width":266,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-0.png","element":"img"}],[{"text":"which allows us to conclude.","element":"span"}],[{"text":"This Proposition shows that the covering numbers associated with a class of functions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"under the MAPE are related to the covering numbers of the same class under the MAE, as long as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":"stays bounded away from zero.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"4.2.3. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"style":{"fontStyle":"italic"},"text":"covering numbers","element":"span"}],[{"style":{"height":15.59},"width":44.12,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-1.png","element":"img","alt":"Lp","inline":true,"padRight":true},{"text":"covering numbers are based on a data dependent norm. Based on the training set ","element":"span"},{"style":{"height":13.19},"width":52.99,"height":32.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-2.png","element":"img","alt":" Dn","inline":true},{"text":", we define for ","element":"span"},{"style":{"height":14},"width":117.38,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-3.png","element":"img","alt":" p ≥ 1 :","inline":true}],[{"style":{"width":"78%"},"width":1084,"height":135,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-4.png","element":"img"}],[{"text":"We have a simple proposition:","element":"span"}],[{"id":"id-15","style":{"fontWeight":"bold"},"text":"Proposition 4. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be an arbitrary class of models and ","element":"span"},{"style":{"height":13.19},"width":52.99,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-5.png","element":"img","alt":" Dn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"a data set such","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"that ","element":"span"},{"style":{"height":15.2},"width":171.9,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-6.png","element":"img","alt":" ∀i, Yi ̸= 0","inline":true},{"style":{"fontStyle":"italic"},"text":", then for all ","element":"span"},{"style":{"height":14},"width":105.1,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-7.png","element":"img","alt":" p ≥ 1,","inline":true}],[{"style":{"width":"85%"},"width":1181,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"The proof is similar to the one of Proposition ","element":"span"},{"href":"#id-12","text":"3.","element":"a"}],[{"text":"This Proposition is the adaptation of Proposition ","element":"span"},{"href":"#id-12","text":"3 ","element":"a"},{"text":"to ","element":"span"},{"style":{"height":15.59},"width":44.12,"height":38.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-9.png","element":"img","alt":" Lp","inline":true,"padRight":true},{"text":"covering numbers.","element":"span"}],[{"id":"id-8","style":{"fontStyle":"italic"},"text":"4.3. 
VC-dimension","element":"span"}],[{"style":{"width":"59%"},"width":1461,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-10.png","element":"img"}],[{"text":"dimension (VC dimension). We first recall the definition of the shattering coefficients of a function class.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 3. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a class of functions from ","element":"span"},{"style":{"height":17.38},"width":311.45,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-11.png","element":"img","alt":" Rd to {0, 1} and n","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a positive","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"integer. Let ","element":"span"},{"style":{"height":16},"width":205.11,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-12.png","element":"img","alt":" {z1, . . . , zn}","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a set of ","element":"span"},{"style":{"height":16.59},"width":336.35,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-13.png","element":"img","alt":" n points of Rd. 
Let","inline":true}],[{"style":{"width":"83%"},"width":1152,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/14-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"that is the number of different binary vectors of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"style":{"fontStyle":"italic"},"text":"that are generated by functions of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"when they are applied to ","element":"span"},{"style":{"height":16},"width":217.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-0.png","element":"img","alt":" {z1, . . . , zn}.","inline":true}],[{"style":{"width":"51%"},"width":1267,"height":247,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-1.png","element":"img"}],[{"text":"Then the VC-dimension is defined as follows.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 4. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a class of functions from ","element":"span"},{"style":{"height":17.38},"width":197.78,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-2.png","element":"img","alt":" Rd to {0, 1}","inline":true},{"style":{"fontStyle":"italic"},"text":". 
The VC-dimension","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is defined by","element":"span"}],[{"style":{"width":"52%"},"width":722,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-3.png","element":"img"}],[{"text":"Interestingly, replacing the MAE by the MAPE does not increase the VC-dim of the relevant class of functions.","element":"span"}],[{"id":"id-16","style":{"fontWeight":"bold"},"text":"Proposition 5. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be an arbitrary class of models. We have","element":"span"}],[{"style":{"width":"62%"},"width":862,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-4.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Let us consider a set of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"points shattered by ","element":"span"},{"style":{"height":16.98},"width":491.03,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-5.png","element":"img","alt":" H+(G, lMAP E), (v1, . . . 
, vk),","inline":true}],[{"style":{"width":"59%"},"width":1445,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-6.png","element":"img"}],[{"text":"function ","element":"span"},{"style":{"height":16},"width":315.78,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-7.png","element":"img","alt":" hθ ∈ H(G, lMAP E","inline":true},{"text":") such that ","element":"span"},{"style":{"height":17.68},"width":497.27,"height":44.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-8.png","element":"img","alt":" ∀j, It≤hθ(x,y)(xj, yj, tj) = θj","inline":true},{"text":". Each ","element":"span"},{"style":{"height":13.19},"width":37.96,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-9.png","element":"img","alt":" hθ","inline":true,"padRight":true},{"text":"corresponds to a ","element":"span"},{"style":{"height":24.43},"width":575.04,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-10.png","element":"img","alt":" gθ ∈ G, with hθ(x, y) = |gθ(x)−y||y| .","inline":true}],[{"text":"We define a new set of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"points, (","element":"span"},{"style":{"height":10},"width":180.49,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-11.png","element":"img","alt":"w1, . . . , wk","inline":true},{"text":") as follows. If ","element":"span"},{"style":{"height":16.39},"width":226.23,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-12.png","element":"img","alt":" yj ̸= 0, then","inline":true}],[{"style":{"height":16.79},"width":311.13,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-13.png","element":"img","alt":"wj = (xj, yj, |yj|tj","inline":true},{"text":"). 
For those points and for any ","element":"span"},{"style":{"height":14},"width":111.47,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-14.png","element":"img","alt":" g ∈ G,","inline":true}],[{"style":{"width":"99%"},"width":1371,"height":155,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-15.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-16.png","element":"img"}],[{"style":{"height":16.39},"width":1374.85,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-17.png","element":"img","alt":"gθ(xj) = 0 and hθ(xj, 0) = ∞ if gθ(xj) ̸= 0. As the set of points is shattered tj >","inline":true,"padRight":true},{"text":"1 (or ","element":"span"},{"style":{"height":16.39},"width":228.7,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-18.png","element":"img","alt":" hθ(xj, 0) < tj","inline":true,"padRight":true},{"text":"will never be possible). In addition when ","element":"span"},{"style":{"height":16.39},"width":352.14,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-19.png","element":"img","alt":" θj = 1 then gθ(xj) ≥","inline":true,"padRight":true},{"text":"0 and when ","element":"span"},{"style":{"height":17.59},"width":1131.92,"height":43.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/15-20.png","element":"img","alt":" θj = 0 then gθ(xj) = 0. Then let wj = (xj, 0, minθ,θj=1 |gθ(xj)|","inline":true},{"text":"). 
","element":"span"},{"text":"Notice that min","element":"span"},{"style":{"height":17.58},"width":276.27,"height":43.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-0.png","element":"img","alt":"θ,θj=1 |gθ(xj)| >","inline":true,"padRight":true},{"text":"0 (as there is a finite number of binary vectors","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-1.png","element":"img"}],[{"text":"and thus ","element":"span"},{"style":{"height":17.58},"width":522.78,"height":43.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-2.png","element":"img","alt":" h′θ(xj, yj) < minθ,θj=1 |gθ(xj)|","inline":true},{"text":", that is ","element":"span"},{"style":{"height":18.74},"width":399.68,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-3.png","element":"img","alt":" It≤h′θ(x,y)(wj) = 0 = θj","inline":true},{"text":". For ","element":"span"},{"style":{"height":10.8},"width":19,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-4.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":16.79},"width":487.16,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-5.png","element":"img","alt":" θj = 1, h′θ(xj, yj) = |gθ(xj)|","inline":true,"padRight":true},{"text":"and thus ","element":"span"},{"style":{"height":17.59},"width":521.76,"height":43.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-6.png","element":"img","alt":" h′θ(xj, yj) ≥ minθ,θj=1 |gθ(xj)|","inline":true},{"text":". 
","element":"span"},{"text":"Then ","element":"span"},{"style":{"height":18.73},"width":407.88,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-7.png","element":"img","alt":" It≤h′θ(x,y)(wj) = 1 = θj.","inline":true}],[{"text":"This shows that for each binary vector ","element":"span"},{"style":{"height":17.38},"width":202.31,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-8.png","element":"img","alt":" θ ∈ {0, 1}k","inline":true},{"text":", there is a function","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-9.png","element":"img"}],[{"text":"shattered by ","element":"span"},{"style":{"height":16.98},"width":250.02,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-10.png","element":"img","alt":" H+(G, lMAE).","inline":true}],[{"text":"Therefore ","element":"span"},{"style":{"height":16.99},"width":1137.14,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-11.png","element":"img","alt":" V Cdim(H+(G, lMAE)) ≥ k. 
If V Cdim(H+(G, lMAP E)) < ∞, then","inline":true,"padRight":true},{"text":"we can take ","element":"span"},{"style":{"height":16.98},"width":454.5,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-12.png","element":"img","alt":" k = V Cdim(H+(G, lMAP E","inline":true},{"text":")) to get the conclusion.","element":"span"}],[{"text":"If ","element":"span"},{"style":{"height":16.98},"width":510.03,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-13.png","element":"img","alt":" V Cdim(H+(G, lMAP E)) = ∞","inline":true,"padRight":true},{"text":"then ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"can be chosen arbitrarily large and","element":"span"}],[{"style":{"width":"58%"},"width":1444,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-14.png","element":"img"}],[{"text":"Using theorem 9.4 from [","element":"span"},{"href":"#id-2","text":"3","element":"a"},{"text":"], we can bound the ","element":"span"},{"style":{"height":10.8},"width":44.12,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-15.png","element":"img","alt":" Lp","inline":true,"padRight":true},{"text":"covering number with a VC-dim based value. 
If ","element":"span"},{"style":{"height":16.99},"width":350.78,"height":42.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-16.png","element":"img","alt":" V Cdim(H+(G, l)) ≥","inline":true,"padRight":true},{"text":"2, ","element":"span"},{"style":{"height":13.6},"width":64.34,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-17.png","element":"img","alt":" p ≥","inline":true,"padRight":true},{"text":"1, and 0 ","element":"span"},{"style":{"height":21.63},"width":280.13,"height":54.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-18.png","element":"img","alt":" < ϵ < ∥H(G,l)∥∞/4","inline":true,"padRight":true},{"text":",","element":"span"}],[{"text":"then","element":"span"}],[{"style":{"width":"98%"},"width":1357,"height":155,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-19.png","element":"img"}],[{"text":"Therefore, in practice, both the covering numbers and the VC-dimension of MAPE-based classes can be derived from the VC-dimension of MAE-based classes.","element":"span"}],[{"id":"id-9","style":{"fontStyle":"italic"},"text":"4.4. Examples of Uniform Laws of Large Numbers","element":"span"}],[{"style":{"width":"57%"},"width":1409,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-20.png","element":"img"}],[{"text":"Rephrased in our notation, Lemma 9.1 from ","element":"span"},{"href":"#id-2","text":"[3] ","element":"a"},{"text":"is","element":"span"}],[{"id":"id-10","style":{"fontWeight":"bold"},"text":"Lemma 1 ","element":"span"},{"text":"(Lemma 9.1 from [","element":"span"},{"href":"#id-2","text":"3","element":"a"},{"text":"])","element":"span"},{"style":{"height":14},"width":314.17,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-21.png","element":"img","alt":". 
For all n, let Fn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a class of functions from","element":"span"}],[{"style":{"height":16},"width":544.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-22.png","element":"img","alt":"Z to [0, B] and let ϵ > 0. Then","inline":true}],[{"style":{"width":"84%"},"width":1170,"height":146,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/16-23.png","element":"img"}],[{"style":{"width":"66%"},"width":915,"height":154,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":14},"width":118.6,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-1.png","element":"img","alt":" ϵ, then","inline":true}],[{"style":{"width":"81%"},"width":1127,"height":146,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-2.png","element":"img"}],[{"text":"A direct application of Lemma ","element":"span"},{"href":"#id-10","text":"1 ","element":"a"},{"text":"to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"G, l","element":"span"},{"text":") gives","element":"span"}],[{"style":{"width":"82%"},"width":1133,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-3.png","element":"img"}],[{"text":"provided the support of the supremum norm coincides with the support of (","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, Y ","element":"span"},{"text":") and functions in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"G, l","element":"span"},{"text":") are bounded.","element":"span"}],[{"text":"In order to fulfill this latter condition, we have to resort to 
the same strategy","element":"span"}],[{"style":{"width":"47%"},"width":1173,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-4.png","element":"img"}],[{"text":"As in Section ","element":"span"},{"href":"#id-13","text":"4.2.2 ","element":"a"},{"text":"let ","element":"span"},{"style":{"height":16},"width":200.34,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-5.png","element":"img","alt":" ∥G∥∞ < ∞","inline":true,"padRight":true},{"text":"and let ","element":"span"},{"style":{"height":13.19},"width":145.22,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-6.png","element":"img","alt":" YU < ∞","inline":true,"padRight":true},{"text":"be such that ","element":"span"},{"style":{"height":16},"width":156.66,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-7.png","element":"img","alt":" |Y | ≤ YU","inline":true}],[{"text":"almost surely, then","element":"span"}],[{"style":{"width":"54%"},"width":756,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-8.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"52%"},"width":728,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-9.png","element":"img"}],[{"text":"Then if ","element":"span"},{"style":{"height":17.78},"width":665.78,"height":44.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-10.png","element":"img","alt":" B ≥ ∥G∥2∞+Y 2U (resp. B ≥ ∥G∥∞+YU","inline":true},{"text":"), Lemma ","element":"span"},{"href":"#id-10","text":"1 ","element":"a"},{"text":"applies to ","element":"span"},{"style":{"height":15.6},"width":208.88,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-11.png","element":"img","alt":" H(G, lMSE)","inline":true,"padRight":true},{"text":"(resp. 
to ","element":"span"},{"style":{"height":16},"width":239.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-12.png","element":"img","alt":" H(G, lMAE)).","inline":true}],[{"text":"Similar results can be obtained for the MAPE. Indeed let us assume that","element":"span"}],[{"style":{"height":16},"width":196.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-13.png","element":"img","alt":"|Y | ≥ YL >","inline":true,"padRight":true},{"text":"0 almost surely. Then if ","element":"span"},{"style":{"height":16},"width":257.86,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-14.png","element":"img","alt":" ∥G∥∞ is finite,","inline":true}],[{"style":{"width":"85%"},"width":1182,"height":195,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-15.png","element":"img"}],[{"text":"This discussion shows that ","element":"span"},{"style":{"height":13.19},"width":45.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-16.png","element":"img","alt":" YL","inline":true},{"text":", the lower bound on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"text":", plays a very similar","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-17.png","element":"img"}],[{"text":"MAE and the MSE. A very similar analysis can be made when using the ","element":"span"},{"style":{"height":15.59},"width":44.12,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/17-18.png","element":"img","alt":" Lp","inline":true,"padRight":true},{"text":"covering numbers, on the basis of Theorem 9.1 from [","element":"span"},{"href":"#id-2","text":"3","element":"a"},{"text":"]. 
It can also be combined ","element":"span"},{"text":"with the results obtained on the VC-dimension. Rephrased with our notations,","element":"span"}],[{"text":"Theorem 9.1 from ","element":"span"},{"href":"#id-2","text":"[3] ","element":"a"},{"text":"is","element":"span"}],[{"id":"id-18","style":{"fontWeight":"bold"},"text":"Theorem 1 ","element":"span"},{"text":"(Theorem 9.1 from [","element":"span"},{"href":"#id-2","text":"3","element":"a"},{"text":"])","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"F ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a class of functions from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"style":{"fontStyle":"italic"},"text":"to","element":"span"}],[{"text":"[0","element":"span"},{"style":{"height":16},"width":537.03,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-0.png","element":"img","alt":", B]. Then for ϵ > 0 and n > 0","inline":true}],[{"style":{"width":"59%"},"width":1449,"height":224,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-1.png","element":"img"}],[{"text":"As for Lemma ","element":"span"},{"href":"#id-10","text":"1, ","element":"a"},{"text":"we bound ","element":"span"},{"style":{"height":16},"width":200.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-2.png","element":"img","alt":" ∥H(G, l)∥∞","inline":true,"padRight":true},{"text":"via assumptions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G ","element":"span"},{"text":"and on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"text":". 
For","element":"span"}],[{"text":"instance for the MAE, we have","element":"span"}],[{"id":"id-17","style":{"width":"96%"},"width":1332,"height":202,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-3.png","element":"img"}],[{"text":"and for the MAPE","element":"span"}],[{"id":"id-14","style":{"width":"96%"},"width":1331,"height":209,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-4.png","element":"img"}],[{"text":"Equation ","element":"span"},{"href":"#id-14","text":"(20) ","element":"a"},{"text":"can be combined with results from Propositions ","element":"span"},{"href":"#id-15","text":"4 ","element":"a"},{"text":"or ","element":"span"},{"href":"#id-16","text":"5 ","element":"a"},{"text":"to allow a comparison between the MAE and the MAPE. For instance, using the VC-dimension results, the right-hand side of equation ","element":"span"},{"href":"#id-17","text":"(19) ","element":"a"},{"text":"is bounded above by","element":"span"}],[{"text":"24","element":"span"},{"style":{"height":38.4},"width":705.27,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-5.png","element":"img","alt":"(2e(∥G∥∞ + YU)pϵp log 3e(∥G∥∞ + YU))pϵp","inline":true}],[{"style":{"width":"4%"},"width":64,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-6.png","element":"img"}],[{"text":"while the right-hand side of equation ","element":"span"},{"href":"#id-14","text":"(20) ","element":"a"},{"text":"is bounded above by","element":"span"}],[{"style":{"width":"97%"},"width":1348,"height":180,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-7.png","element":"img"}],[{"text":"In order to obtain almost sure uniform convergence of ","element":"span"},{"style":{"height":15.6},"width":449.98,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/18-8.png","element":"img","alt":"L̂l(g, Dn) to Ll(g) over 
G,","inline":true,"padRight":true},{"text":"those right-hand side quantities must be summable (this allows one to apply the Borel-Cantelli Lemma). For fixed values of the VC-dimension, of ","element":"span"},{"style":{"height":16},"width":252.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-0.png","element":"img","alt":" ∥G∥∞, YL and","inline":true},{"style":{"height":13.19},"width":47.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-1.png","element":"img","alt":"YU","inline":true,"padRight":true},{"text":"this is always the case. If those quantities are allowed to depend on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":", then","element":"span"}],[{"style":{"width":"58%"},"width":1443,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-2.png","element":"img"}],[{"text":"play symmetric roles for the MAE and the MAPE. Indeed, for the MAE, a fast growth of ","element":"span"},{"style":{"height":13.19},"width":47.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-3.png","element":"img","alt":" YU","inline":true,"padRight":true},{"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"might prevent the bounds from being summable. 
For instance, if ","element":"span"},{"style":{"height":13.19},"width":47.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-4.png","element":"img","alt":" YU","inline":true,"padRight":true},{"text":"grows faster than ","element":"span"},{"style":{"height":16},"width":57.21,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-5.png","element":"img","alt":"√n","inline":true},{"text":", then ","element":"span"},{"style":{"height":19.37},"width":191.66,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-6.png","element":"img","alt":"n(∥G∥∞+YU)2","inline":true,"padRight":true},{"text":"does not converge to zero and ","element":"span"},{"text":"the series is not summable. Similarly, if ","element":"span"},{"style":{"height":13.19},"width":45.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-7.png","element":"img","alt":" YL","inline":true,"padRight":true},{"text":"converges too quickly to zero, for","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-8.png","element":"img"}],[{"text":"summable. The following section goes into more detail about those conditions in the case of the MAPE.","element":"span"}]]},{"heading":"5. Consistency and the MAPE","paragraphs":[[{"text":"We show in this section that one can build on the ERM principle a strongly","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-9.png","element":"img"}],[{"text":"almost universal).","element":"span"}],[{"id":"id-31","style":{"fontWeight":"bold"},"text":"Theorem 2. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z ","element":"span"},{"text":"= (","element":"span"},{"style":{"fontStyle":"italic"},"text":"X, Y ","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be a random pair taking values in ","element":"span"},{"style":{"height":13.38},"width":126.62,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-10.png","element":"img","alt":" Rd × R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that ","element":"span"},{"style":{"height":16},"width":207.83,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-11.png","element":"img","alt":" |Y | ≥ YL >","inline":true,"padRight":true},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"almost surely (","element":"span"},{"style":{"height":13.19},"width":45.14,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-12.png","element":"img","alt":"YL","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a fixed real number). 
Let ","element":"span"},{"text":"(","element":"span"},{"style":{"height":16.79},"width":173.09,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-13.png","element":"img","alt":"Zn)n≥1 =","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":16.79},"width":193.35,"height":41.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-14.png","element":"img","alt":"Xn, Yn)n≥1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a sequence of independent copies of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Z","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"style":{"width":"59%"},"width":1449,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-15.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"such that:","element":"span"}],[{"text":"1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"G","element":"span"},{"style":{"height":14.79},"width":181.2,"height":36.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-16.png","element":"img","alt":"n ⊂ Gn+1;","inline":true}],[{"text":"2. 
","element":"span"},{"style":{"height":19.54},"width":153.67,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-17.png","element":"img","alt":"�n≥1 Gn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is dense in the set of ","element":"span"},{"style":{"height":17.38},"width":84.8,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-18.png","element":"img","alt":" L1(µ","inline":true},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"functions from ","element":"span"},{"style":{"height":13.38},"width":45.78,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-19.png","element":"img","alt":" Rd","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"to ","element":"span"},{"text":"R ","element":"span"},{"style":{"fontStyle":"italic"},"text":"for any","element":"span"}],[{"style":{"width":"40%"},"width":981,"height":184,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-20.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"If in addition","element":"span"}],[{"style":{"width":"39%"},"width":543,"height":83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-21.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"and there is ","element":"span"},{"style":{"height":12.4},"width":266.15,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-22.png","element":"img","alt":" δ > 0 such that","inline":true}],[{"style":{"width":"23%"},"width":327,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/19-23.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"then ","element":"span"},{"style":{"height":16.79},"width":399.81,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-0.png","element":"img","alt":" LMAP E(�glMAP 
E,Gn,Dn)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converges almost surely to ","element":"span"},{"style":{"height":15.38},"width":147.97,"height":38.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-1.png","element":"img","alt":" L∗MAP E.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"We use the standard decomposition between estimation error and approx-","element":"span"}],[{"text":"imation error. More precisely, for ","element":"span"},{"style":{"height":14},"width":100.14,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-2.png","element":"img","alt":" g ∈ G","inline":true},{"text":", a class of functions,","element":"span"}],[{"style":{"width":"90%"},"width":1244,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-3.png","element":"img"}],[{"text":"We handle first the approximation error. As pointed out in Section ","element":"span"},{"text":"2, ","element":"span"},{"style":{"height":15.6},"width":228.85,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-4.png","element":"img","alt":" LMAP E(g) <","inline":true},{"style":{"height":7.2},"width":40,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-5.png","element":"img","alt":"∞","inline":true,"padRight":true},{"text":"implies that ","element":"span"},{"style":{"height":17.38},"width":181.2,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-6.png","element":"img","alt":" g ∈ L1(PX","inline":true},{"text":"). 
Therefore, we can assume there is a sequence (","element":"span"},{"style":{"height":16.79},"width":113,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-7.png","element":"img","alt":"g∗k)k≥1","inline":true}],[{"text":"of functions from ","element":"span"},{"style":{"height":17.38},"width":111.85,"height":43.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-8.png","element":"img","alt":" L1(PX","inline":true},{"text":") such that","element":"span"}],[{"style":{"width":"35%"},"width":482,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-9.png","element":"img"}],[{"text":"by definition of ","element":"span"},{"style":{"height":15.38},"width":133.02,"height":38.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-10.png","element":"img","alt":" L∗MAP E ","inline":true,"padRight":true},{"text":"as an infimum.","element":"span"}],[{"text":"Let us consider two models ","element":"span"},{"style":{"height":14},"width":162.66,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-11.png","element":"img","alt":" g1 and g2","inline":true},{"text":". 
For arbitrary ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":", we have","element":"span"}],[{"style":{"width":"52%"},"width":723,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-12.png","element":"img"}],[{"text":"and thus","element":"span"}],[{"style":{"width":"52%"},"width":723,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-13.png","element":"img"}],[{"text":"and therefore","element":"span"}],[{"style":{"width":"98%"},"width":1354,"height":480,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-14.png","element":"img"}],[{"text":"and thus","element":"span"}],[{"style":{"width":"71%"},"width":981,"height":88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/20-15.png","element":"img"}],[{"style":{"width":"59%"},"width":1451,"height":486,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-0.png","element":"img"}],[{"text":"This shows that lim","element":"span"},{"style":{"height":17.77},"width":481.76,"height":44.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-1.png","element":"img","alt":"n→∞ L∗MAP E,Gn = L∗MAP E.","inline":true}],[{"text":"The estimation error is handled via the complexity control techniques studied","element":"span"}],[{"text":"in the previous Section. 
Indeed, according to Theorem ","element":"span"},{"href":"#id-18","text":"1, ","element":"a"},{"text":"we have (for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= 1)","element":"span"}],[{"style":{"width":"69%"},"width":957,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-2.png","element":"img"}],[{"text":"with","element":"span"}],[{"style":{"width":"90%"},"width":1239,"height":172,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-3.png","element":"img"}],[{"style":{"height":38.4},"width":894.88,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-4.png","element":"img","alt":"D(n, ϵ) ≤ 24�2e(1 + ∥Gn∥∞)pϵYL log 3e(1 + ∥Gn∥∞))ϵYL","inline":true}],[{"text":"Using the fact that log(","element":"span"},{"style":{"height":16},"width":276.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-5.png","element":"img","alt":"x) ≤ x, we have","inline":true}],[{"style":{"width":"64%"},"width":892,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-6.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"105%"},"width":1459,"height":238,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-7.png","element":"img"}],[{"text":"As lim","element":"span"},{"style":{"height":24.04},"width":488.22,"height":60.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-8.png","element":"img","alt":"n→∞Vn∥Gn∥2∞ log ∥Gn∥∞n = 0,","inline":true}],[{"style":{"width":"55%"},"width":765,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-9.png","element":"img"}],[{"text":"As lim","element":"span"},{"style":{"height":25.77},"width":311.55,"height":64.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-10.png","element":"img","alt":"n→∞ n1−δ∥Gn∥2∞ = 
∞,","inline":true}],[{"style":{"width":"32%"},"width":444,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/21-11.png","element":"img"}],[{"text":"Therefore, for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"sufficiently large, ","element":"span"},{"style":{"height":16},"width":111.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-0.png","element":"img","alt":" D(n, ϵ","inline":true},{"text":") is dominated by a term of the form","element":"span"}],[{"style":{"width":"16%"},"width":228,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-1.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":9.6},"width":72.12,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-2.png","element":"img","alt":" α >","inline":true,"padRight":true},{"text":"0 and ","element":"span"},{"style":{"height":14.4},"width":71.13,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-3.png","element":"img","alt":" β >","inline":true,"padRight":true},{"text":"0 (both depending on ","element":"span"},{"style":{"height":0},"width":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-4.png","element":"img","alt":" ϵ","inline":true},{"text":"). This allows us to conclude that ","element":"span"},{"style":{"height":19.18},"width":327.22,"height":47.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-5.png","element":"img","alt":"∑n≥1 D(n, ϵ) < ∞","inline":true},{"text":". Then the Borel-Cantelli lemma implies that","element":"span"}],[{"style":{"width":"66%"},"width":915,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-6.png","element":"img"}],[{"text":"The final part of the estimation error is handled in a traditional way. 
Let ","element":"span"},{"style":{"height":11.6},"width":100.03,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-7.png","element":"img","alt":" ϵ > 0.","inline":true}],[{"text":"There is ","element":"span"},{"style":{"height":14},"width":474.86,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-8.png","element":"img","alt":" N such that n ≥ N implies","inline":true}],[{"style":{"width":"66%"},"width":918,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-9.png","element":"img"}],[{"text":"Then ","element":"span"},{"style":{"height":16},"width":569.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-10.png","element":"img","alt":"�LMAP E(g, Dn) ≤ LMAP E(g) + ϵ","inline":true},{"text":". By definition","element":"span"}],[{"style":{"width":"57%"},"width":792,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-11.png","element":"img"}],[{"text":"and thus for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"56%"},"width":784,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-12.png","element":"img"}],[{"text":"By taking the infimum on ","element":"span"},{"style":{"height":13.19},"width":51.33,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-13.png","element":"img","alt":" Gn","inline":true},{"text":", we have therefore","element":"span"}],[{"style":{"width":"57%"},"width":787,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-14.png","element":"img"}],[{"text":"Applying again the hypothesis,","element":"span"}],[{"style":{"width":"72%"},"width":996,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-15.png","element":"img"}],[{"text":"and 
therefore","element":"span"}],[{"style":{"width":"53%"},"width":734,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-16.png","element":"img"}],[{"text":"As a consequence","element":"span"}],[{"style":{"width":"69%"},"width":963,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/22-17.png","element":"img"}],[{"text":"The combination of this result with the approximation result allows us to conclude.","element":"span"}],[{"text":"Notice that several aspects of this proof are specific to the MAPE. This is the","element":"span"}],[{"style":{"width":"59%"},"width":1451,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-0.png","element":"img"}],[{"text":"This is also the case for the estimation part, which uses results from Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"that are specific to the MAPE.","element":"span"}]]},{"heading":"6. MAPE kernel regression","paragraphs":[[{"text":"The previous sections have been dedicated to the analysis of the theoretical","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-1.png","element":"img"}],[{"text":"MAPE regression and we compare it to MSE/MAE regression.","element":"span"}],[{"text":"From a practical point of view, building a MAPE regression model consists in minimizing the empirical estimate of the MAPE over a class of models ","element":"span"},{"style":{"height":14},"width":148.75,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-2.png","element":"img","alt":" Gn, that","inline":true}],[{"text":"is to solve","element":"span"}],[{"style":{"width":"54%"},"width":752,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-3.png","element":"img"}],[{"text":"where the 
(","element":"span"},{"style":{"height":16.39},"width":198.72,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-4.png","element":"img","alt":"xi, yi)1≤i≤n","inline":true,"padRight":true},{"text":"are the realizations of the random variables (","element":"span"},{"style":{"height":16.39},"width":225.25,"height":40.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-5.png","element":"img","alt":"Xi, Yi)1≤i≤n.","inline":true}],[{"text":"Optimization-wise, this is simply a particular case of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"median regression ","element":"span"},{"text":"(which is in turn a particular case of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"quantile regression","element":"span"},{"text":"). Indeed, the quotient","element":"span"}],[{"style":{"width":"59%"},"width":1446,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-6.png","element":"img"}],[{"text":"implementation that supports instance weights can be used to find the optimal model. This is, for example, the case of the ","element":"span"},{"style":{"fontFamily":"monospace"},"text":"quantreg ","element":"span"},{"text":"R package [","element":"span"},{"href":"#id-19","text":"5","element":"a"},{"text":"], among others. Notice that when ","element":"span"},{"style":{"height":13.19},"width":51.33,"height":32.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-7.png","element":"img","alt":" Gn","inline":true,"padRight":true},{"text":"corresponds to linear models, the optimization problem is a simple ","element":"span"},{"style":{"fontStyle":"italic"},"text":"linear programming ","element":"span"},{"text":"problem that can be solved by e.g. 
interior point","element":"span"}],[{"style":{"width":"11%"},"width":282,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-8.png","element":"img"}],[{"text":"For some complex models, instance weighting is not immediate. ","element":"span"},{"text":"As an example of adapting a classical model to the MAPE, we show in this section how to turn kernel quantile regression into kernel MAPE regression. Notice that kernel regression introduces regularization and thus is not a direct form of ERM.","element":"span"}],[{"style":{"width":"58%"},"width":1431,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/23-9.png","element":"img"}],[{"id":"id-28","style":{"fontStyle":"italic"},"text":"6.1. From quantile regression to MAPE regression","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"6.1.1. Quantile regression","element":"span"}],[{"text":"Let us assume we are given a Reproducing Kernel Hilbert Space (RKHS), ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":", of functions from ","element":"span"},{"style":{"height":13.38},"width":137.92,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-0.png","element":"img","alt":" Rd to R","inline":true,"padRight":true},{"text":"(notice that ","element":"span"},{"style":{"height":13.38},"width":45.79,"height":33.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-1.png","element":"img","alt":" Rd ","inline":true,"padRight":true},{"text":"could be replaced by an arbitrary space","element":"span"}],[{"style":{"width":"58%"},"width":1444,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-2.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":14},"width":82.39,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-3.png","element":"img","alt":" H, φ","inline":true},{"text":". 
As always, we have ","element":"span"},{"style":{"height":16},"width":406.49,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-4.png","element":"img","alt":" k(x, x′) = ⟨φ(x), φ(x′)⟩.","inline":true}],[{"text":"The standard way of building regression models based on a RKHS consists in optimizing a regularized version of an empirical loss, i.e., in solving an","element":"span"}],[{"text":"optimization problem of the form","element":"span"}],[{"style":{"width":"73%"},"width":1009,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-5.png","element":"img"}],[{"text":"Notice that the reproducing property of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"implies that there is ","element":"span"},{"style":{"height":12},"width":283.76,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-6.png","element":"img","alt":" w ∈ H such that","inline":true}],[{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"style":{"height":16},"width":235.45,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-7.png","element":"img","alt":") = ⟨w, φ(x)⟩.","inline":true}],[{"text":"In particular, quantile regression can be kernelized via an appropriate choice for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"l","element":"span"},{"text":". 
Indeed, let ","element":"span"},{"style":{"height":9.6},"width":60,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-8.png","element":"img","alt":" τ ∈","inline":true,"padRight":true},{"text":"[0; 1] and let ","element":"span"},{"style":{"height":14},"width":426.78,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-9.png","element":"img","alt":" ρτ be the check-function","inline":true},{"text":", introduced in ","element":"span"},{"href":"#id-20","text":"[7]","element":"a"},{"text":":","element":"span"}],[{"id":"id-23","style":{"width":"39%"},"width":537,"height":145,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-10.png","element":"img"}],[{"text":"The check-function is also called the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"pinball loss","element":"span"},{"text":". Then, the kernel quantile optimization problem, treated in ","element":"span"},{"href":"#id-21","text":"[8, ","element":"a"},{"href":"#id-22","text":"9]","element":"a"},{"text":", is defined by:","element":"span"}],[{"style":{"width":"74%"},"width":1031,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-11.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":11.6},"width":65.31,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-12.png","element":"img","alt":" λ >","inline":true,"padRight":true},{"text":"0 handles the trade-off between the data fitting term and the regular-","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"6.2. 
MAPE primal problem","element":"span"}],[{"id":"id-24","text":"To consider the case of the MAPE, one can change equation ","element":"span"},{"href":"#id-23","text":"(24) ","element":"a"},{"text":"to ","element":"span"},{"href":"#id-24","text":"(25)","element":"a"},{"text":":","element":"span"}],[{"style":{"width":"73%"},"width":1005,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/24-14.png","element":"img"}],[{"text":"Notice that, for the sake of generality, we do not specify the value of ","element":"span"},{"style":{"height":6.8},"width":21,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-0.png","element":"img","alt":" τ","inline":true,"padRight":true},{"text":"in this derivation; thus equation ","element":"span"},{"href":"#id-24","text":"(25) ","element":"a"},{"text":"can be seen as a form of “relative quantile”.","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-1.png","element":"img"}],[{"text":"standard MAPE, that is to ","element":"span"},{"style":{"height":19.37},"width":108.19,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-2.png","element":"img","alt":" τ = 1/2","inline":true},{"text":". 
The practical relevance of the “relative ","element":"span"},{"text":"quantile” remains to be assessed.","element":"span"}],[{"text":"Using the standard way of handling absolute values and using ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") = ","element":"span"},{"style":{"height":16},"width":156.94,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-3.png","element":"img","alt":"⟨φ(x), w⟩","inline":true},{"text":", we can rewrite the regularization problem ","element":"span"},{"href":"#id-24","text":"(25) ","element":"a"},{"text":"as a (primal) op-","element":"span"}],[{"style":{"width":"58%"},"width":1442,"height":486,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.37},"width":143.63,"height":48.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-5.png","element":"img","alt":" C = 1/(nλ).","inline":true}],[{"style":{"fontStyle":"italic"},"text":"6.2.1. MAPE dual problem","element":"span"}],[{"text":"Let us denote by ","element":"span"},{"style":{"height":16},"width":241.42,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-6.png","element":"img","alt":" θ = (w, b, ξ, ξ⋆","inline":true},{"text":") the vector grouping all the variables of the primal problem. 
We denote in addition:","element":"span"}],[{"style":{"width":"60%"},"width":832,"height":422,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/25-7.png","element":"img"}],[{"style":{"width":"37%"},"width":930,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-0.png","element":"img"}],[{"id":"id-25","text":"max","element":"span"}],[{"style":{"width":"104%"},"width":1439,"height":162,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-1.png","element":"img"}],[{"text":"where the ","element":"span"},{"style":{"height":11.59},"width":60.54,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-2.png","element":"img","alt":" ui,k","inline":true,"padRight":true},{"text":"are the Lagrange multipliers. Some algebraic manipulations show that problem ","element":"span"},{"href":"#id-25","text":"(27) ","element":"a"},{"text":"is equivalent to problem ","element":"span"},{"href":"#id-26","text":"(28)","element":"a"},{"text":":","element":"span"}],[{"id":"id-26","style":{"width":"95%"},"width":1314,"height":463,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-3.png","element":"img"}],[{"text":"We can simplify the problem by introducing a new parametrisation via the variables ","element":"span"},{"style":{"height":11.59},"width":261.7,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-4.png","element":"img","alt":" αi = ui,1 − ui,2","inline":true},{"text":". Then the value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"w ","element":"span"},{"text":"is obtained from constraint ","element":"span"},{"href":"#id-26","text":"(29) ","element":"a"},{"text":"as ","element":"span"},{"style":{"height":17.6},"width":297.47,"height":43.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-5.png","element":"img","alt":" w = �ni=1 αiφ(xi","inline":true},{"text":"). 
Constraints ","element":"span"},{"href":"#id-26","text":"(30) ","element":"a"},{"text":"can be rewritten into 1","element":"span"},{"style":{"height":16.59},"width":274.47,"height":41.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-6.png","element":"img","alt":"T α = 0. Taking","inline":true}],[{"text":"those equations into account, the objective function becomes","element":"span"}],[{"style":{"width":"93%"},"width":1290,"height":243,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-7.png","element":"img"}],[{"text":"Using constraints ","element":"span"},{"href":"#id-26","text":"(31) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-26","text":"(32)","element":"a"},{"text":", the last two terms simplify as follows:","element":"span"}],[{"style":{"width":"85%"},"width":1173,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-8.png","element":"img"}],[{"text":"and thus the objective function is given by","element":"span"}],[{"style":{"width":"93%"},"width":1291,"height":243,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/26-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16.39},"width":241.14,"height":40.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-0.png","element":"img","alt":" Kij = k(xi, xj","inline":true},{"text":") is the kernel matrix. This shows that the objective function can be rewritten so as to depend only on the new variables ","element":"span"},{"style":{"height":9.19},"width":36.49,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-1.png","element":"img","alt":" αi","inline":true},{"text":". 
The last step of","element":"span"}],[{"style":{"width":"59%"},"width":1451,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-2.png","element":"img"}],[{"text":"The cases of constraints ","element":"span"},{"href":"#id-26","text":"(29) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-26","text":"(30) ","element":"a"},{"text":"have already been handled.","element":"span"}],[{"text":"Notice that given an arbitrary ","element":"span"},{"style":{"height":9.19},"width":36.49,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-3.png","element":"img","alt":" αi","inline":true},{"text":", there is always ","element":"span"},{"style":{"height":15.59},"width":448.22,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-4.png","element":"img","alt":" ui,1 ≥ 0 and ui,2 ≥ 0 such","inline":true,"padRight":true},{"text":"that ","element":"span"},{"style":{"height":11.59},"width":257.38,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-5.png","element":"img","alt":" αi = ui,1 − ui,2","inline":true},{"text":". 
However, constraints ","element":"span"},{"href":"#id-26","text":"(31) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-26","text":"(32) ","element":"a"},{"text":"combined with ","element":"span"},{"style":{"height":15.19},"width":134.56,"height":37.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-6.png","element":"img","alt":" ui,3 ≥ 0","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":15.19},"width":104.92,"height":37.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-7.png","element":"img","alt":" ui,4 ≥","inline":true,"padRight":true},{"text":"0 show that ","element":"span"},{"style":{"height":11.59},"width":59.55,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-8.png","element":"img","alt":" ui,1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":11.59},"width":59.55,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-9.png","element":"img","alt":" ui,2","inline":true,"padRight":true},{"text":"(and thus ","element":"span"},{"style":{"height":9.19},"width":36.49,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-10.png","element":"img","alt":" αi","inline":true},{"text":") cannot be arbitrary, as we","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-11.png","element":"img"}],[{"style":{"height":24.43},"width":978.53,"height":61.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-12.png","element":"img","alt":"αi ≤ Cτ|yi|2 . As ui,1 ≥ 0, −αi ≤ ui,2 and thus αi ≥ C(1−τ)|yi|2","inline":true,"padRight":true},{"text":". 
Conversely, it is easy ","element":"span"},{"text":"to see that if ","element":"span"},{"style":{"height":9.19},"width":36.49,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-13.png","element":"img","alt":" αi","inline":true,"padRight":true},{"text":"satisfies the constraints ","element":"span"},{"style":{"height":24.43},"width":324.73,"height":61.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-14.png","element":"img","alt":"C(τ−1)|yi|2 ≤ αi ≤ Cτ|yi|2","inline":true,"padRight":true},{"text":", then there is ","element":"span"},{"style":{"height":11.59},"width":60.54,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-15.png","element":"img","alt":" ui,k","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , ","element":"span"},{"text":"4 such that ","element":"span"},{"style":{"height":11.59},"width":269.14,"height":28.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-16.png","element":"img","alt":" αi = ui,1 − ui,2","inline":true,"padRight":true},{"text":"and such that the constraints ","element":"span"},{"href":"#id-26","text":"(31)","element":"a"},{"text":", ","element":"span"},{"href":"#id-26","text":"(32) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-26","text":"(33) ","element":"a"},{"text":"are satisfied (take ","element":"span"},{"style":{"height":16.79},"width":740.51,"height":41.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-17.png","element":"img","alt":" ui,1 = max(0, αi) and ui,2 = max(0, −αi)).","inline":true}],[{"id":"id-27","style":{"width":"58%"},"width":1443,"height":330,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-18.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"6.2.2. 
Comparison with quantile regression","element":"span"}],[{"text":"In the case of quantile regression, [","element":"span"},{"href":"#id-21","text":"8","element":"a"},{"text":"] shows that the dual problem is equivalent","element":"span"}],[{"text":"to","element":"span"}],[{"style":{"width":"39%"},"width":537,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-19.png","element":"img"}],[{"text":"In comparison to problem ","element":"span"},{"href":"#id-27","text":"(34)","element":"a"},{"text":", one can observe that the modification of the","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-20.png","element":"img"}],[{"text":"primal optimization problem is equivalent to changing the optimization set in the dual optimization problem. More precisely, it is equivalent to reducing (resp. increasing) the “size” of the optimization set of ","element":"span"},{"style":{"height":16},"width":455.54,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/27-21.png","element":"img","alt":" αi if yi > 1 (resp. yi < 1).","inline":true}],[{"text":"Thus, the smaller ","element":"span"},{"style":{"height":10},"width":30.54,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-0.png","element":"img","alt":" yi","inline":true},{"text":", the larger the optimization set of ","element":"span"},{"style":{"height":9.19},"width":36.49,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-1.png","element":"img","alt":" αi","inline":true},{"text":". This permits","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-2.png","element":"img"}],[{"text":"error is potentially bigger). 
Moreover, by choosing a very large value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"(or ","element":"span"},{"style":{"height":11.2},"width":133.33,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-3.png","element":"img","alt":"C → ∞","inline":true},{"text":"), one can ensure the same optimal value of each ","element":"span"},{"style":{"height":9.19},"width":36.49,"height":22.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-4.png","element":"img","alt":" αi","inline":true,"padRight":true},{"text":"in MAE and MAPE dual problems. This surprising fact can be explained by noticing that a very large value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"corresponds to a very small value of ","element":"span"},{"style":{"height":15.6},"width":467.78,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-5.png","element":"img","alt":" λ (or λ → 0). When λ goes","inline":true}],[{"style":{"width":"59%"},"width":1445,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-6.png","element":"img"}],[{"text":"potential overfitting. When this overfitting appears, ","element":"span"},{"style":{"height":15.6},"width":174.04,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-7.png","element":"img","alt":" f(xi) ≃ yi","inline":true,"padRight":true},{"text":"regardless of the loss function and thus the different loss functions are equivalent.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"6.3. A simulation study","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"6.3.1. 
Generation of observations","element":"span"}],[{"style":{"width":"59%"},"width":1446,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-8.png","element":"img"}],[{"text":"described in Section ","element":"span"},{"href":"#id-28","text":"6.1 ","element":"a"},{"text":"on simulated data, and we compare the results with those obtained by kernel median regression. The experiments were carried out with a Gaussian kernel.","element":"span"}],[{"text":"As in [","element":"span"},{"href":"#id-21","text":"8","element":"a"},{"text":"], we have simulated data according to the cardinal sine (sinc) function,","element":"span"}],[{"text":"defined by","element":"span"}],[{"style":{"width":"24%"},"width":338,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-9.png","element":"img"}],[{"text":"However, to illustrate how the prediction varies with the proximity to zero, we add a parameter ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a ","element":"span"},{"text":"and we define the translated cardinal sine function by:","element":"span"}],[{"style":{"width":"32%"},"width":447,"height":82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-10.png","element":"img"}],[{"text":"For the experiments, we generated 1000 points as a training set and 1000 other points as a test set. 
As in [","element":"span"},{"href":"#id-21","text":"8","element":"a"},{"text":"], the generation process is as follows:","element":"span"}],[{"style":{"width":"28%"},"width":390,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-11.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":28.8},"width":982.8,"height":72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/28-12.png","element":"img","alt":" X ∼ U([−∞; ∞]) and ϵ(X) ∼ N(0, (0.1 · exp(1 − X))²)","inline":true}],[{"style":{"width":"59%"},"width":1445,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/29-0.png","element":"img"}],[{"text":"estimation, we have computed ","element":"span"},{"style":{"height":15.59},"width":370.86,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/29-1.png","element":"img","alt":"f̂MAPE,a and f̂MAE,a","inline":true,"padRight":true},{"text":"for several values of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a","element":"span"},{"text":". The value of the regularization parameter ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is chosen via 5-fold cross-validation.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"6.3.2. 
Results","element":"span"}],[{"style":{"width":"99%"},"width":1369,"height":907,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/29-2.png","element":"img"}],[{"id":"id-29","text":"Table 1: Summary of the experimental results: for each value of the translation parameter ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"a","element":"figcaption","subtype":"caption"},{"text":", the table gives the MAPE of ","element":"figcaption","subtype":"caption"},{"style":{"height":12.94},"width":334.24,"height":32.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/29-3.png","element":"img","alt":"f̂MAPE,a and f̂MAE,a","inline":true,"padRight":true},{"text":"estimated on the test set. The table also reports the value of the regularization parameter ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"C ","element":"figcaption","subtype":"caption"},{"text":"for both loss functions.","element":"figcaption","subtype":"caption"}],[{"text":"The experimental results are reported in Table ","element":"span"},{"href":"#id-29","text":"1. ","element":"a"},{"text":"As expected, in most","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/29-4.png","element":"img"}],[{"text":"especially the case when values of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":"are close to zero.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"6.3.3. Graphical illustration","element":"span"}],[{"text":"Some graphical representations of ","element":"span"},{"style":{"height":15.59},"width":367.11,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/29-5.png","element":"img","alt":"f̂MAPE,a and f̂MAE,a","inline":true,"padRight":true},{"text":"are given in Figure ","element":"span"},{"href":"#id-30","text":"2. 
","element":"a"},{"text":"This figure illustrates several interesting points:","element":"span"}],[{"style":{"width":"59%"},"width":1449,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/29-6.png","element":"img"}],[{"style":{"width":"92%"},"width":1279,"height":176,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/30-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"Up to translation, ","element":"span"},{"style":{"height":15.59},"width":128.23,"height":38.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/30-1.png","element":"img","alt":"f̂MAE,a","inline":true,"padRight":true},{"text":"looks roughly the same for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a","element":"span"},{"text":", whereas","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":255,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/30-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"Red curves are closer to 0 than blue curves. One can actually show that,","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/30-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"The red curve seems to converge toward the blue one for large values of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"a","element":"span"},{"text":".","element":"span"}]]},{"heading":"7. 
Conclusion","paragraphs":[[{"text":"We have shown that learning under the Mean Absolute Percentage Error","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/30-4.png","element":"img"}],[{"text":"particularly, we have shown the existence of an optimal model with respect to the MAPE and the consistency of Empirical Risk Minimization. Experimental results on simulated data illustrate the efficiency of our approach for minimizing the MAPE through kernel regression, which also ensures its efficiency in application","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/30-5.png","element":"img"}],[{"text":"is positive by design and remains quite far from zero, e.g. in price prediction for expensive goods). Two open theoretical questions can be formulated from this work. A first question is whether the lower bound hypothesis on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"can be lifted: in the case of MSE-based regression, the upper bound hypothesis on","element":"span"}],[{"style":{"width":"59%"},"width":1445,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/30-6.png","element":"img"}],[{"text":"cannot be adapted immediately to the MAPE because of the importance of the lower bound on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"in the approximation part of Theorem ","element":"span"},{"href":"#id-31","text":"2. 
","element":"a"},{"text":"A second question is whether regularized empirical risk minimization can be shown to be consistent in the case of the MAPE.","element":"span"}],[{"id":"id-30","style":{"width":"96%"},"width":1325,"height":1424,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/31-0.png","element":"img"}],[{"text":"Figure 2: Representation of the estimates: ","element":"figcaption","subtype":"caption"},{"style":{"height":12.94},"width":116.61,"height":32.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/31-1.png","element":"img","alt":"f̂MAE,a","inline":true,"padRight":true},{"text":"in blue and ","element":"figcaption","subtype":"caption"},{"style":{"height":12.94},"width":248.08,"height":32.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/31-2.png","element":"img","alt":"f̂MAPE,a in red.","inline":true}]]},{"heading":"Acknowledgment","paragraphs":[[{"text":"The authors thank the anonymous reviewers for their valuable comments that helped improve this paper.","element":"span"}]]},{"heading":"References","paragraphs":[[{"style":{"width":"59%"},"width":1448,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/32-0.png","element":"img"}],[{"text":"Mathematical Statistics 35 (1) (1964) 73–101.","element":"span"}],[{"id":"id-1","text":"[2] ","element":"span"},{"text":"J. S. Armstrong, F. 
Collopy, Error measures for generalizing about forecasting","element":"span"}],[{"style":{"width":"95%"},"width":1313,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/32-1.png","element":"img"}],[{"text":"(1992) 69–80.","element":"span"}],[{"id":"id-2","style":{"width":"59%"},"width":1449,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/32-2.png","element":"img"}],[{"text":"Nonparametric Regression, Springer, New York, 2002.","element":"span"}],[{"id":"id-6","text":"[4] ","element":"span"},{"text":"M. Anthony, P. L. Bartlett, Neural Network Learning: Theoretical Foundations,","element":"span"}],[{"text":"Cambridge University Press, 1999.","element":"span"}],[{"id":"id-19","text":"[5] R. Koenker, quantreg: Quantile regression. R package version 5.05 (2013).","element":"span"}],[{"style":{"width":"59%"},"width":1449,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/32-3.png","element":"img"}],[{"text":"2004.","element":"span"}],[{"id":"id-20","text":"[7] ","element":"span"},{"text":"R. Koenker, G. Bassett Jr, Regression quantiles, Econometrica: Journal of","element":"span"}],[{"text":"the Econometric Society (1978) 33–50.","element":"span"}],[{"id":"id-21","text":"[8] ","element":"span"},{"text":"I. Takeuchi, Q. V. Le, T. D. Sears, A. J. Smola, Nonparametric quantile","element":"span"}],[{"style":{"width":"59%"},"width":1450,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.02541/images/32-4.png","element":"img"}],[{"id":"id-22","text":"[9] ","element":"span"},{"text":"Y. Li, Y. Liu, J. Zhu, Quantile regression in reproducing kernel Hilbert spaces,","element":"span"}],[{"text":"Journal of the American Statistical Association 102 (477) (2007) 255–268.","element":"span"}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]