1b:["$","$L29",null,{"isWhiteLabelled":false,"children":["$","$Lb",null,{"pt":{"compact":0,"expanded":3},"children":[["$","$L2a",null,{"noStar":true,"publisher":true,"task":true,"params":true,"size":"xl","product":{"id":"eyJwYXBlcklEIjoiMjAwMi4wNjUwNSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","updated":"2020-02-16T04:58:43.000Z","paperID":"2002.06505","published":"2020-02-16T04:58:43.000Z","authors":"[\"Kai Fong Ernest Chong\"]","title":"A closer look at the approximation capabilities of neural networks","scoreTrending":null,"summary":"$2b","lastCheckedForCode":"2022-09-01T11:26:26.459Z","links":[{"id":"eyJ1cmwiOiJodHRwczovL3BhcGVyc3dpdGhjb2RlLmNvbS9wYXBlci9hLWNsb3Nlci1sb29rLWF0LXRoZS1hcHByb3hpbWF0aW9uLTEifQ==","type":"pwc","url":"https://paperswithcode.com/paper/a-closer-look-at-the-approximation-1","data":null}],"reposConnection":{"edges":[]},"models":[],"tags":[],"summaries":[],"emailsConnection":{"edges":[{"author":"kai fong ernest chong","node":{"id":"eyJhZGRyZXNzIjoiY2hvbmdAc3V0ZC5lZHUuc2cifQ==","address":"chong@sutd.edu.sg","name":"Chong","avatar":null,"linkedin":null,"bio":null,"site":null,"override":null,"membership":[{"name":"Singapore University of Technology and Design (SUTD)"}],"paper":[{"modelsAggregate":{"count":0}},{"modelsAggregate":{"count":0}}],"github":[],"scholar":[{"thirdPartyID":"JewaBYEAAAAJ"}],"twitter":[],"location":[],"owner":[{"id":"eyJ1aWQiOiIxZGI1YTc2OC00MzFhLTQ4ZWMtYWIyZi1jMjAxZTUzMjRiN2MifQ==","name":"kai fong ernest 
chong","github":[],"email":[],"authored":[{"id":"eyJwYXBlcklEIjoiMjAwMi4wNjUwNSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2002.06505"},{"id":"eyJwYXBlcklEIjoiMjEwOC4wNTc2NSIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2108.05765"},{"id":"eyJwYXBlcklEIjoiMjIwNC4wNDY3NyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2204.04677"},{"id":"eyJwYXBlcklEIjoiMjEwNS4xMzg5MiIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2105.13892"},{"id":"eyJwYXBlcklEIjoiMjMwMy4xMTczMCIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","paperID":"2303.11730"},{"id":"eyJwYXBlcklEIjoiMjExOTYiLCJwdWJsaXNoZXIiOiJjdnByIn0=","publisher":"cvpr","paperID":"21196"}]}]}}]},"__typename":"paper","authorArray":["Kai Fong Ernest Chong"]}}],["$","$L18",null,{"container":true,"columns":100,"spacing":{"compact":0,"expanded":2,"large":3},"children":[["$","$L18",null,{"size":{"compact":100,"expanded":100,"large":68},"children":[["$","$7",null,{"children":["$","$L2c",null,{"publisher":"arxiv","paperID":"2002.06505","product":{"paper":"$1b:props:children:props:children:0:props:product","models":"$1b:props:children:props:children:0:props:product:models"},"isWhiteLabelled":false}]}],["$","$7",null,{"children":["$","$L2d",null,{"article":"$L2e","model":"$undefined"}]}]]}],["$","$L18",null,{"size":"grow","children":["$","$L2f",null,{}]}]]}],["$","$7",null,{"children":null}],[["$","audio",null,{"id":"tts"}],["$","$L30",null,{"paperID":"2002.06505","publisher":"arxiv","paperJSON":{"title":"A closer look at the approximation capabilities of neural networks","paperID":"2002.06505","avgLineHeight":11.07,"imgScale":4,"sections":[{"heading":"ABSTRACT","paragraphs":[[{"text":"The universal approximation theorem, in one of its most general versions, says that if we consider only continuous activation functions 
","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-0.png","element":"img","alt":" σ","inline":true},{"text":", then a standard feedforward neural network with one hidden layer is able to approximate any continuous multivariate function ","element":"span"},{"text":"f ","element":"span"},{"text":"to any given approximation threshold ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-1.png","element":"img","alt":" ε","inline":true},{"text":", if and only if ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-2.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is non-polynomial. In this paper, we give a direct algebraic proof of the theorem. Furthermore we shall explicitly quantify the number of hidden units required for approximation. 
Specifically, if ","element":"span"},{"style":{"height":13.2},"width":145.76,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-3.png","element":"img","alt":" X ⊆ Rn","inline":true,"padRight":true},{"text":"is compact, then a neural network with ","element":"span"},{"text":"n ","element":"span"},{"text":"input units, ","element":"span"},{"text":"m ","element":"span"},{"text":"output units, and a single hidden layer with","element":"span"},{"style":{"height":20.27},"width":96.72,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-4.png","element":"img","alt":" (n+d choose d) ","inline":true},{"text":"hidden units (independent of ","element":"span"},{"text":"m ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-5.png","element":"img","alt":" ε","inline":true},{"text":"), can uniformly approximate any polynomial function ","element":"span"},{"style":{"height":14},"width":223.84,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-6.png","element":"img","alt":"f : X → Rm","inline":true,"padRight":true},{"text":"whose total degree is at most ","element":"span"},{"text":"d ","element":"span"},{"text":"for each of its ","element":"span"},{"text":"m ","element":"span"},{"text":"coordinate functions. 
In the general case that ","element":"span"},{"text":"f ","element":"span"},{"text":"is any continuous function, we show there exists some ","element":"span"},{"style":{"height":16},"width":221.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-7.png","element":"img","alt":" N ∈ O(ε−n)","inline":true,"padRight":true},{"text":"(independent of ","element":"span"},{"text":"m","element":"span"},{"text":"), such that ","element":"span"},{"text":"N ","element":"span"},{"text":"hidden units would suffice to approximate ","element":"span"},{"text":"f","element":"span"},{"text":". We also show that this uniform approximation property (UAP) still holds even under seemingly strong conditions imposed on the weights. We highlight several consequences: (i) For any ","element":"span"},{"style":{"height":12.4},"width":100.64,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-8.png","element":"img","alt":" δ > 0","inline":true},{"text":", the UAP still holds if we restrict all non-bias weights ","element":"span"},{"text":"w ","element":"span"},{"text":"in the last layer to satisfy ","element":"span"},{"style":{"height":16},"width":129.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-9.png","element":"img","alt":" |w| < δ","inline":true},{"text":". 
(ii) There exists some ","element":"span"},{"style":{"height":11.6},"width":110.24,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-10.png","element":"img","alt":" λ > 0","inline":true,"padRight":true},{"text":"(depending only on ","element":"span"},{"text":"f ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-11.png","element":"img","alt":" σ","inline":true},{"text":"), such that the UAP still holds if we restrict all non-bias weights ","element":"span"},{"text":"w ","element":"span"},{"text":"in the first layer to satisfy ","element":"span"},{"style":{"height":16},"width":143.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-12.png","element":"img","alt":" |w| > λ","inline":true},{"text":". (iii) If the non-bias weights in the first layer are fixed and randomly chosen from a suitable range, then the UAP holds with probability ","element":"span"},{"text":"1","element":"span"},{"text":".","element":"span"}]]},{"heading":"1 INTRODUCTION AND OVERVIEW","paragraphs":[[{"text":"A standard (feedforward) neural network with ","element":"span"},{"text":"n ","element":"span"},{"text":"input units, ","element":"span"},{"text":"m ","element":"span"},{"text":"output units, and with one or more hidden layers, refers to a computational model ","element":"span"},{"text":"N ","element":"span"},{"text":"that can compute a certain class of functions ","element":"span"},{"style":{"height":14},"width":264.64,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-13.png","element":"img","alt":"ρ : Rn → Rm","inline":true},{"text":", where ","element":"span"},{"style":{"height":10},"width":148.2,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-14.png","element":"img","alt":" ρ = ρW","inline":true,"padRight":true},{"text":"is parametrized by 
","element":"span"},{"text":"W ","element":"span"},{"text":"(called the weights of ","element":"span"},{"text":"N","element":"span"},{"text":"). Implicitly, the definition of ","element":"span"},{"style":{"height":10},"width":21,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-15.png","element":"img","alt":" ρ","inline":true,"padRight":true},{"text":"depends on a choice of some fixed function ","element":"span"},{"style":{"height":11.2},"width":199.4,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-16.png","element":"img","alt":" σ : R → R","inline":true},{"text":", called the activation function of ","element":"span"},{"text":"N","element":"span"},{"text":". Typically, ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-17.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is assumed to be continuous, and historically, the earliest commonly used activation functions were sigmoidal.","element":"span"}],[{"text":"A key fundamental result justifying the use of sigmoidal activation functions was due to ","element":"span"},{"href":"#id-0","referenceIndex":2,"text":"Cybenko ","element":"a"},{"href":"#id-0","referenceIndex":2,"text":"(1989)","element":"a"},{"text":", ","element":"span"},{"href":"#id-1","referenceIndex":10,"text":"Hornik et al. ","element":"a"},{"href":"#id-1","referenceIndex":10,"text":"(1989)","element":"a"},{"text":", and ","element":"span"},{"href":"#id-2","referenceIndex":8,"text":"Funahashi ","element":"a"},{"href":"#id-2","referenceIndex":8,"text":"(1989)","element":"a"},{"text":", who independently proved the first version of what is now famously called the universal approximation theorem. 
This first version says that if ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-18.png","element":"img","alt":"σ","inline":true,"padRight":true},{"text":"is sigmoidal, then a standard neural network with one hidden layer would be able to uniformly approximate any continuous function ","element":"span"},{"style":{"height":14},"width":230.56,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-19.png","element":"img","alt":" f : X → Rm","inline":true,"padRight":true},{"text":"whose domain ","element":"span"},{"style":{"height":13.2},"width":147.68,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-20.png","element":"img","alt":" X ⊆ Rn","inline":true,"padRight":true},{"text":"is compact. ","element":"span"},{"href":"#id-3","referenceIndex":11,"text":"Hornik ","element":"a"},{"href":"#id-3","referenceIndex":11,"text":"(1991) ","element":"a"},{"text":"extended the theorem to the case when ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-21.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is any continuous bounded non-constant activation function. Subsequently, ","element":"span"},{"href":"#id-4","referenceIndex":14,"text":"Leshno et al. 
","element":"a"},{"href":"#id-4","referenceIndex":14,"text":"(1993) ","element":"a"},{"text":"proved that for the class of continuous activation functions, a standard neural network with one hidden layer is able to uniformly approximate any continuous function ","element":"span"},{"style":{"height":14},"width":211.84,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-22.png","element":"img","alt":" f : X → Rm","inline":true,"padRight":true},{"text":"on any compact ","element":"span"},{"style":{"height":13.2},"width":138.08,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-23.png","element":"img","alt":" X ⊆ Rn","inline":true},{"text":", if and only if ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-24.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is non-polynomial.","element":"span"}],[{"text":"Although a single hidden layer is sufficient for the uniform approximation property (UAP) to hold, the number of hidden units required could be arbitrarily large. 
Given a subclass ","element":"span"},{"text":"F ","element":"span"},{"text":"of real-valued continuous functions on a compact set ","element":"span"},{"style":{"height":13.2},"width":150.08,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-25.png","element":"img","alt":" X ⊆ Rn","inline":true},{"text":", a fixed activation function ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-26.png","element":"img","alt":" σ","inline":true},{"text":", and some ","element":"span"},{"style":{"height":11.6},"width":104,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-27.png","element":"img","alt":" ε > 0","inline":true},{"text":", let ","element":"span"},{"style":{"height":16},"width":282.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-28.png","element":"img","alt":" N = N(F, σ, ε)","inline":true,"padRight":true},{"text":"be the minimum number of hidden units required for a single-hidden-layer neural network to be able to uniformly approximate every ","element":"span"},{"style":{"height":14.4},"width":119.92,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/0-29.png","element":"img","alt":" f ∈ F","inline":true,"padRight":true},{"text":"within an approximation error threshold of ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-0.png","element":"img","alt":" ε","inline":true},{"text":". 
If ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-1.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is the rectified linear unit (ReLU) ","element":"span"},{"style":{"height":16},"width":261.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-2.png","element":"img","alt":" x ↦ max(0, x)","inline":true},{"text":", then ","element":"span"},{"text":"N ","element":"span"},{"text":"is at least ","element":"span"},{"style":{"height":22.05},"width":111.05,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-3.png","element":"img","alt":" Ω(1/√ε)","inline":true,"padRight":true},{"text":"when ","element":"span"},{"text":"F ","element":"span"},{"text":"is the class of ","element":"span"},{"style":{"height":13.36},"width":47.2,"height":33.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-4.png","element":"img","alt":" C2","inline":true,"padRight":true},{"text":"non-linear functions ","element":"span"},{"href":"#id-5","referenceIndex":32,"text":"(Yarotsky, ","element":"a"},{"href":"#id-5","referenceIndex":32,"text":"2017)","element":"a"},{"text":", or the class of strongly convex differentiable functions ","element":"span"},{"href":"#id-6","referenceIndex":16,"text":"(Liang & Srikant, ","element":"a"},{"href":"#id-6","referenceIndex":16,"text":"2016)","element":"a"},{"text":"; see also ","element":"span"},{"href":"#id-7","referenceIndex":1,"text":"(Arora et al., ","element":"a"},{"href":"#id-7","referenceIndex":1,"text":"2018)","element":"a"},{"text":". 
If ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-5.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is any smooth non-polynomial function, then ","element":"span"},{"text":"N ","element":"span"},{"text":"is at most ","element":"span"},{"style":{"height":16},"width":129.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-6.png","element":"img","alt":" O(ε−n)","inline":true,"padRight":true},{"text":"for the class of ","element":"span"},{"style":{"height":13.36},"width":47.2,"height":33.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-7.png","element":"img","alt":" C1","inline":true,"padRight":true},{"text":"functions with bounded Sobolev norm ","element":"span"},{"href":"#id-8","referenceIndex":21,"text":"(Mhaskar, ","element":"a"},{"href":"#id-8","referenceIndex":21,"text":"1996)","element":"a"},{"text":"; cf. ","element":"span"},{"href":"#id-9","referenceIndex":24,"text":"(Pinkus, ","element":"a"},{"href":"#id-9","referenceIndex":24,"text":"1999, ","element":"a"},{"text":"Thm. 6.8), ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"(Maiorov & Pinkus, ","element":"a"},{"href":"#id-10","referenceIndex":20,"text":"1999)","element":"a"},{"text":". 
As a key highlight of this paper, we show that if ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-8.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is an arbitrary continuous non-polynomial function, then ","element":"span"},{"text":"N ","element":"span"},{"text":"is at most ","element":"span"},{"style":{"height":16},"width":129.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-9.png","element":"img","alt":" O(ε−n)","inline":true,"padRight":true},{"text":"for the entire class of continuous functions. In fact, we give an explicit upper bound for ","element":"span"},{"text":"N ","element":"span"},{"text":"in terms of ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-10.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"and the modulus of continuity of ","element":"span"},{"text":"f","element":"span"},{"text":", so better bounds could be obtained for certain subclasses ","element":"span"},{"text":"F","element":"span"},{"text":", which we discuss further in Section ","element":"span"},{"text":"4. 
","element":"span"},{"text":"Furthermore, even for the wider class ","element":"span"},{"text":"F ","element":"span"},{"text":"of all continuous functions ","element":"span"},{"style":{"height":14},"width":212.32,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-11.png","element":"img","alt":" f : X → Rm","inline":true},{"text":", the bound is still ","element":"span"},{"style":{"height":16},"width":129.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-12.png","element":"img","alt":" O(ε−n)","inline":true},{"text":", independent of ","element":"span"},{"text":"m","element":"span"},{"text":".","element":"span"}],[{"text":"To prove this bound, we shall give a direct algebraic proof of the universal approximation theorem, in its general version as stated by ","element":"span"},{"href":"#id-4","referenceIndex":14,"text":"Leshno et al. ","element":"a"},{"href":"#id-4","referenceIndex":14,"text":"(1993) ","element":"a"},{"text":"(i.e. ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-13.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is continuous and non-polynomial). An important advantage of our algebraic approach is that we are able to glean additional information on sufficient conditions that would imply the UAP. 
Another key highlight is that if ","element":"span"},{"text":"F ","element":"span"},{"text":"is the subclass of polynomial functions ","element":"span"},{"style":{"height":14},"width":220,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-14.png","element":"img","alt":" f : X → Rm","inline":true,"padRight":true},{"text":"with total degree at most ","element":"span"},{"text":"d ","element":"span"},{"text":"for each coordinate function, then","element":"span"},{"style":{"height":20.27},"width":96.72,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-15.png","element":"img","alt":" (n+d choose d) ","inline":true},{"text":"hidden units would suffice. In particular, notice that our bound ","element":"span"},{"style":{"height":20.27},"width":196.56,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-16.png","element":"img","alt":" N ≤ (n+d choose d) ","inline":true,"padRight":true},{"text":"does not depend on the approximation error threshold ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-17.png","element":"img","alt":" ε","inline":true,"padRight":true},{"text":"or the output dimension ","element":"span"},{"text":"m","element":"span"},{"text":".","element":"span"}],[{"text":"We shall also show that the UAP holds even under strong conditions on the weights. Given any ","element":"span"},{"style":{"height":12.4},"width":103.52,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-18.png","element":"img","alt":"δ > 0","inline":true},{"text":", we can always choose the non-bias weights in the last layer to have small magnitudes no larger than ","element":"span"},{"style":{"height":11.6},"width":19,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-19.png","element":"img","alt":" δ","inline":true},{"text":". 
Furthermore, we show that there exists some ","element":"span"},{"style":{"height":11.6},"width":106.88,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-20.png","element":"img","alt":" λ > 0","inline":true,"padRight":true},{"text":"(depending only on ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-21.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"and the function ","element":"span"},{"text":"f ","element":"span"},{"text":"to be approximated), such that the non-bias weights in the first layer can always be chosen to have magnitudes greater than ","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-22.png","element":"img","alt":" λ","inline":true},{"text":". Even with these seemingly strong restrictions on the weights, we show that the UAP still holds. Thus, our main results can be collectively interpreted as a quantitative refinement of the universal approximation theorem, with extensions to restricted weight values.","element":"span"}],[{"text":"Outline: Section ","element":"span"},{"text":"2 ","element":"span"},{"text":"covers the preliminaries, including relevant details on arguments involving dense sets. Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"gives precise statements of our results, while Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"discusses the consequences of our results. Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"introduces our algebraic approach and includes most details of the proofs of our results; details omitted from Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"can be found in the appendix. 
Finally, Section ","element":"span"},{"text":"6 ","element":"span"},{"text":"concludes our paper with further remarks.","element":"span"}]]},{"heading":"2 PRELIMINARIES","paragraphs":[[{"text":"2.1 ","element":"span"},{"text":"N","element":"span"},{"text":"OTATION AND ","element":"span"},{"text":"D","element":"span"},{"text":"EFINITIONS","element":"span"}],[{"text":"Let ","element":"span"},{"text":"N ","element":"span"},{"text":"be the set of non-negative integers, let ","element":"span"},{"style":{"height":12.7},"width":43.04,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-23.png","element":"img","alt":" 0n","inline":true,"padRight":true},{"text":"be the zero vector in ","element":"span"},{"style":{"height":10.8},"width":48.8,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-24.png","element":"img","alt":" Rn","inline":true},{"text":", and let ","element":"span"},{"style":{"height":16},"width":160,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-25.png","element":"img","alt":" Mat(k, ℓ)","inline":true,"padRight":true},{"text":"be the vector space of all ","element":"span"},{"text":"k","element":"span"},{"text":"-by-","element":"span"},{"style":{"height":0},"width":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-26.png","element":"img","alt":"ℓ","inline":true,"padRight":true},{"text":"matrices with real entries. 
For any function ","element":"span"},{"style":{"height":14},"width":226.24,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-27.png","element":"img","alt":" f : Rn → Rm","inline":true},{"text":", let ","element":"span"},{"style":{"height":17.55},"width":54.12,"height":43.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-28.png","element":"img","alt":" f [t]","inline":true,"padRight":true},{"text":"denote the ","element":"span"},{"text":"t","element":"span"},{"text":"-th coordinate function of ","element":"span"},{"text":"f ","element":"span"},{"text":"(for each ","element":"span"},{"style":{"height":13.2},"width":192.44,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-29.png","element":"img","alt":" 1 ≤ t ≤ m","inline":true},{"text":"). Given ","element":"span"},{"style":{"height":16},"width":403.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-30.png","element":"img","alt":" α = (α1, . . . , αn) ∈ Nn","inline":true,"padRight":true},{"text":"and any ","element":"span"},{"text":"n","element":"span"},{"text":"-tuple ","element":"span"},{"style":{"height":16},"width":285.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-31.png","element":"img","alt":" x = (x1, . . . , xn)","inline":true},{"text":", we write ","element":"span"},{"style":{"height":10.56},"width":43.56,"height":26.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-32.png","element":"img","alt":" xα","inline":true,"padRight":true},{"text":"to mean ","element":"span"},{"style":{"height":16.13},"width":182.16,"height":40.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-33.png","element":"img","alt":" x1^α1 · · · xn^αn","inline":true,"padRight":true},{"text":". 
If ","element":"span"},{"style":{"height":11.6},"width":125.12,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-34.png","element":"img","alt":" x ∈ Rn","inline":true},{"text":", then ","element":"span"},{"style":{"height":10.56},"width":43.56,"height":26.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-35.png","element":"img","alt":" xα","inline":true,"padRight":true},{"text":"is a real number, ","element":"span"},{"text":"while if ","element":"span"},{"text":"x ","element":"span"},{"text":"is a sequence of variables, then ","element":"span"},{"style":{"height":10.75},"width":43.56,"height":26.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-36.png","element":"img","alt":" xα","inline":true,"padRight":true},{"text":"is a monomial, i.e. an ","element":"span"},{"text":"n","element":"span"},{"text":"-variate polynomial with a single term. Let ","element":"span"},{"style":{"height":17.09},"width":866.24,"height":42.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-37.png","element":"img","alt":" Wn,mN := {W ∈ Mat(n + 1, N) × Mat(N + 1, m)}","inline":true,"padRight":true},{"text":"for each ","element":"span"},{"style":{"height":13.2},"width":112.64,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-38.png","element":"img","alt":" N ≥ 1","inline":true},{"text":", and define ","element":"span"},{"style":{"height":19.68},"width":365.92,"height":49.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-39.png","element":"img","alt":"Wn,m = ⋃N≥1 Wn,mN","inline":true,"padRight":true},{"text":". 
If the context is clear, we suppress the superscripts ","element":"span"},{"text":"n, m ","element":"span"},{"text":"in ","element":"span"},{"style":{"height":17.28},"width":100,"height":43.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-40.png","element":"img","alt":" Wn,mN","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":11.6},"width":100,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-41.png","element":"img","alt":" Wn,m","inline":true},{"text":".","element":"span"}],[{"text":"Given any ","element":"span"},{"style":{"height":13.2},"width":141.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-42.png","element":"img","alt":" X ⊆ Rn","inline":true},{"text":", let ","element":"span"},{"text":"C","element":"span"},{"text":"(","element":"span"},{"text":"X","element":"span"},{"text":") ","element":"span"},{"text":"be the vector space of all continuous functions ","element":"span"},{"style":{"height":14},"width":192.2,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-43.png","element":"img","alt":" f : X → R","inline":true},{"text":". We use the convention that every ","element":"span"},{"style":{"height":16},"width":163.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-44.png","element":"img","alt":" f ∈ C(X)","inline":true,"padRight":true},{"text":"is a function ","element":"span"},{"style":{"height":16},"width":229.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-45.png","element":"img","alt":" f(x1, . . . , xn)","inline":true,"padRight":true},{"text":"in terms of the variables ","element":"span"},{"style":{"height":10},"width":172.16,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-46.png","element":"img","alt":" x1, . . . 
, xn","inline":true},{"text":", unless ","element":"span"},{"text":"n ","element":"span"},{"text":"= 1","element":"span"},{"text":", in which case ","element":"span"},{"text":"f ","element":"span"},{"text":"is in terms of a single variable ","element":"span"},{"text":"x ","element":"span"},{"text":"(or ","element":"span"},{"text":"y","element":"span"},{"text":"). We say ","element":"span"},{"text":"f ","element":"span"},{"text":"is non-zero if ","element":"span"},{"text":"f ","element":"span"},{"text":"is not identically the zero function on ","element":"span"},{"text":"X","element":"span"},{"text":". Let ","element":"span"},{"text":"P","element":"span"},{"text":"(","element":"span"},{"text":"X","element":"span"},{"text":") ","element":"span"},{"text":"be the subspace of all polynomial functions in ","element":"span"},{"text":"C","element":"span"},{"text":"(","element":"span"},{"text":"X","element":"span"},{"text":")","element":"span"},{"text":". For each ","element":"span"},{"style":{"height":11.6},"width":101.48,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-47.png","element":"img","alt":" d ∈ N","inline":true},{"text":", let ","element":"span"},{"style":{"height":16.7},"width":138.88,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-48.png","element":"img","alt":" P≤d(X)","inline":true,"padRight":true},{"text":"(resp. ","element":"span"},{"style":{"height":16},"width":114.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-49.png","element":"img","alt":" Pd(X)","inline":true},{"text":") be the subspace consisting of all polynomial functions of total degree ","element":"span"},{"style":{"height":13.2},"width":71.88,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-50.png","element":"img","alt":" ≤ d","inline":true,"padRight":true},{"text":"(resp. exactly ","element":"span"},{"text":"d","element":"span"},{"text":"). 
More generally, let ","element":"span"},{"style":{"height":16},"width":165.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-51.png","element":"img","alt":" C(X, Rm)","inline":true,"padRight":true},{"text":"be the vector space of all continuous functions ","element":"span"},{"style":{"height":14},"width":211.84,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-52.png","element":"img","alt":" f : X → Rm","inline":true},{"text":", and define ","element":"span"},{"style":{"height":16.7},"width":614.08,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-53.png","element":"img","alt":" P(X, Rm), P≤d(X, Rm), Pd(X, Rm)","inline":true,"padRight":true},{"text":"analogously.","element":"span"}],[{"text":"Throughout, we assume that ","element":"span"},{"style":{"height":16},"width":163.36,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-54.png","element":"img","alt":" σ ∈ C(R)","inline":true},{"text":". 
For every ","element":"span"},{"style":{"height":18.16},"width":421.2,"height":45.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-55.png","element":"img","alt":" W = (W (1), W (2)) ∈ W","inline":true},{"text":", let ","element":"span"},{"style":{"height":20.59},"width":75.36,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-56.png","element":"img","alt":" w(k)j","inline":true,"padRight":true},{"text":"be the ","element":"span"},{"text":"j","element":"span"},{"text":"-th column vector of ","element":"span"},{"style":{"height":14.16},"width":84.96,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-57.png","element":"img","alt":" W (k)","inline":true},{"text":", and let ","element":"span"},{"style":{"height":20.59},"width":71.04,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-58.png","element":"img","alt":" w(k)i,j","inline":true,"padRight":true},{"text":"be the ","element":"span"},{"text":"(","element":"span"},{"text":"i, j","element":"span"},{"text":")","element":"span"},{"text":"-th entry of ","element":"span"},{"style":{"height":14.16},"width":84.96,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-59.png","element":"img","alt":" W (k)","inline":true,"padRight":true},{"text":"(for ","element":"span"},{"text":"k ","element":"span"},{"text":"= 1","element":"span"},{"text":", ","element":"span"},{"text":"2","element":"span"},{"text":"). 
The index ","element":"span"},{"text":"i ","element":"span"},{"text":"begins at ","element":"span"},{"text":"i ","element":"span"},{"text":"= 0","element":"span"},{"text":", while the indices ","element":"span"},{"text":"j","element":"span"},{"text":", ","element":"span"},{"text":"k ","element":"span"},{"text":"begin at ","element":"span"},{"text":"j ","element":"span"},{"text":"= 1","element":"span"},{"text":", ","element":"span"},{"text":"k ","element":"span"},{"text":"= 1 ","element":"span"},{"text":"respectively. For convenience, let ","element":"span"},{"style":{"height":23.47},"width":75.36,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-60.png","element":"img","alt":" ŵ(k)j","inline":true,"padRight":true},{"text":"denote the truncation of ","element":"span"},{"style":{"height":23.47},"width":75.36,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-61.png","element":"img","alt":" w(k)j","inline":true,"padRight":true},{"text":"obtained by removing the first entry ","element":"span"},{"style":{"height":23.47},"width":71.04,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-62.png","element":"img","alt":" w(k)0,j","inline":true,"padRight":true},{"text":". 
Define the function ","element":"span"},{"style":{"height":15.22},"width":259.36,"height":38.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-63.png","element":"img","alt":"ρσW : Rn → Rm","inline":true,"padRight":true},{"text":"so that for each ","element":"span"},{"style":{"height":14},"width":179.96,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-64.png","element":"img","alt":" 1 ≤ j ≤ m","inline":true},{"text":", the ","element":"span"},{"text":"j","element":"span"},{"text":"-th coordinate function ","element":"span"},{"style":{"height":21.46},"width":78.12,"height":53.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-65.png","element":"img","alt":" ρσ [j]W","inline":true,"padRight":true},{"text":"is given by the map","element":"span"}],[{"style":{"width":"40%"},"width":640,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/1-66.png","element":"img"}],[{"text":"where “","element":"span"},{"style":{"height":4.8},"width":11,"height":12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-0.png","element":"img","alt":"·","inline":true},{"text":"” denotes dot product, and ","element":"span"},{"text":"(1","element":"span"},{"text":", x","element":"span"},{"text":") ","element":"span"},{"text":"denotes a column vector in ","element":"span"},{"style":{"height":13.36},"width":88.96,"height":33.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-1.png","element":"img","alt":" Rn+1","inline":true,"padRight":true},{"text":"formed by concatenating ","element":"span"},{"text":"1 ","element":"span"},{"text":"before ","element":"span"},{"text":"x","element":"span"},{"text":". 
The class of functions that neural networks ","element":"span"},{"text":"N ","element":"span"},{"text":"with one hidden layer can compute is precisely ","element":"span"},{"style":{"height":16.41},"width":264.8,"height":41.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-2.png","element":"img","alt":" {ρσW : W ∈ W}","inline":true},{"text":", where ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-3.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is called the activation function of ","element":"span"},{"text":"N ","element":"span"},{"text":"(or of ","element":"span"},{"style":{"height":14.97},"width":53.64,"height":37.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-4.png","element":"img","alt":" ρσW","inline":true,"padRight":true},{"text":"). Functions ","element":"span"},{"style":{"height":14.97},"width":53.64,"height":37.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-5.png","element":"img","alt":" ρσW","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":13.1},"width":158.04,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-6.png","element":"img","alt":" W ∈ WN","inline":true,"padRight":true},{"text":"correspond to neural networks with ","element":"span"},{"text":"N ","element":"span"},{"text":"hidden units (in its single hidden layer). 
Every ","element":"span"},{"style":{"height":23.47},"width":71.04,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-7.png","element":"img","alt":" w(k)i,j","inline":true,"padRight":true},{"text":"is called a weight in the ","element":"span"},{"text":"k","element":"span"},{"text":"-th layer, where ","element":"span"},{"style":{"height":23.47},"width":64.68,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-8.png","element":"img","alt":" w(k)i,j","inline":true,"padRight":true},{"text":"is called a bias weight (resp. non-bias ","element":"span"},{"text":"weight) if ","element":"span"},{"text":"i ","element":"span"},{"text":"= 0 ","element":"span"},{"text":"(resp. ","element":"span"},{"style":{"height":15.2},"width":86.72,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-9.png","element":"img","alt":" i ≠ 0","inline":true},{"text":").","element":"span"}],[{"text":"Notice that we do not apply the activation function ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-10.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"to the output layer. This is consistent with previous approximation results for neural networks. 
The reason is simple: ","element":"span"},{"style":{"height":21.26},"width":143.88,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-11.png","element":"img","alt":" σ ◦ ρσ [j]W","inline":true,"padRight":true},{"text":"(restricted to domain ","element":"span"},{"style":{"height":13.2},"width":152.96,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-12.png","element":"img","alt":" X ⊆ Rn","inline":true},{"text":") cannot possibly approximate ","element":"span"},{"style":{"height":14},"width":213.8,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-13.png","element":"img","alt":" f : X → R","inline":true,"padRight":true},{"text":"if there exists some ","element":"span"},{"style":{"height":13.1},"width":138.68,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-14.png","element":"img","alt":" x0 ∈ X","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":16},"width":91.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-15.png","element":"img","alt":" σ(X)","inline":true,"padRight":true},{"text":"is bounded away from ","element":"span"},{"style":{"height":16},"width":96.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-16.png","element":"img","alt":" f(x0)","inline":true},{"text":". 
If instead ","element":"span"},{"text":"f","element":"span"},{"text":"(","element":"span"},{"text":"X","element":"span"},{"text":") ","element":"span"},{"text":"is contained in the closure of ","element":"span"},{"style":{"height":16},"width":91.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-17.png","element":"img","alt":" σ(X)","inline":true},{"text":", then applying ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-18.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"to ","element":"span"},{"style":{"height":21.46},"width":78.12,"height":53.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-19.png","element":"img","alt":" ρσ [j]W","inline":true,"padRight":true},{"text":"has essentially the same effect as allowing for bias weights ","element":"span"},{"style":{"height":23.47},"width":69.6,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-20.png","element":"img","alt":" w(2)0,j","inline":true},{"text":".","element":"span"}],[{"text":"Although some authors, e.g. 
","element":"span"},{"href":"#id-4","referenceIndex":14,"text":"(Leshno et al.","element":"a"},{"text":", ","element":"span"},{"href":"#id-4","referenceIndex":14,"text":"1993)","element":"a"},{"text":", do not explicitly include bias weights in the output layer, the reader should check that if ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-21.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"is not identically zero, say ","element":"span"},{"style":{"height":16},"width":169.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-22.png","element":"img","alt":" σ(y0) ≠ 0","inline":true},{"text":", then having a bias weight ","element":"span"},{"style":{"height":23.47},"width":147.12,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-23.png","element":"img","alt":" w(2)0,j = c","inline":true,"padRight":true},{"text":"is equivalent to setting ","element":"span"},{"style":{"height":23.47},"width":149.6,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-24.png","element":"img","alt":" w(2)0,j = 0","inline":true,"padRight":true},{"text":"(i.e. 
no bias weight in the output layer) and ","element":"span"},{"text":"introducing an ","element":"span"},{"text":"(","element":"span"},{"text":"N ","element":"span"},{"text":"+ 1)","element":"span"},{"text":"-th hidden unit, with corresponding weights ","element":"span"},{"style":{"height":20.98},"width":434.72,"height":52.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-25.png","element":"img","alt":" w(1)0,N+1 = y0, w(1)i,N+1 = 0","inline":true,"padRight":true},{"text":"for ","element":"span"},{"text":"all ","element":"span"},{"style":{"height":13.2},"width":176.64,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-26.png","element":"img","alt":" 1 ≤ i ≤ n","inline":true},{"text":", and ","element":"span"},{"style":{"height":22.53},"width":263.52,"height":56.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-27.png","element":"img","alt":" w(2)N+1,j = cσ(y0)","inline":true},{"text":"; this means our results also apply to neural networks without ","element":"span"},{"text":"bias weights in the output layer (but with one additional hidden unit).","element":"span"}],[{"text":"2.2 ","element":"span"},{"text":"A","element":"span"},{"text":"RGUMENTS INVOLVING DENSE SUBSETS","element":"span"}],[{"text":"A key theme in this paper is the use of dense subsets of metric spaces. We shall consider several notions of “dense”. 
First, recall that a metric on a set ","element":"span"},{"text":"S ","element":"span"},{"text":"is any function ","element":"span"},{"style":{"height":11.6},"width":245,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-28.png","element":"img","alt":" d : S × S → R","inline":true,"padRight":true},{"text":"such that for","element":"span"}],[{"text":"all ","element":"span"},{"style":{"height":14},"width":173.84,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-29.png","element":"img","alt":" x, y, z ∈ S","inline":true},{"text":", the following conditions hold:","element":"span"}],[{"style":{"width":"60%"},"width":958,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-30.png","element":"img"}],[{"text":"The set ","element":"span"},{"text":"S","element":"span"},{"text":", together with a metric on ","element":"span"},{"text":"S","element":"span"},{"text":", is called a metric space. For example, the usual Euclidean norm for vectors in ","element":"span"},{"style":{"height":10.8},"width":48.8,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-31.png","element":"img","alt":" Rn","inline":true,"padRight":true},{"text":"gives the Euclidean metric ","element":"span"},{"style":{"height":16},"width":304.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-32.png","element":"img","alt":" (u, v) ↦ ∥u − v∥2","inline":true},{"text":", hence ","element":"span"},{"style":{"height":10.8},"width":48.8,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-33.png","element":"img","alt":" Rn","inline":true,"padRight":true},{"text":"is a metric space. 
In particular, every pair in ","element":"span"},{"style":{"height":13.11},"width":66.36,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-34.png","element":"img","alt":" WN","inline":true,"padRight":true},{"text":"can be identified with a vector in ","element":"span"},{"style":{"height":14.16},"width":193.56,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-35.png","element":"img","alt":" R(m+n+1)N","inline":true},{"text":", so ","element":"span"},{"style":{"height":13.11},"width":66.36,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-36.png","element":"img","alt":" WN","inline":true},{"text":", together with the Euclidean metric, is a metric space.","element":"span"}],[{"text":"Given a metric space ","element":"span"},{"text":"X ","element":"span"},{"text":"(with metric ","element":"span"},{"text":"d","element":"span"},{"text":"), and some subset ","element":"span"},{"style":{"height":13.2},"width":119.48,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-37.png","element":"img","alt":" U ⊆ X","inline":true},{"text":", we say that ","element":"span"},{"text":"U ","element":"span"},{"text":"is dense in ","element":"span"},{"text":"X ","element":"span"},{"text":"(w.r.t. 
","element":"span"},{"text":"d","element":"span"},{"text":") if for all ","element":"span"},{"style":{"height":11.6},"width":94.88,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-38.png","element":"img","alt":" ε > 0","inline":true,"padRight":true},{"text":"and all ","element":"span"},{"style":{"height":11.6},"width":109.88,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-39.png","element":"img","alt":" x ∈ X","inline":true},{"text":", there exists some ","element":"span"},{"style":{"height":11.6},"width":105.88,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-40.png","element":"img","alt":" u ∈ U","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":16},"width":189.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-41.png","element":"img","alt":" d(x, u) < ε","inline":true},{"text":". Arbitrary unions of dense subsets are dense. 
If ","element":"span"},{"style":{"height":13.2},"width":215.48,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-42.png","element":"img","alt":" U ⊆ U ′ ⊆ X","inline":true,"padRight":true},{"text":"and ","element":"span"},{"text":"U ","element":"span"},{"text":"is dense in ","element":"span"},{"text":"X","element":"span"},{"text":", then ","element":"span"},{"style":{"height":10.8},"width":45.68,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-43.png","element":"img","alt":" U ′","inline":true,"padRight":true},{"text":"must also be dense in ","element":"span"},{"text":"X","element":"span"},{"text":".","element":"span"}],[{"text":"A basic result in algebraic geometry says that if ","element":"span"},{"style":{"height":16},"width":182.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-44.png","element":"img","alt":" p ∈ P(Rn)","inline":true,"padRight":true},{"text":"is non-zero, then ","element":"span"},{"style":{"height":16},"width":344,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-45.png","element":"img","alt":" {x ∈ Rn : p(x) ≠ 0}","inline":true,"padRight":true},{"text":"is a dense subset of ","element":"span"},{"style":{"height":10.8},"width":48.8,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-46.png","element":"img","alt":" Rn","inline":true,"padRight":true},{"text":"(w.r.t. the Euclidean metric). This subset is in fact an open set in the Zariski topology, hence any finite intersection of such Zariski-dense open sets is dense; see ","element":"span"},{"href":"#id-11","referenceIndex":6,"text":"(Eisenbud, ","element":"a"},{"href":"#id-11","referenceIndex":6,"text":"1995)","element":"a"},{"text":". 
More generally, the following is true: Let ","element":"span"},{"style":{"height":16},"width":328,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-47.png","element":"img","alt":" p1, . . . , pk ∈ P(Rn)","inline":true},{"text":", and suppose that ","element":"span"},{"style":{"height":16},"width":181.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-48.png","element":"img","alt":" X := {x ∈","inline":true},{"style":{"height":16},"width":250.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-49.png","element":"img","alt":"Rn : pi(x) = 0","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":16},"width":188.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-50.png","element":"img","alt":" 1 ≤ i ≤ k}","inline":true},{"text":". If ","element":"span"},{"style":{"height":16},"width":171.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-51.png","element":"img","alt":" p ∈ P(X)","inline":true,"padRight":true},{"text":"is non-zero, then ","element":"span"},{"style":{"height":16},"width":338.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-52.png","element":"img","alt":" {x ∈ X : p(x) ≠ 0}","inline":true,"padRight":true},{"text":"is a dense subset of ","element":"span"},{"text":"X ","element":"span"},{"text":"(w.r.t. the Euclidean metric). In subsequent sections, we shall frequently use these facts.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":13.2},"width":156.32,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-53.png","element":"img","alt":" X ⊆ Rn","inline":true,"padRight":true},{"text":"be a compact set. (Recall that ","element":"span"},{"text":"X ","element":"span"},{"text":"is compact if it is bounded and contains all of its limit points.) 
For any real-valued function ","element":"span"},{"text":"f ","element":"span"},{"text":"whose domain contains ","element":"span"},{"text":"X","element":"span"},{"text":", the uniform norm of ","element":"span"},{"text":"f ","element":"span"},{"text":"on ","element":"span"},{"text":"X ","element":"span"},{"text":"is ","element":"span"},{"style":{"height":16.7},"width":572.48,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-54.png","element":"img","alt":" ∥f∥∞,X := sup{|f(x)| : x ∈ X}","inline":true},{"text":". More generally, if ","element":"span"},{"style":{"height":14},"width":233.44,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-55.png","element":"img","alt":" f : X → Rm","inline":true},{"text":", then we define ","element":"span"},{"style":{"height":21.36},"width":696.8,"height":53.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-56.png","element":"img","alt":"∥f∥∞,X := max{∥f [j]∥∞,X : 1 ≤ j ≤ m}","inline":true},{"text":". The uniform norm of functions on ","element":"span"},{"text":"X ","element":"span"},{"text":"gives the uniform metric ","element":"span"},{"style":{"height":16.7},"width":354.36,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-57.png","element":"img","alt":" (f, g) ↦ ∥f − g∥∞,X","inline":true},{"text":", hence ","element":"span"},{"text":"C","element":"span"},{"text":"(","element":"span"},{"text":"X","element":"span"},{"text":") ","element":"span"},{"text":"is a metric space.","element":"span"}],[{"text":"2.3 ","element":"span"},{"text":"B","element":"span"},{"text":"ACKGROUND ON APPROXIMATION THEORY","element":"span"}],[{"id":"id-15","text":"Theorem 2.1 ","element":"span"},{"text":"(Stone–Weierstrass theorem). 
Let ","element":"span"},{"style":{"height":13.2},"width":152,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-58.png","element":"img","alt":" X ⊆ Rn","inline":true,"padRight":true},{"text":"be compact. For any ","element":"span"},{"style":{"height":16},"width":177.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-59.png","element":"img","alt":" f ∈ C(X)","inline":true},{"text":", there exists a sequence ","element":"span"},{"style":{"height":16},"width":138.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-60.png","element":"img","alt":" {pk}k∈N","inline":true,"padRight":true},{"text":"of polynomial functions in ","element":"span"},{"text":"P","element":"span"},{"text":"(","element":"span"},{"text":"X","element":"span"},{"text":") ","element":"span"},{"text":"such that ","element":"span"},{"style":{"height":16.7},"width":441.92,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-61.png","element":"img","alt":" limk→∞ ∥f − pk∥∞,X = 0","inline":true},{"text":".","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":13.2},"width":118.28,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-62.png","element":"img","alt":" X ⊆ R","inline":true,"padRight":true},{"text":"be compact. 
For all ","element":"span"},{"style":{"height":11.6},"width":98.12,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-63.png","element":"img","alt":" d ∈ N","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":163.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-64.png","element":"img","alt":" f ∈ C(X)","inline":true},{"text":", define","element":"span"}],[{"id":"id-12","style":{"width":"72%"},"width":1145,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/2-65.png","element":"img"}],[{"text":"A central result in approximation theory, due to Chebyshev, says that for fixed ","element":"span"},{"text":"d, f","element":"span"},{"text":", the infimum in ","element":"span"},{"href":"#id-12","text":"(1) ","element":"a"},{"text":"is attained by some unique ","element":"span"},{"style":{"height":16.71},"width":230.08,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-0.png","element":"img","alt":" p∗ ∈ P≤d(R)","inline":true},{"text":"; see ","element":"span"},{"href":"#id-13","referenceIndex":27,"text":"(Rivlin, ","element":"a"},{"href":"#id-13","referenceIndex":27,"text":"1981, ","element":"a"},{"text":"Chap. 1). (Notice here that we define ","element":"span"},{"style":{"height":14.16},"width":36.16,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-1.png","element":"img","alt":" p∗","inline":true,"padRight":true},{"text":"to have domain ","element":"span"},{"text":"R","element":"span"},{"text":".) 
This unique polynomial ","element":"span"},{"style":{"height":14.16},"width":36.16,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-2.png","element":"img","alt":" p∗","inline":true,"padRight":true},{"text":"is called the best polynomial approximant to ","element":"span"},{"text":"f ","element":"span"},{"text":"of degree ","element":"span"},{"text":"d","element":"span"},{"text":".","element":"span"}],[{"text":"Given a metric space ","element":"span"},{"text":"X ","element":"span"},{"text":"with metric ","element":"span"},{"text":"d","element":"span"},{"text":", and any uniformly continuous function ","element":"span"},{"style":{"height":14},"width":210.44,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-3.png","element":"img","alt":" f : X → R","inline":true},{"text":", the modulus of continuity of ","element":"span"},{"text":"f ","element":"span"},{"text":"is a function ","element":"span"},{"style":{"height":16.7},"width":339.8,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-4.png","element":"img","alt":" ωf : [0, ∞] → [0, ∞]","inline":true,"padRight":true},{"text":"defined by","element":"span"}],[{"style":{"width":"54%"},"width":871,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-5.png","element":"img"}],[{"text":"By the Heine–Cantor theorem, any continuous ","element":"span"},{"text":"f ","element":"span"},{"text":"with a compact domain is uniformly continuous.","element":"span"}],[{"id":"id-47","text":"Theorem 2.2 ","element":"span"},{"text":"(Jackson’s theorem; see ","element":"span"},{"href":"#id-13","referenceIndex":27,"text":"(Rivlin, ","element":"a"},{"href":"#id-13","referenceIndex":27,"text":"1981, ","element":"a"},{"text":"Cor. 1.4.1)). 
Let ","element":"span"},{"style":{"height":13.2},"width":98.24,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-6.png","element":"img","alt":" d ≥ 1","inline":true,"padRight":true},{"text":"be an integer, and let ","element":"span"},{"style":{"height":13.2},"width":116.84,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-7.png","element":"img","alt":"Y ⊆ R","inline":true,"padRight":true},{"text":"be a closed interval of length ","element":"span"},{"style":{"height":13.2},"width":95.36,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-8.png","element":"img","alt":" r ≥ 0","inline":true},{"text":". Suppose ","element":"span"},{"style":{"height":16},"width":161.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-9.png","element":"img","alt":" f ∈ C(Y )","inline":true},{"text":", and let ","element":"span"},{"style":{"height":14.16},"width":36.16,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-10.png","element":"img","alt":" p∗","inline":true,"padRight":true},{"text":"be the best polynomial approximant to ","element":"span"},{"text":"f ","element":"span"},{"text":"of degree ","element":"span"},{"text":"d","element":"span"},{"text":". 
Then ","element":"span"},{"style":{"height":17.38},"width":567.53,"height":43.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-11.png","element":"img","alt":" ∥f − p∗∥∞,Y = Ed(f) ≤ 6ωf(r/(2d))","inline":true},{"text":".","element":"span"}]]},{"heading":"3 MAIN RESULTS","paragraphs":[[{"text":"Throughout this section, let ","element":"span"},{"style":{"height":13.2},"width":138.08,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-12.png","element":"img","alt":" X ⊆ Rn","inline":true,"padRight":true},{"text":"be a compact set.","element":"span"}],[{"id":"id-14","style":{"width":"100%"},"width":1588,"height":1100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-13.png","element":"img"}]]},{"heading":"4 DISCUSSION","paragraphs":[[{"text":"The universal approximation theorem (version of ","element":"span"},{"href":"#id-4","referenceIndex":14,"text":"Leshno et al. 
","element":"a"},{"href":"#id-4","referenceIndex":14,"text":"(1993)","element":"a"},{"text":") is an immediate consequence of Theorem ","element":"span"},{"href":"#id-14","text":"3.2 ","element":"a"},{"text":"and the observation that ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-14.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"must be non-polynomial for the UAP to hold, which follows from the fact that the uniform closure of ","element":"span"},{"style":{"height":16.7},"width":138.88,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-15.png","element":"img","alt":" P≤d(X)","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":16.7},"width":138.88,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-16.png","element":"img","alt":" P≤d(X)","inline":true,"padRight":true},{"text":"itself, for every integer 
","element":"span"},{"style":{"height":13.2},"width":93.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-17.png","element":"img","alt":" d ≥ 1","inline":true},{"text":". Alternatively, we could infer the universal approximation theorem by applying the Stone–Weierstrass theorem (Theorem ","element":"span"},{"href":"#id-15","text":"2.1) ","element":"a"},{"text":"to Theorem ","element":"span"},{"href":"#id-14","text":"3.1.","element":"a"}],[{"text":"Given fixed ","element":"span"},{"text":"n, m, d","element":"span"},{"text":", a compact set ","element":"span"},{"style":{"height":13.2},"width":152.96,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-18.png","element":"img","alt":" X ⊆ Rn","inline":true},{"text":", and ","element":"span"},{"style":{"height":16.7},"width":362.56,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-19.png","element":"img","alt":" σ ∈ C(R)\\P≤d−1(R)","inline":true},{"text":", Theorem ","element":"span"},{"href":"#id-14","text":"3.1 ","element":"a"},{"text":"says that we could use a fixed number ","element":"span"},{"text":"N ","element":"span"},{"text":"of hidden units (independent of ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-20.png","element":"img","alt":" ε","inline":true},{"text":") and still be able to approximate any function ","element":"span"},{"style":{"height":16.7},"width":299.2,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-21.png","element":"img","alt":" f ∈ P≤d(X, Rm)","inline":true,"padRight":true},{"text":"to any desired approximation error threshold ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-22.png","element":"img","alt":" ε","inline":true},{"text":". 
Our ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-23.png","element":"img","alt":" ε","inline":true},{"text":"-free bound, although possibly surprising to some readers, is not the first instance of an ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-24.png","element":"img","alt":" ε","inline":true},{"text":"-free bound: Neural networks with two hidden layers of sizes ","element":"span"},{"text":"2","element":"span"},{"text":"n ","element":"span"},{"text":"+ 1 ","element":"span"},{"text":"and ","element":"span"},{"text":"4","element":"span"},{"text":"n ","element":"span"},{"text":"+ 3 ","element":"span"},{"text":"respectively are able to uniformly approximate any ","element":"span"},{"style":{"height":16},"width":177.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/3-25.png","element":"img","alt":" f ∈ C(X)","inline":true},{"text":", provided we use a (somewhat pathological) activation function ","element":"span"},{"href":"#id-10","referenceIndex":20,"text":"(Maiorov & Pinkus, ","element":"a"},{"href":"#id-10","referenceIndex":20,"text":"1999)","element":"a"},{"text":"; cf. ","element":"span"},{"href":"#id-9","referenceIndex":24,"text":"(Pinkus, ","element":"a"},{"href":"#id-9","referenceIndex":24,"text":"1999)","element":"a"},{"text":". ","element":"span"},{"href":"#id-16","referenceIndex":17,"text":"Lin et al. 
","element":"a"},{"href":"#id-16","referenceIndex":17,"text":"(2017) ","element":"a"},{"text":"showed that for fixed ","element":"span"},{"text":"n, d","element":"span"},{"text":", and a fixed smooth non-linear ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-0.png","element":"img","alt":" σ","inline":true},{"text":", there is a fixed ","element":"span"},{"text":"N ","element":"span"},{"text":"(i.e. ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-1.png","element":"img","alt":" ε","inline":true},{"text":"-free), such that a neural network with ","element":"span"},{"text":"N ","element":"span"},{"text":"hidden units is able to approximate any ","element":"span"},{"style":{"height":16.71},"width":220.48,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-2.png","element":"img","alt":" f ∈ P≤d(X)","inline":true},{"text":". An explicit expression for ","element":"span"},{"text":"N ","element":"span"},{"text":"is not given, but we were able to infer from their constructive proof that ","element":"span"},{"style":{"height":20.27},"width":331.52,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-3.png","element":"img","alt":" N = 4·binom(n+d+1, d) − 4","inline":true,"padRight":true},{"text":"hidden units are required, over ","element":"span"},{"style":{"height":10.8},"width":92,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-4.png","element":"img","alt":" d − 1","inline":true,"padRight":true},{"text":"hidden layers (for ","element":"span"},{"style":{"height":16},"width":116.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-5.png","element":"img","alt":" d ≥ 2)","inline":true},{"text":". 
In comparison, we require fewer hidden units and a single hidden layer.","element":"span"}],[{"style":{"width":"100%"},"width":1585,"height":657,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-6.png","element":"img"}],[{"text":"An interesting consequence of Theorem ","element":"span"},{"href":"#id-14","text":"3.3 ","element":"a"},{"text":"is the following: The freezing of lower layers of a neural network, even in the extreme case that all frozen layers are randomly initialized and the last layer is the only “non-frozen” layer, does not necessarily reduce the representability of the resulting model. Specifically, in the single-hidden-layer case, we have shown that if the non-bias weights in the first layer are fixed and randomly chosen from some suitable fixed range, then the UAP holds with probability ","element":"span"},{"text":"1","element":"span"},{"text":", provided that there are sufficiently many hidden units. Of course, this representability does not reveal anything about the learnability of such a model. In practice, layers are already pre-trained before being frozen. It would be interesting to understand quantitatively the difference between having pre-trained frozen layers and having randomly initialized frozen layers.","element":"span"}],[{"text":"Theorem ","element":"span"},{"href":"#id-14","text":"3.3 ","element":"a"},{"text":"can be viewed as a result on random features, which were formally studied in relation to kernel methods ","element":"span"},{"href":"#id-17","referenceIndex":26,"text":"(Rahimi & Recht, ","element":"a"},{"href":"#id-17","referenceIndex":26,"text":"2007)","element":"a"},{"text":". In the case of ReLU activation functions, ","element":"span"},{"href":"#id-18","referenceIndex":29,"text":"Sun et al. 
","element":"a"},{"href":"#id-18","referenceIndex":29,"text":"(2019) ","element":"a"},{"text":"proved an analog of Theorem ","element":"span"},{"href":"#id-14","text":"3.3 ","element":"a"},{"text":"for the approximation of functions in a reproducing kernel Hilbert space; cf. ","element":"span"},{"href":"#id-19","referenceIndex":25,"text":"(Rahimi & Recht, ","element":"a"},{"href":"#id-19","referenceIndex":25,"text":"2008)","element":"a"},{"text":". For a good discussion on the role of random features in the representability of neural networks, see ","element":"span"},{"href":"#id-20","referenceIndex":33,"text":"(Yehudai & Shamir, ","element":"a"},{"href":"#id-20","referenceIndex":33,"text":"2019)","element":"a"},{"text":".","element":"span"}],[{"text":"The UAP is also studied in other contexts, most notably in relation to the depth and width of neural networks. ","element":"span"},{"href":"#id-21","referenceIndex":18,"text":"Lu et al. ","element":"a"},{"href":"#id-21","referenceIndex":18,"text":"(2017) ","element":"a"},{"text":"proved the UAP for neural networks with hidden layers of bounded width, under the assumption that ReLU is used as the activation function. Soon after, ","element":"span"},{"href":"#id-22","referenceIndex":9,"text":"Hanin ","element":"a"},{"href":"#id-22","referenceIndex":9,"text":"(2017) ","element":"a"},{"text":"strengthened the bounded-width UAP result by considering the approximation of continuous convex functions. 
Recently, the role of depth in the expressive power of neural networks has gathered much interest ","element":"span"},{"href":"#id-23","referenceIndex":4,"text":"(Delalleau & Bengio, ","element":"a"},{"href":"#id-23","referenceIndex":4,"text":"2011; ","element":"a"},{"href":"#id-24","referenceIndex":7,"text":"Eldan & Shamir, ","element":"a"},{"href":"#id-24","referenceIndex":7,"text":"2016; ","element":"a"},{"href":"#id-25","referenceIndex":22,"text":"Mhaskar et al., ","element":"a"},{"href":"#id-25","referenceIndex":22,"text":"2017; ","element":"a"},{"href":"#id-26","referenceIndex":23,"text":"Montúfar et al., ","element":"a"},{"href":"#id-26","referenceIndex":23,"text":"2014; ","element":"a"},{"href":"#id-27","referenceIndex":30,"text":"Telgarsky, ","element":"a"},{"href":"#id-27","referenceIndex":30,"text":"2016)","element":"a"},{"text":". We do not address depth in this paper, but we believe it is possible that our results can be applied iteratively to deeper neural networks, perhaps in particular for the approximation of compositional functions; cf. ","element":"span"},{"href":"#id-25","referenceIndex":22,"text":"(Mhaskar et al., ","element":"a"},{"href":"#id-25","referenceIndex":22,"text":"2017)","element":"a"},{"text":".","element":"span"}]]},{"heading":"5 AN ALGEBRAIC APPROACH FOR PROVING UAP","paragraphs":[[{"text":"We begin with a “warm-up” result. Subsequent results, even if they seem complicated, are actually multivariate extensions of this “warm-up” result, using very similar ideas.","element":"span"}],[{"id":"id-30","text":"Theorem 5.1. 
","element":"span"},{"text":"Let ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") ","element":"span"},{"text":"be a real polynomial of degree ","element":"span"},{"text":"d ","element":"span"},{"text":"with all-non-zero coefficients, and let ","element":"span"},{"style":{"height":10.7},"width":206.08,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-7.png","element":"img","alt":"a1, . . . , ad+1","inline":true,"padRight":true},{"text":"be real numbers. For each ","element":"span"},{"style":{"height":14},"width":242.72,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-8.png","element":"img","alt":" 1 ≤ j ≤ d + 1","inline":true},{"text":", define ","element":"span"},{"style":{"height":15.5},"width":196.52,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-9.png","element":"img","alt":" fj : R → R","inline":true,"padRight":true},{"text":"by ","element":"span"},{"style":{"height":16.7},"width":200.32,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-10.png","element":"img","alt":" x ↦ p(ajx)","inline":true},{"text":". Then ","element":"span"},{"style":{"height":14.7},"width":202.72,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-11.png","element":"img","alt":"f1, . . . , fd+1","inline":true,"padRight":true},{"text":"are linearly independent if and only if ","element":"span"},{"style":{"height":10.7},"width":206.08,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-12.png","element":"img","alt":" a1, . . . , ad+1","inline":true,"padRight":true},{"text":"are distinct.","element":"span"}],[{"text":"Proof. 
For each ","element":"span"},{"style":{"height":14},"width":200.52,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-13.png","element":"img","alt":" 0 ≤ i, k ≤ d","inline":true,"padRight":true},{"text":"and each ","element":"span"},{"style":{"height":14},"width":227.36,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-14.png","element":"img","alt":" 1 ≤ j ≤ d+1","inline":true},{"text":", let ","element":"span"},{"style":{"height":20.78},"width":59.52,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-15.png","element":"img","alt":" f (i)j","inline":true,"padRight":true},{"text":"(resp. ","element":"span"},{"style":{"height":17.36},"width":55.68,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-16.png","element":"img","alt":" p(i)","inline":true},{"text":") be the ","element":"span"},{"text":"i","element":"span"},{"text":"-th derivative of ","element":"span"},{"style":{"height":15.5},"width":32.68,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-17.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"(resp. ","element":"span"},{"text":"p","element":"span"},{"text":"), and let ","element":"span"},{"style":{"height":18.58},"width":60.96,"height":46.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-18.png","element":"img","alt":" α(i)k","inline":true,"padRight":true},{"text":"be the coefficient of ","element":"span"},{"style":{"height":13.36},"width":39.56,"height":33.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-19.png","element":"img","alt":" xk","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":18.16},"width":112.48,"height":45.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-20.png","element":"img","alt":" p(i)(x)","inline":true},{"text":". 
Recall that the Wronskian of ","element":"span"},{"style":{"height":16},"width":235.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-21.png","element":"img","alt":" (f1, . . . , fd+1)","inline":true,"padRight":true},{"text":"is defined to be the determinant of the matrix ","element":"span"},{"style":{"height":20.78},"width":496.96,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-22.png","element":"img","alt":" M(x) := [f (i−1)j (x)]1≤i,j≤d+1","inline":true},{"text":". Since ","element":"span"},{"style":{"height":14.7},"width":202.72,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/4-23.png","element":"img","alt":" f1, . . . , fd+1","inline":true,"padRight":true},{"text":"are polynomial functions, it follows that ","element":"span"},{"style":{"height":16},"width":235.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-0.png","element":"img","alt":" (f1, . . . , fd+1)","inline":true,"padRight":true},{"text":"is a sequence of linearly independent functions if and only if its Wronskian is not the zero function ","element":"span"},{"href":"#id-28","referenceIndex":15,"text":"(LeVeque, ","element":"a"},{"href":"#id-28","referenceIndex":15,"text":"1956, ","element":"a"},{"text":"Thm. 4.7(a)). 
Clearly, if ","element":"span"},{"style":{"height":11.5},"width":125.32,"height":28.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-1.png","element":"img","alt":"ai = aj","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":15.2},"width":87.56,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-2.png","element":"img","alt":" i ≠ j","inline":true},{"text":", then ","element":"span"},{"text":"det ","element":"span"},{"text":"M","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") ","element":"span"},{"text":"is identically zero. Thus, it suffices to show that if ","element":"span"},{"style":{"height":10.7},"width":205.6,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-3.png","element":"img","alt":" a1, . . . , ad+1","inline":true,"padRight":true},{"text":"are distinct, then the evaluation ","element":"span"},{"text":"det ","element":"span"},{"text":"M","element":"span"},{"text":"(1) ","element":"span"},{"text":"of this Wronskian at ","element":"span"},{"text":"x ","element":"span"},{"text":"= 1 ","element":"span"},{"text":"gives a non-zero value.","element":"span"}],[{"text":"Now, the ","element":"span"},{"text":"(","element":"span"},{"text":"i, j","element":"span"},{"text":")","element":"span"},{"text":"-th entry of ","element":"span"},{"text":"M","element":"span"},{"text":"(1) ","element":"span"},{"text":"equals ","element":"span"},{"style":{"height":20.78},"width":243.52,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-4.png","element":"img","alt":" ai−1j p(i−1)(aj)","inline":true},{"text":", so ","element":"span"},{"style":{"height":16},"width":268.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-5.png","element":"img","alt":" M(1) = M ′M ′′","inline":true},{"text":", where 
","element":"span"},{"style":{"height":10.8},"width":57.2,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-6.png","element":"img","alt":" M ′","inline":true,"padRight":true},{"text":"is an upper triangular matrix whose ","element":"span"},{"text":"(","element":"span"},{"text":"i, j","element":"span"},{"text":")","element":"span"},{"text":"-th entry equals ","element":"span"},{"style":{"height":20.78},"width":101.76,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-7.png","element":"img","alt":" α(i−1)j−i","inline":true,"padRight":true},{"text":", and ","element":"span"},{"style":{"height":19.98},"width":371.68,"height":49.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-8.png","element":"img","alt":" M ′′ = [ai−1j ]1≤i,j≤d+1","inline":true,"padRight":true},{"text":"is the transpose of a Vandermonde matrix, whose determinant is","element":"span"}],[{"style":{"width":"30%"},"width":483,"height":88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-9.png","element":"img"}],[{"text":"Note that the ","element":"span"},{"text":"k","element":"span"},{"text":"-th diagonal entry of ","element":"span"},{"style":{"height":10.8},"width":57.2,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-10.png","element":"img","alt":" M ′","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":18.77},"width":369.76,"height":46.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-11.png","element":"img","alt":" α(k−1)0 = (k−1)!α(0)k−1","inline":true},{"text":", which is non-zero by assumption, ","element":"span"},{"text":"so ","element":"span"},{"style":{"height":16},"width":213.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-12.png","element":"img","alt":" det(M ′) ≠ 0","inline":true},{"text":". 
Thus, if ","element":"span"},{"style":{"height":10.7},"width":206.08,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-13.png","element":"img","alt":" a1, . . . , ad+1","inline":true,"padRight":true},{"text":"are distinct, then ","element":"span"},{"style":{"height":16},"width":579.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-14.png","element":"img","alt":" det M(1) = det(M ′) det(M ′′) ≠ 0","inline":true},{"text":".","element":"span"}],[{"id":"id-29","text":"Definition 5.2. ","element":"span"},{"text":"Given ","element":"span"},{"style":{"height":17.09},"width":496.16,"height":42.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-15.png","element":"img","alt":" N ≥ 1, W ∈ Wn,mN , x0 ∈ Rn","inline":true},{"text":", and any function ","element":"span"},{"style":{"height":14},"width":181.64,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-16.png","element":"img","alt":" g : R → R","inline":true},{"text":", let ","element":"span"},{"style":{"height":16.71},"width":164.8,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-17.png","element":"img","alt":" Fg,x0(W)","inline":true,"padRight":true},{"text":"denote the sequence of functions ","element":"span"},{"style":{"height":16},"width":207.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-18.png","element":"img","alt":" (f1, . . . 
, fN)","inline":true},{"text":", such that each ","element":"span"},{"style":{"height":15.5},"width":220.52,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-19.png","element":"img","alt":" fj : Rn → R","inline":true,"padRight":true},{"text":"is defined by the map ","element":"span"},{"style":{"height":20.78},"width":385.12,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-20.png","element":"img","alt":"x ↦ g(ŵ(1)j · (x − x0))","inline":true},{"text":". Also, define the set","element":"span"}],[{"style":{"height":21.41},"width":636.16,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-21.png","element":"img","alt":"gWindn,N;x0 := {W ∈ Wn,mN : Fg,x0(W)","inline":true,"padRight":true},{"text":"is linearly independent","element":"span"},{"text":"}","element":"span"},{"text":". ","element":"span"},{"text":"Note that the value of ","element":"span"},{"text":"m ","element":"span"},{"text":"is irrelevant for defining","element":"span"},{"style":{"height":21.22},"width":157.04,"height":53.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-22.png","element":"img","alt":"gWindn,N;x0","inline":true},{"text":".","element":"span"}],[{"id":"id-42","text":"Remark 5.3. ","element":"span"},{"text":"Given ","element":"span"},{"style":{"height":16},"width":374.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-23.png","element":"img","alt":" a = (a1, . . . 
, an) ∈ Rn","inline":true},{"text":", consider the ring automorphism ","element":"span"},{"style":{"height":16},"width":346.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-24.png","element":"img","alt":" ϕ : P(Rn) → P(Rn)","inline":true,"padRight":true},{"text":"induced by ","element":"span"},{"style":{"height":10.71},"width":216.44,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-25.png","element":"img","alt":" xi ↦ xi − ai","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":166.08,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-26.png","element":"img","alt":" 1 ≤ i ≤ n","inline":true},{"text":". For any ","element":"span"},{"style":{"height":16},"width":328,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-27.png","element":"img","alt":" f1, . . . , fk ∈ P(Rn)","inline":true,"padRight":true},{"text":"and scalars ","element":"span"},{"style":{"height":12},"width":215.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-28.png","element":"img","alt":" α1, . . . 
, αk ∈","inline":true,"padRight":true},{"text":"R","element":"span"},{"text":", note that ","element":"span"},{"style":{"height":14},"width":408.8,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-29.png","element":"img","alt":" α1f1 + · · · + αkfk = 0","inline":true,"padRight":true},{"text":"if and only if ","element":"span"},{"style":{"height":16},"width":522.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-30.png","element":"img","alt":" α1ϕ(f1) + · · · + αkϕ(fk) = 0","inline":true},{"text":", thus linear independence is preserved under ","element":"span"},{"style":{"height":10},"width":26,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-31.png","element":"img","alt":" ϕ","inline":true},{"text":". Consequently, if the function ","element":"span"},{"text":"g ","element":"span"},{"text":"in Definition ","element":"span"},{"href":"#id-29","text":"5.2 ","element":"a"},{"text":"is polynomial, then","element":"span"},{"style":{"height":21.22},"width":373.68,"height":53.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-32.png","element":"img","alt":"gWindn,N;x0 =gWindn,N;0n","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.11},"width":138.08,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-33.png","element":"img","alt":" x0 ∈ Rn","inline":true},{"text":".","element":"span"}],[{"id":"id-31","text":"Corollary 5.4. ","element":"span"},{"text":"Let ","element":"span"},{"text":"m ","element":"span"},{"text":"be arbitrary. 
If ","element":"span"},{"style":{"height":16},"width":177.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-34.png","element":"img","alt":" p ∈ Pd(R)","inline":true,"padRight":true},{"text":"has all-non-zero coefficients, then","element":"span"},{"style":{"height":21.41},"width":165.28,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-35.png","element":"img","alt":"pWind1,d+1;0","inline":true,"padRight":true},{"text":"is a ","element":"span"},{"text":"dense subset of ","element":"span"},{"style":{"height":19.57},"width":96.64,"height":48.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-36.png","element":"img","alt":" W1,md+1","inline":true,"padRight":true},{"text":"(in the Euclidean metric).","element":"span"}],[{"text":"Proof. For all ","element":"span"},{"style":{"height":14},"width":268.32,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-37.png","element":"img","alt":" 1 ≤ j < j′ ≤ N","inline":true},{"text":", let ","element":"span"},{"style":{"height":20.98},"width":692,"height":52.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-38.png","element":"img","alt":" Aj,j′ := {W ∈ W1,md+1 : w(1)1,j′ − w(1)1,j ≠ 0}","inline":true},{"text":", and note that ","element":"span"},{"style":{"height":16.3},"width":78.64,"height":40.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-39.png","element":"img","alt":" Aj,j′","inline":true,"padRight":true},{"text":"is dense in ","element":"span"},{"style":{"height":19.38},"width":96.64,"height":48.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-40.png","element":"img","alt":" W1,md+1","inline":true},{"text":". 
So by Theorem ","element":"span"},{"href":"#id-30","text":"5.1,","element":"a"},{"style":{"height":21.6},"width":500.08,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/5-41.png","element":"img","alt":"pWind1,d+1;0 = �1≤j 0","inline":true},{"text":".","element":"span"}],[{"text":"For the rest of this section, let ","element":"span"},{"style":{"height":16},"width":141.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-7.png","element":"img","alt":" {λk}k∈N","inline":true,"padRight":true},{"text":"be a divergent increasing sequence of positive real numbers, and let ","element":"span"},{"style":{"height":16},"width":141.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-8.png","element":"img","alt":" {Yk}k∈N","inline":true,"padRight":true},{"text":"be a sequence of closed intervals of ","element":"span"},{"text":"R","element":"span"},{"text":", such that ","element":"span"},{"style":{"height":13.2},"width":150.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-9.png","element":"img","alt":" Yk′ ⊆ Yk","inline":true,"padRight":true},{"text":"whenever ","element":"span"},{"style":{"height":13.2},"width":111.72,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-10.png","element":"img","alt":" k′ ≤ k","inline":true},{"text":", and such that each interval ","element":"span"},{"style":{"height":16.61},"width":216.44,"height":41.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-11.png","element":"img","alt":" Yk = [y′k, y′′k]","inline":true,"padRight":true},{"text":"has length ","element":"span"},{"style":{"height":13.1},"width":40.04,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-12.png","element":"img","alt":" λk","inline":true},{"text":". 
Let ","element":"span"},{"style":{"height":13.2},"width":93.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-13.png","element":"img","alt":" d ≥ 1","inline":true,"padRight":true},{"text":"be an integer, and suppose ","element":"span"},{"style":{"height":16},"width":156.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-14.png","element":"img","alt":" σ ∈ C(R)","inline":true},{"text":". ","element":"span"},{"text":"For each ","element":"span"},{"style":{"height":11.6},"width":105.32,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-15.png","element":"img","alt":" k ∈ N","inline":true},{"text":", let ","element":"span"},{"style":{"height":9.1},"width":39.56,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-16.png","element":"img","alt":" σk","inline":true,"padRight":true},{"text":"be the best polynomial approximant to ","element":"span"},{"style":{"height":16.03},"width":68.76,"height":40.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-17.png","element":"img","alt":" σ|Yk","inline":true,"padRight":true},{"text":"of degree ","element":"span"},{"text":"d","element":"span"},{"text":". Given ","element":"span"},{"text":"r > ","element":"span"},{"text":"0 ","element":"span"},{"text":"and any integer ","element":"span"},{"style":{"height":13.2},"width":109.76,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-18.png","element":"img","alt":" N ≥ 1","inline":true},{"text":", define the closed ball ","element":"span"},{"style":{"height":17.36},"width":482.72,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-19.png","element":"img","alt":" BNr := {x ∈ RN : ∥x∥2 ≤ r}","inline":true},{"text":".","element":"span"}],[{"id":"id-39","text":"Lemma 5.9. 
","element":"span"},{"text":"If ","element":"span"},{"style":{"height":16.03},"width":519.04,"height":40.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-20.png","element":"img","alt":" d ≥ 2, limk→∞ Ed(σ|Yk) = ∞","inline":true},{"text":", and ","element":"span"},{"style":{"height":16},"width":200.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-21.png","element":"img","alt":" λk ∈ Ω(kγ)","inline":true,"padRight":true},{"text":"for some ","element":"span"},{"style":{"height":14.4},"width":102.08,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-22.png","element":"img","alt":" γ > 0","inline":true},{"text":", then for every ","element":"span"},{"style":{"height":11.6},"width":100.16,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-23.png","element":"img","alt":"ε > 0","inline":true},{"text":", there is a subsequence ","element":"span"},{"style":{"height":16},"width":128.48,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-24.png","element":"img","alt":" {kt}t∈N","inline":true,"padRight":true},{"text":"of ","element":"span"},{"text":"N","element":"span"},{"text":", and a sequence ","element":"span"},{"style":{"height":16},"width":145.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-25.png","element":"img","alt":" {ykt}t∈N","inline":true,"padRight":true},{"text":"of real numbers, such that ","element":"span"},{"style":{"height":17.95},"width":579.52,"height":44.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-26.png","element":"img","alt":"y′kt < ykt < y′′kt, σ(ykt) = σkt(ykt)","inline":true},{"text":", and","element":"span"}],[{"style":{"width":"44%"},"width":706,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-27.png","element":"img"}],[{"text":"for all 
","element":"span"},{"style":{"height":11.6},"width":92.36,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-28.png","element":"img","alt":" t ∈ N","inline":true},{"text":". (See Appendix ","element":"span"},{"href":"#id-37","referenceIndex":74,"text":"B ","element":"a"},{"text":"for proof details.)","element":"span"}],[{"id":"id-40","style":{"width":"66%"},"width":1046,"height":211,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-29.png","element":"img"}],[{"id":"id-43","text":"Lemma 5.11. ","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":13.2},"width":202.88,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-30.png","element":"img","alt":" K ≥ N ≥ 1","inline":true,"padRight":true},{"text":"be integers, let ","element":"span"},{"style":{"height":14},"width":247.52,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-31.png","element":"img","alt":" r0, . . . , rN ≥ 1","inline":true,"padRight":true},{"text":"be fixed real numbers, and let ","element":"span"},{"style":{"height":16},"width":81.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-32.png","element":"img","alt":" S(λ)","inline":true,"padRight":true},{"text":"be a set ","element":"span"},{"style":{"height":16},"width":324.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-33.png","element":"img","alt":" {p0(λ), . . . 
, pN(λ)}","inline":true,"padRight":true},{"text":"of ","element":"span"},{"text":"N ","element":"span"},{"text":"+ 1 ","element":"span"},{"text":"affinely independent points in ","element":"span"},{"style":{"height":13.36},"width":56.8,"height":33.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-34.png","element":"img","alt":" RK","inline":true},{"text":", parametrized by ","element":"span"},{"style":{"height":11.6},"width":101.6,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-35.png","element":"img","alt":" λ > 0","inline":true},{"text":", where each point ","element":"span"},{"style":{"height":16},"width":87.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-36.png","element":"img","alt":" pi(λ)","inline":true,"padRight":true},{"text":"has (Cartesian) coordinates ","element":"span"},{"style":{"height":16.7},"width":355.36,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-37.png","element":"img","alt":" (λripi,1, . . . , λripi,K)","inline":true,"padRight":true},{"text":"for some fixed non-zero scalars ","element":"span"},{"style":{"height":11.5},"width":216.16,"height":28.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-38.png","element":"img","alt":" pi,1, . . . , pi,K","inline":true},{"text":". Let ","element":"span"},{"style":{"height":16},"width":87.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-39.png","element":"img","alt":" ∆(λ)","inline":true,"padRight":true},{"text":"be the convex hull of ","element":"span"},{"style":{"height":16},"width":81.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-40.png","element":"img","alt":" S(λ)","inline":true},{"text":", i.e. 
","element":"span"},{"style":{"height":16},"width":87.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-41.png","element":"img","alt":" ∆(λ)","inline":true,"padRight":true},{"text":"is an ","element":"span"},{"text":"N","element":"span"},{"text":"-simplex, and for each ","element":"span"},{"style":{"height":13.2},"width":176.16,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-42.png","element":"img","alt":"0 ≤ i ≤ N","inline":true},{"text":", let ","element":"span"},{"style":{"height":16},"width":90.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-43.png","element":"img","alt":" hi(λ)","inline":true,"padRight":true},{"text":"be the height of ","element":"span"},{"style":{"height":16},"width":87.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-44.png","element":"img","alt":" ∆(λ)","inline":true,"padRight":true},{"text":"w.r.t. apex ","element":"span"},{"style":{"height":16},"width":88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-45.png","element":"img","alt":" pi(λ)","inline":true},{"text":". Let ","element":"span"},{"style":{"height":16},"width":555.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-46.png","element":"img","alt":" h(λ) := max{hi(λ) : 0 ≤ i ≤ N}","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":423.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-47.png","element":"img","alt":" rmin := min{r1, . . . , rN}","inline":true},{"text":". 
If ","element":"span"},{"style":{"height":13.5},"width":166.8,"height":33.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-48.png","element":"img","alt":" rj > rmin","inline":true,"padRight":true},{"text":"for some ","element":"span"},{"style":{"height":14},"width":196.8,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-49.png","element":"img","alt":" 0 ≤ j ≤ N","inline":true},{"text":", then there exists some ","element":"span"},{"style":{"height":14.4},"width":103.52,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-50.png","element":"img","alt":" γ > 0","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":16.96},"width":316,"height":42.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-51.png","element":"img","alt":" h(λ) ∈ Ω(λ^{r_min + γ})","inline":true},{"text":".","element":"span"}],[{"id":"id-38","text":"Lemma 5.12. ","element":"span"},{"text":"Let ","element":"span"},{"style":{"height":14},"width":169.76,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-52.png","element":"img","alt":" M, N ≥ 1","inline":true,"padRight":true},{"text":"be integers, let ","element":"span"},{"style":{"height":11.6},"width":96.32,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-53.png","element":"img","alt":" τ > 0","inline":true},{"text":", and let ","element":"span"},{"style":{"height":11.6},"width":169.28,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-54.png","element":"img","alt":" 0 < θ < 1","inline":true},{"text":". 
Suppose ","element":"span"},{"style":{"height":16.56},"width":243.96,"height":41.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-55.png","element":"img","alt":" ϕ : RM → RN","inline":true,"padRight":true},{"text":"is a continuous open map such that ","element":"span"},{"style":{"height":16},"width":222.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-56.png","element":"img","alt":" ϕ(0M) = 0N","inline":true},{"text":", and ","element":"span"},{"style":{"height":16},"width":263.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-57.png","element":"img","alt":" ϕ(λx) ≥ λϕ(x)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":15.76},"width":261.92,"height":39.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-58.png","element":"img","alt":" x ∈ RM, λ > 0","inline":true},{"text":". Let ","element":"span"},{"style":{"height":16},"width":145.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-59.png","element":"img","alt":"{Uk}k∈N","inline":true,"padRight":true},{"text":"be a sequence where each ","element":"span"},{"style":{"height":13.1},"width":44.36,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-60.png","element":"img","alt":" Uk","inline":true,"padRight":true},{"text":"is a dense subspace of ","element":"span"},{"style":{"height":19.7},"width":169.08,"height":49.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-61.png","element":"img","alt":" BMλk\\BMθλk","inline":true},{"text":". 
Then for every ","element":"span"},{"style":{"height":12.4},"width":99.68,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-62.png","element":"img","alt":" δ > 0","inline":true},{"text":", ","element":"span"},{"text":"there exists some (sufficiently large) ","element":"span"},{"style":{"height":11.6},"width":105.32,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-63.png","element":"img","alt":" k ∈ N","inline":true},{"text":", and some points ","element":"span"},{"style":{"height":10},"width":179.64,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-64.png","element":"img","alt":" u0, . . . , uN","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":13.1},"width":44.36,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-65.png","element":"img","alt":" Uk","inline":true},{"text":", such that for each point ","element":"span"},{"style":{"height":17.39},"width":130.68,"height":43.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-66.png","element":"img","alt":" p ∈ BNτ","inline":true,"padRight":true},{"text":", there are scalars ","element":"span"},{"style":{"height":14},"width":246.56,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-67.png","element":"img","alt":" b0, . . . 
, bN ≥ 0","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":20.27},"width":627.2,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-68.png","element":"img","alt":" p = Σ_{i=0}^{N} b_i ϕ(u_i), b_0 + · · · + b_N = 1","inline":true},{"text":", ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":19.31},"width":211.49,"height":48.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-69.png","element":"img","alt":" |b_i − 1/N| < δ","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":176.16,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-70.png","element":"img","alt":" 0 ≤ i ≤ N","inline":true},{"text":".","element":"span"}],[{"text":"Outline of strategy for proving Theorem ","element":"span"},{"href":"#id-14","text":"3.1. ","element":"a"},{"text":"The first crucial insight is that ","element":"span"},{"style":{"height":16.7},"width":153.28,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-71.png","element":"img","alt":" P≤d(Rn)","inline":true},{"text":", as a real vector space, has dimension ","element":"span"},{"style":{"height":20.27},"width":97.2,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-72.png","element":"img","alt":"(n+d choose d)","inline":true},{"text":". Our strategy is to consider ","element":"span"},{"style":{"height":20.27},"width":198.48,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-73.png","element":"img","alt":" N = (n+d choose d)","inline":true},{"text":"hidden units. 
Every hidden unit represents a continuous function ","element":"span"},{"style":{"height":15.5},"width":221.96,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-74.png","element":"img","alt":" gj : X → R","inline":true,"padRight":true},{"text":"determined by its weights ","element":"span"},{"text":"W ","element":"span"},{"text":"and the activation function ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-75.png","element":"img","alt":" σ","inline":true},{"text":". If ","element":"span"},{"style":{"height":10},"width":171.48,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-76.png","element":"img","alt":" g1, . . . , gN","inline":true,"padRight":true},{"text":"can be well-approximated (on ","element":"span"},{"text":"X","element":"span"},{"text":") by linearly independent polynomial functions in ","element":"span"},{"style":{"height":16.71},"width":153.28,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-77.png","element":"img","alt":" P≤d(Rn)","inline":true},{"text":", then we can choose suitable linear combinations of these ","element":"span"},{"text":"N ","element":"span"},{"text":"functions to approximate all coordinate functions ","element":"span"},{"style":{"height":17.36},"width":54.12,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-78.png","element":"img","alt":" f [t]","inline":true,"padRight":true},{"text":"(independent of how large ","element":"span"},{"text":"m ","element":"span"},{"text":"is). 
To approximate each ","element":"span"},{"style":{"height":11.5},"width":32.2,"height":28.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-79.png","element":"img","alt":"gj","inline":true},{"text":", we consider a suitable sequence ","element":"span"},{"style":{"height":16.41},"width":159.04,"height":41.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-80.png","element":"img","alt":" {σλk}∞k=1","inline":true,"padRight":true},{"text":"of degree ","element":"span"},{"text":"d ","element":"span"},{"text":"polynomial approximations to ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-81.png","element":"img","alt":" σ","inline":true},{"text":", so that ","element":"span"},{"style":{"height":11.51},"width":32.2,"height":28.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-82.png","element":"img","alt":"gj","inline":true,"padRight":true},{"text":"is approximated by a sequence of degree ","element":"span"},{"text":"d ","element":"span"},{"text":"polynomial functions ","element":"span"},{"style":{"height":20.18},"width":159.04,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-83.png","element":"img","alt":" {�gWj,k}∞k=1","inline":true},{"text":". We shall also vary ","element":"span"},{"text":"W ","element":"span"},{"text":"concurrently with ","element":"span"},{"text":"k","element":"span"},{"text":", so that ","element":"span"},{"style":{"height":20.78},"width":132.64,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-84.png","element":"img","alt":" ∥ �w(1)j ∥2","inline":true,"padRight":true},{"text":"increases together with ","element":"span"},{"text":"k","element":"span"},{"text":". 
By Corollary ","element":"span"},{"href":"#id-36","text":"5.7, ","element":"a"},{"text":"the weights ","element":"span"},{"text":"can always be chosen so that ","element":"span"},{"style":{"height":20.37},"width":225.32,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-85.png","element":"img","alt":" �gW1,k, . . . , �gWN,k","inline":true,"padRight":true},{"text":"are linearly independent.","element":"span"}],[{"text":"The second crucial insight is that every function in ","element":"span"},{"style":{"height":16.7},"width":153.28,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-86.png","element":"img","alt":" P≤d(Rn)","inline":true,"padRight":true},{"text":"can be identified geometrically as a point in Euclidean","element":"span"},{"style":{"height":20.27},"width":96.72,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-87.png","element":"img","alt":"�n+dd �","inline":true},{"text":"-space. We shall choose the bias weights so that ","element":"span"},{"style":{"height":20.37},"width":225.32,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/6-88.png","element":"img","alt":" �gW1,k, . . . , �gWN,k","inline":true,"padRight":true},{"text":"correspond ","element":"span"},{"text":"to points on a hyperplane, and we shall consider the barycentric coordinates of the projections of both ","element":"span"},{"style":{"height":17.36},"width":54.12,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-0.png","element":"img","alt":" f [t]","inline":true,"padRight":true},{"text":"and the constant function onto this hyperplane, with respect to ","element":"span"},{"style":{"height":20.37},"width":225.32,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-1.png","element":"img","alt":" �gW1,k, . . . , �gWN,k","inline":true},{"text":". 
As the ","element":"span"},{"text":"values of ","element":"span"},{"text":"k ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":20.78},"width":132.64,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-2.png","element":"img","alt":" ∥ �w(1)j ∥2","inline":true,"padRight":true},{"text":"increase, both projection points have barycentric coordinates that approach ","element":"span"},{"style":{"height":19.5},"width":195.52,"height":48.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-3.png","element":"img","alt":"( 1N , . . . , 1N )","inline":true},{"text":", and their difference approaches ","element":"span"},{"text":"0","element":"span"},{"text":"; cf. Lemma ","element":"span"},{"href":"#id-38","text":"5.12. ","element":"a"},{"text":"This last observation, in particular, ","element":"span"},{"text":"when combined with Lemma ","element":"span"},{"href":"#id-39","text":"5.9 ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-40","text":"5.10, ","element":"a"},{"text":"is a key reason why the minimum number ","element":"span"},{"text":"N ","element":"span"},{"text":"of hidden units required for the UAP to hold is independent of the approximation error threshold ","element":"span"},{"style":{"height":7.2},"width":19,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-4.png","element":"img","alt":" ε","inline":true},{"text":".","element":"span"}],[{"text":"Proof of Theorem ","element":"span"},{"href":"#id-14","text":"3.1. 
","element":"a"},{"text":"Fix some ","element":"span"},{"style":{"height":11.6},"width":98.24,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-5.png","element":"img","alt":" ε > 0","inline":true},{"text":", and for brevity, let ","element":"span"},{"style":{"height":20.08},"width":193.2,"height":50.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-6.png","element":"img","alt":" N =�n+dd �","inline":true},{"text":". Theorem ","element":"span"},{"href":"#id-14","text":"3.1 ","element":"a"},{"text":"is trivially true when ","element":"span"},{"text":"f ","element":"span"},{"text":"is constant, so assume ","element":"span"},{"text":"f ","element":"span"},{"text":"is non-constant. Fix a point ","element":"span"},{"style":{"height":13.11},"width":124.28,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-7.png","element":"img","alt":" x0 ∈ X","inline":true},{"text":", and define ","element":"span"},{"style":{"height":16},"width":254.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-8.png","element":"img","alt":" f0 ∈ C(X, Rm)","inline":true,"padRight":true},{"text":"by ","element":"span"},{"style":{"height":20.88},"width":359.68,"height":52.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-9.png","element":"img","alt":" f [t]0 := f [t] − f [t](x0)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":185.72,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-10.png","element":"img","alt":" 1 ≤ t ≤ m","inline":true},{"text":". 
Next, let ","element":"span"},{"style":{"height":16},"width":612.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-11.png","element":"img","alt":" rX(x0) := sup{∥x − x0∥2 : x ∈ X}","inline":true},{"text":", and ","element":"span"},{"text":"note that ","element":"span"},{"style":{"height":16},"width":213.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-12.png","element":"img","alt":" rX(x0) < ∞","inline":true},{"text":", since ","element":"span"},{"text":"X ","element":"span"},{"text":"is compact. By replacing ","element":"span"},{"text":"X ","element":"span"},{"text":"with a closed tubular neighborhood of ","element":"span"},{"text":"X ","element":"span"},{"text":"if necessary, we may assume without loss of generality that ","element":"span"},{"style":{"height":16},"width":193.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-13.png","element":"img","alt":" rX(x0) > 0","inline":true},{"text":".","element":"span"}],[{"text":"Define ","element":"span"},{"style":{"height":16},"width":308.48,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-14.png","element":"img","alt":" {λk}k∈N, {Yk}k∈N","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":141.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-15.png","element":"img","alt":" {σk}k∈N","inline":true,"padRight":true},{"text":"as before, with an additional condition that ","element":"span"},{"style":{"height":16},"width":203.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-16.png","element":"img","alt":" λk ∈ Ω(kτ)","inline":true,"padRight":true},{"text":"for some ","element":"span"},{"style":{"height":11.6},"width":106.4,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-17.png","element":"img","alt":" τ > 0","inline":true},{"text":". 
Assume without loss of generality that there exists a sequence ","element":"span"},{"style":{"height":16},"width":138.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-18.png","element":"img","alt":" {yk}k∈N","inline":true,"padRight":true},{"text":"of real numbers, such that ","element":"span"},{"style":{"height":16.61},"width":506.08,"height":41.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-19.png","element":"img","alt":" y′k < yk < y′′k, σ(yk) = σk(yk)","inline":true},{"text":", and","element":"span"}],[{"id":"id-41","style":{"width":"83%"},"width":1329,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-20.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":11.6},"width":115.4,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-21.png","element":"img","alt":" k ∈ N","inline":true},{"text":". The validity of this assumption in the case ","element":"span"},{"style":{"height":16.03},"width":406.72,"height":40.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-22.png","element":"img","alt":" limk→∞ Ed(σ|Yk) = ∞","inline":true,"padRight":true},{"text":"is given by Lemma ","element":"span"},{"href":"#id-39","text":"5.9. 
","element":"a"},{"text":"If instead ","element":"span"},{"style":{"height":16.03},"width":394.24,"height":40.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-23.png","element":"img","alt":" limk→∞ Ed(σ|Yk) < ∞","inline":true},{"text":", then as ","element":"span"},{"style":{"height":11.2},"width":126.88,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-24.png","element":"img","alt":" k → ∞","inline":true},{"text":", the sequence ","element":"span"},{"style":{"height":16},"width":141.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-25.png","element":"img","alt":" {σk}k∈N","inline":true,"padRight":true},{"text":"converges to some ","element":"span"},{"style":{"height":16.7},"width":212.32,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-26.png","element":"img","alt":" �σ ∈ P≤d(R)","inline":true},{"text":". Hence, the assumption is also valid in this case, since for any ","element":"span"},{"style":{"height":14},"width":106.76,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-27.png","element":"img","alt":" �y ∈ R","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":16},"width":209.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-28.png","element":"img","alt":" σ(�y) = �σ(�y)","inline":true},{"text":", we can always choose ","element":"span"},{"style":{"height":16},"width":141.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-29.png","element":"img","alt":" {Yk}k∈N","inline":true,"padRight":true},{"text":"to satisfy ","element":"span"},{"style":{"height":20.64},"width":178.4,"height":51.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-30.png","element":"img","alt":"y′k+y′′k2 = �y","inline":true,"padRight":true},{"text":"for all 
","element":"span"},{"style":{"height":11.6},"width":103.88,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-31.png","element":"img","alt":" k ∈ N","inline":true},{"text":", which then allows us to choose ","element":"span"},{"style":{"height":16},"width":138.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-32.png","element":"img","alt":" {yk}k∈N","inline":true,"padRight":true},{"text":"that satisfies ","element":"span"},{"style":{"height":23.76},"width":668.32,"height":59.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-33.png","element":"img","alt":" limk→∞min{|yk−y′k|,|yk−y′′k |}λk = 12 > 1d+2","inline":true},{"text":".","element":"span"}],[{"text":"By Lemma ","element":"span"},{"href":"#id-40","text":"5.10, ","element":"a"},{"text":"we may further assume that ","element":"span"},{"style":{"height":23.65},"width":411.48,"height":59.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-34.png","element":"img","alt":" ∥σk − σ∥∞,Yk < ε(λk)1+γC","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":11.6},"width":100.04,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-35.png","element":"img","alt":" k ∈ N","inline":true},{"text":", where ","element":"span"},{"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":14.4},"width":97.76,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-36.png","element":"img","alt":" γ > 0","inline":true,"padRight":true},{"text":"are constants whose precise definitions we give later. 
Also, for any ","element":"span"},{"style":{"height":17.09},"width":194.08,"height":42.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-37.png","element":"img","alt":" W ∈ Wn,mN","inline":true,"padRight":true},{"text":", we can choose ","element":"span"},{"style":{"height":16},"width":171.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-38.png","element":"img","alt":" σ′ ∈ C(R)","inline":true,"padRight":true},{"text":"that is arbitrarily close to ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-39.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"in the uniform metric, such that ","element":"span"},{"style":{"height":16.7},"width":272.28,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-40.png","element":"img","alt":" ∥ρσW − ρσ′W ∥∞,X","inline":true,"padRight":true},{"text":"is ","element":"span"},{"text":"arbitrarily small. 
Since ","element":"span"},{"style":{"height":16.71},"width":349.12,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-41.png","element":"img","alt":" σ ∈ C(R)\\P≤d−1(R)","inline":true,"padRight":true},{"text":"by assumption, we may hence perturb ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-42.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"if necessary, and assume without loss of generality that every ","element":"span"},{"style":{"height":9.1},"width":39.56,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-43.png","element":"img","alt":" σk","inline":true,"padRight":true},{"text":"is a polynomial of degree ","element":"span"},{"text":"d ","element":"span"},{"text":"with all-non-zero coefficients, such that ","element":"span"},{"style":{"height":16},"width":185.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-44.png","element":"img","alt":" σk(yk) ̸= 0","inline":true},{"text":".","element":"span"}],[{"style":{"width":"99%"},"width":1577,"height":160,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-45.png","element":"img"}],[{"text":"Each ","element":"span"},{"style":{"height":15.41},"width":40.04,"height":38.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-46.png","element":"img","alt":" λ′k","inline":true,"padRight":true},{"text":"is well-defined, since ","element":"span"},{"style":{"height":16},"width":213.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-47.png","element":"img","alt":" rX(x0) < ∞","inline":true},{"text":". 
Note also that ","element":"span"},{"style":{"height":16.61},"width":632,"height":41.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-48.png","element":"img","alt":" λ′krX(x0) = min{|yk − y′k|, |yk − y′′k|}","inline":true,"padRight":true},{"text":"by definition, hence it follows from ","element":"span"},{"href":"#id-41","text":"(3) ","element":"a"},{"text":"that ","element":"span"},{"style":{"height":23.54},"width":360.16,"height":58.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-49.png","element":"img","alt":"λ_k/λ′_k < (d + 2) r_X(x_0)","inline":true},{"text":". In particular, ","element":"span"},{"style":{"height":16.42},"width":141.92,"height":41.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-50.png","element":"img","alt":" {λ′k}k∈N","inline":true,"padRight":true},{"text":"is a ","element":"span"},{"text":"divergent increasing sequence of positive real numbers.","element":"span"}],[{"text":"Given any ","element":"span"},{"style":{"height":16.71},"width":226.24,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-51.png","element":"img","alt":" p ∈ P≤d(Rn)","inline":true},{"text":", let ","element":"span"},{"style":{"height":17.55},"width":182.04,"height":43.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-52.png","element":"img","alt":" ν(p) ∈ RN","inline":true,"padRight":true},{"text":"denote the vector of coefficients with respect to the basis ","element":"span"},{"style":{"height":16},"width":501.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-53.png","element":"img","alt":"{q1(x − x0), . . . , qN(x − x0)}","inline":true,"padRight":true},{"text":"(i.e. if ","element":"span"},{"style":{"height":16},"width":338.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-54.png","element":"img","alt":" ν(p) = (ν1, . . . 
, νN)","inline":true},{"text":", then ","element":"span"},{"style":{"height":19.2},"width":496.48,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-55.png","element":"img","alt":" p(x) = �1≤i≤N νiqi(x − x0)","inline":true},{"text":"), ","element":"span"},{"text":"and let ","element":"span"},{"style":{"height":17.36},"width":229.12,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-56.png","element":"img","alt":" �ν(p) ∈ RN−1","inline":true,"padRight":true},{"text":"be the truncation of ","element":"span"},{"style":{"height":16},"width":73.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-57.png","element":"img","alt":" ν(p)","inline":true,"padRight":true},{"text":"by removing the first coordinate. Note that ","element":"span"},{"style":{"height":16},"width":89.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-58.png","element":"img","alt":" q1(x)","inline":true,"padRight":true},{"text":"is the constant monomial, so this first coordinate ","element":"span"},{"style":{"height":9.1},"width":35.68,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-59.png","element":"img","alt":" ν1","inline":true,"padRight":true},{"text":"is the coefficient of the constant term. For convenience, let ","element":"span"},{"style":{"height":16},"width":84.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-60.png","element":"img","alt":" νi(p)","inline":true,"padRight":true},{"text":"(resp. 
","element":"span"},{"style":{"height":16},"width":84.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-61.png","element":"img","alt":" �νi(p)","inline":true},{"text":") be the ","element":"span"},{"text":"i","element":"span"},{"text":"-th entry of ","element":"span"},{"style":{"height":16},"width":73.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-62.png","element":"img","alt":" ν(p)","inline":true,"padRight":true},{"text":"(resp. ","element":"span"},{"style":{"height":16},"width":74.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-63.png","element":"img","alt":" �ν(p)","inline":true},{"text":").","element":"span"}],[{"text":"For each ","element":"span"},{"style":{"height":18.1},"width":488.16,"height":45.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-64.png","element":"img","alt":" k ∈ N, W ∈ W′λ′k, 1 ≤ j ≤ N","inline":true},{"text":", define functions ","element":"span"},{"style":{"height":20.37},"width":138.92,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-65.png","element":"img","alt":" gWj,k, �gWj,k","inline":true,"padRight":true},{"text":"in ","element":"span"},{"text":"C","element":"span"},{"text":"(","element":"span"},{"text":"X","element":"span"},{"text":") ","element":"span"},{"text":"by ","element":"span"},{"style":{"height":20.78},"width":328.96,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-66.png","element":"img","alt":" x �→ σ(w(1)j ·(1, x))","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":20.78},"width":364.48,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-67.png","element":"img","alt":" x �→ σk(w(1)j · (1, x))","inline":true,"padRight":true},{"text":"respectively. 
By definition, ","element":"span"},{"style":{"height":20.37},"width":125.92,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-68.png","element":"img","alt":" νi(�gWj,k)","inline":true,"padRight":true},{"text":"can be treated as a function of ","element":"span"},{"text":"W","element":"span"},{"text":", ","element":"span"},{"text":"and note that ","element":"span"},{"style":{"height":20.37},"width":424,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-69.png","element":"img","alt":" νi(�gλWj,k ) = λdeg qiνi(�gWj,k)","inline":true,"padRight":true},{"text":"for any ","element":"span"},{"style":{"height":11.6},"width":96.8,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-70.png","element":"img","alt":" λ > 0","inline":true},{"text":". (Here, ","element":"span"},{"style":{"height":14},"width":95.96,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-71.png","element":"img","alt":" deg qi","inline":true,"padRight":true},{"text":"denotes the total degree of ","element":"span"},{"style":{"height":10},"width":28.76,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-72.png","element":"img","alt":" qi","inline":true},{"text":".) 
","element":"span"},{"text":"Since ","element":"span"},{"style":{"height":14},"width":171.2,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-73.png","element":"img","alt":" deg qi = 0","inline":true,"padRight":true},{"text":"only if ","element":"span"},{"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"text":", it then follows that ","element":"span"},{"style":{"height":19.6},"width":341.44,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-74.png","element":"img","alt":" �νi(�gλWj,k ) ≥ λ�νi(�gWj,k)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":11.6},"width":96.32,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-75.png","element":"img","alt":" λ > 0","inline":true},{"text":".","element":"span"}],[{"text":"For each ","element":"span"},{"style":{"height":11.6},"width":119.72,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-76.png","element":"img","alt":" k ∈ N","inline":true},{"text":", define the “shifted” function ","element":"span"},{"style":{"height":15.22},"width":249.32,"height":38.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-77.png","element":"img","alt":" σ′k : Yk → R","inline":true,"padRight":true},{"text":"by ","element":"span"},{"style":{"height":16},"width":293.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-78.png","element":"img","alt":" y ↦ σk(y + yk)","inline":true},{"text":". 
Next, let ","element":"span"},{"style":{"height":22.19},"width":594.88,"height":55.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-79.png","element":"img","alt":"W′′k := σ′kWindn,N;x0 ∩ (W′λ′k\\W′0.5λ′k)","inline":true},{"text":", and suppose ","element":"span"},{"style":{"height":15.41},"width":158,"height":38.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-80.png","element":"img","alt":" W ∈ W′′k","inline":true,"padRight":true},{"text":". Note that in the definition of ","element":"span"},{"style":{"height":15.41},"width":65.84,"height":38.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-81.png","element":"img","alt":" W′′k","inline":true,"padRight":true},{"text":", we ","element":"span"},{"text":"do not impose any restrictions on the bias weights. Thus, given any such ","element":"span"},{"text":"W","element":"span"},{"text":", we could choose the bias weights of ","element":"span"},{"style":{"height":14.16},"width":83.52,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-82.png","element":"img","alt":" W (1)","inline":true,"padRight":true},{"text":"to be ","element":"span"},{"style":{"height":20.59},"width":367.84,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-83.png","element":"img","alt":" w(1)j,0 = yk − �w(1)j · x0","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":14},"width":192.48,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-84.png","element":"img","alt":" 1 ≤ j ≤ N","inline":true},{"text":". 
This implies that each ","element":"span"},{"style":{"height":20.18},"width":58.76,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-85.png","element":"img","alt":" �gWj,k","inline":true,"padRight":true},{"text":"represents the map ","element":"span"},{"style":{"height":20.78},"width":464.8,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-86.png","element":"img","alt":" x ↦ σk(�w(1)j ·(x−x0)+yk)","inline":true},{"text":", hence ","element":"span"},{"style":{"height":20.37},"width":446.56,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/7-87.png","element":"img","alt":" �gWj,k(x0) = σk(yk) = σ(yk)","inline":true},{"text":". Consequently,","element":"span"}],[{"style":{"width":"99%"},"width":1583,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-0.png","element":"img"}],[{"text":"By Corollary ","element":"span"},{"href":"#id-36","text":"5.7 ","element":"a"},{"text":"and Remark ","element":"span"},{"href":"#id-42","text":"5.3, ","element":"a"},{"style":{"height":15.22},"width":65.84,"height":38.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-1.png","element":"img","alt":" W′′k","inline":true,"padRight":true},{"text":"is dense in ","element":"span"},{"style":{"height":19.1},"width":248.32,"height":47.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-2.png","element":"img","alt":" (W′λ′k\\W′0.5λ′k)","inline":true},{"text":", so such a ","element":"span"},{"text":"W ","element":"span"},{"text":"exists (with its bias ","element":"span"},{"text":"weights given as above). 
By the definition of ","element":"span"},{"style":{"height":20.18},"width":176.24,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-3.png","element":"img","alt":"σ′kWindn,N;x0","inline":true},{"text":", we infer that ","element":"span"},{"style":{"height":20.37},"width":266.72,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-4.png","element":"img","alt":" {�gW1,k, . . . , �gWN,k}","inline":true,"padRight":true},{"text":"is linearly ","element":"span"},{"text":"independent and hence spans ","element":"span"},{"style":{"height":16.7},"width":138.88,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-5.png","element":"img","alt":" P≤d(X)","inline":true},{"text":". Thus, for every ","element":"span"},{"style":{"height":13.2},"width":178.04,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-6.png","element":"img","alt":" 1 ≤ t ≤ m","inline":true},{"text":", there exist ","element":"span"},{"style":{"height":23.86},"width":310.28,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-7.png","element":"img","alt":" a[t]1,k, . . . , a[t]N,k ∈ R","inline":true},{"text":", ","element":"span"},{"text":"which are uniquely determined once ","element":"span"},{"text":"k ","element":"span"},{"text":"is fixed, such that ","element":"span"},{"style":{"height":23.86},"width":499.88,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-8.png","element":"img","alt":" f [t]0 = a[t]1,k�gW1,k+· · ·+a[t]N,k�gWN,k","inline":true},{"text":". 
Evaluating ","element":"span"},{"text":"both sides of this equation at ","element":"span"},{"style":{"height":9.1},"width":114.4,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-9.png","element":"img","alt":" x = x0","inline":true},{"text":", we then get","element":"span"}],[{"id":"id-44","style":{"width":"61%"},"width":975,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-10.png","element":"img"}],[{"text":"For each ","element":"span"},{"style":{"height":11.6},"width":101.48,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-11.png","element":"img","alt":" ℓ ∈ R","inline":true},{"text":", define the hyperplane ","element":"span"},{"style":{"height":17.36},"width":646.88,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-12.png","element":"img","alt":" Hℓ := {(u1, . . . , uN) ∈ RN : u1 = ℓ}","inline":true},{"text":". Recall that ","element":"span"},{"style":{"height":16},"width":89.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-13.png","element":"img","alt":" q1(x)","inline":true,"padRight":true},{"text":"is the constant monomial, so the first coordinate of each ","element":"span"},{"style":{"height":20.18},"width":114.88,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-14.png","element":"img","alt":" ν(�gWj,k)","inline":true,"padRight":true},{"text":"equals ","element":"span"},{"style":{"height":16},"width":95.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-15.png","element":"img","alt":" σ(yk)","inline":true},{"text":", which implies that ","element":"span"},{"style":{"height":20.17},"width":334.24,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-16.png","element":"img","alt":"ν(�gW1,k), . . . 
, ν(�gWN,k)","inline":true,"padRight":true},{"text":"are ","element":"span"},{"text":"N ","element":"span"},{"text":"points on ","element":"span"},{"style":{"height":19.04},"width":268,"height":47.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-17.png","element":"img","alt":" Hσ(yk) ∼= RN−1","inline":true},{"text":". Let ","element":"span"},{"style":{"height":18.86},"width":605.12,"height":47.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-18.png","element":"img","alt":" cf := max{∥�ν(f [t])∥2 : 1 ≤ t ≤ m}","inline":true},{"text":". ","element":"span"},{"text":"(This is non-zero, since ","element":"span"},{"text":"f ","element":"span"},{"text":"is non-constant.) Note that ","element":"span"},{"style":{"height":12.7},"width":92.32,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-19.png","element":"img","alt":" 0N−1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":18.16},"width":109.6,"height":45.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-20.png","element":"img","alt":" �ν(f [t])","inline":true,"padRight":true},{"text":"(for all ","element":"span"},{"text":"t","element":"span"},{"text":") are points in ","element":"span"},{"style":{"height":20.72},"width":101.44,"height":51.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-21.png","element":"img","alt":"BN−1cf","inline":true,"padRight":true},{"text":". 
So for any ","element":"span"},{"style":{"height":12.4},"width":94.88,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-22.png","element":"img","alt":" δ > 0","inline":true},{"text":", Lemma ","element":"span"},{"href":"#id-38","text":"5.12 ","element":"a"},{"text":"implies that there exists some sufficiently large ","element":"span"},{"style":{"height":11.6},"width":102.92,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-23.png","element":"img","alt":" k ∈ N","inline":true,"padRight":true},{"text":"such that we can choose some ","element":"span"},{"style":{"height":15.21},"width":157.52,"height":38.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-24.png","element":"img","alt":" W ∈ W′′k","inline":true,"padRight":true},{"text":", so that there are non-negative scalars ","element":"span"},{"style":{"height":21.17},"width":134.12,"height":52.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-25.png","element":"img","alt":" b[t]j,k, b′j,k","inline":true,"padRight":true},{"text":"(for ","element":"span"},{"style":{"height":14},"width":180.48,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-26.png","element":"img","alt":" 1 ≤ j ≤ N","inline":true},{"text":", ","element":"span"},{"style":{"height":13.2},"width":175.64,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-27.png","element":"img","alt":"1 ≤ t ≤ m","inline":true},{"text":") contained in the interval ","element":"span"},{"style":{"height":19.31},"width":258.4,"height":48.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-28.png","element":"img","alt":" ( 1N − δ, 1N + δ)","inline":true,"padRight":true},{"text":"that satisfy the 
following:","element":"span"}],[{"style":{"width":"79%"},"width":1265,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-29.png","element":"img"}],[{"text":"Note that ","element":"span"},{"style":{"height":23.86},"width":834.88,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-30.png","element":"img","alt":" ν(f [t]0 + σ(yk)) = b[t]1,kν(�gW1,k) + · · · + b[t]N,kν(�gWN,k)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":20.37},"width":511.96,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-31.png","element":"img","alt":" (0N−1, σ(yk)) = b′1,kν(�gW1,k) +","inline":true},{"style":{"height":20.17},"width":296.32,"height":50.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-32.png","element":"img","alt":"· · · + b′N,kν(�gWN,k)","inline":true},{"text":", so we get","element":"span"}],[{"style":{"width":"52%"},"width":839,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-33.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":23.86},"width":229.16,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-34.png","element":"img","alt":" a[t]1,k, . . . , a[t]N,k","inline":true,"padRight":true},{"text":"are unique (for fixed ","element":"span"},{"text":"k","element":"span"},{"text":"), we infer that ","element":"span"},{"style":{"height":23.86},"width":284.84,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-35.png","element":"img","alt":" a[t]j,k = b[t]j,k − b′j,k","inline":true,"padRight":true},{"text":"for each ","element":"span"},{"style":{"height":14},"width":185.76,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-36.png","element":"img","alt":" 1 ≤ j ≤ N","inline":true},{"text":". 
","element":"span"},{"text":"Thus, for this sufficiently large ","element":"span"},{"text":"k","element":"span"},{"text":", it follows from ","element":"span"},{"style":{"height":20.98},"width":443.68,"height":52.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-37.png","element":"img","alt":" b[t]j,k, b′j,k ∈ ( 1N − δ, 1N + δ)","inline":true,"padRight":true},{"text":"that","element":"span"}],[{"id":"id-45","style":{"width":"67%"},"width":1075,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-38.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":20.37},"width":496.64,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-39.png","element":"img","alt":" Sk := {�ν(�gW1,k), . . . , �ν(�gWN,k)}","inline":true},{"text":", let ","element":"span"},{"style":{"height":13.91},"width":50.12,"height":34.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-40.png","element":"img","alt":" ∆k","inline":true,"padRight":true},{"text":"be the convex hull of ","element":"span"},{"style":{"height":13.11},"width":41.48,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-41.png","element":"img","alt":" Sk","inline":true},{"text":", and for each ","element":"span"},{"text":"j","element":"span"},{"text":", let ","element":"span"},{"style":{"height":16.71},"width":124,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-42.png","element":"img","alt":" hj(∆k)","inline":true,"padRight":true},{"text":"be the height of ","element":"span"},{"style":{"height":13.9},"width":50.12,"height":34.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-43.png","element":"img","alt":" ∆k","inline":true,"padRight":true},{"text":"w.r.t. 
apex ","element":"span"},{"style":{"height":20.37},"width":114.88,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-44.png","element":"img","alt":" �ν(�gWj,k)","inline":true},{"text":". Let ","element":"span"},{"style":{"height":16.7},"width":697.28,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-45.png","element":"img","alt":" h(∆k) := max{hj(∆k) : 1 ≤ j ≤ N}","inline":true},{"text":". Since ","element":"span"},{"style":{"height":20.37},"width":424.48,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-46.png","element":"img","alt":"�νi(�gλWj,k ) = λdeg qi�νi(�gWj,k)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"text":"i","element":"span"},{"text":", and since ","element":"span"},{"style":{"height":13.2},"width":93.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-47.png","element":"img","alt":" d ≥ 2","inline":true,"padRight":true},{"text":"(i.e. ","element":"span"},{"style":{"height":14},"width":188.48,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-48.png","element":"img","alt":" deg qN > 1","inline":true},{"text":"), it follows from Lemma ","element":"span"},{"href":"#id-43","text":"5.11 ","element":"a"},{"text":"that there exists some ","element":"span"},{"style":{"height":14.4},"width":98.24,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-49.png","element":"img","alt":" γ > 0","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":17.78},"width":352.48,"height":44.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-50.png","element":"img","alt":" h(∆k) ∈ Ω((λ′k)1+γ)","inline":true},{"text":". 
Using this particular ","element":"span"},{"style":{"height":14.4},"width":98.24,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-51.png","element":"img","alt":" γ > 0","inline":true},{"text":", we infer ","element":"span"},{"text":"that there exists some constant ","element":"span"},{"style":{"height":11.6},"width":208.96,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-52.png","element":"img","alt":" 0 < C′ < ∞","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":26.83},"width":216.55,"height":67.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-53.png","element":"img","alt":"(λ′k)1+γh(∆k) < C′","inline":true,"padRight":true},{"text":"for all sufficiently large ","element":"span"},{"text":"k","element":"span"},{"text":".","element":"span"}],[{"text":"Note that ","element":"span"},{"style":{"height":11.6},"width":39.16,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-54.png","element":"img","alt":" 2δ","inline":true,"padRight":true},{"text":"is an upper bound of the normalized difference for each barycentric coordinate of the two points ","element":"span"},{"style":{"height":18.16},"width":109.6,"height":45.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-55.png","element":"img","alt":" �ν(f [t])","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":12.7},"width":92.32,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-56.png","element":"img","alt":" 0N−1","inline":true,"padRight":true},{"text":"(contained in ","element":"span"},{"style":{"height":20.72},"width":101.44,"height":51.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-57.png","element":"img","alt":" BN−1cf","inline":true,"padRight":true},{"text":"), which 
satisfies","element":"span"}],[{"style":{"width":"92%"},"width":1466,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-58.png","element":"img"}],[{"text":"Now, define ","element":"span"},{"style":{"height":18.06},"width":636.32,"height":45.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-59.png","element":"img","alt":" C := 2Ncf[(d + 2)rX(x0)]1+γC′ > 0","inline":true},{"text":". Thus, for sufficiently large ","element":"span"},{"text":"k","element":"span"},{"text":", it follows from ","element":"span"},{"href":"#id-44","text":"(5)","element":"a"},{"text":", ","element":"span"},{"href":"#id-45","text":"(6) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-45","text":"(7) ","element":"a"},{"text":"that","element":"span"}],[{"id":"id-46","style":{"width":"85%"},"width":1347,"height":93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-60.png","element":"img"}],[{"text":"For this sufficiently large ","element":"span"},{"text":"k","element":"span"},{"text":", define ","element":"span"},{"style":{"height":16},"width":239.2,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-61.png","element":"img","alt":" g ∈ C(X, Rm)","inline":true,"padRight":true},{"text":"by ","element":"span"},{"style":{"height":23.86},"width":537.32,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-62.png","element":"img","alt":" g[t] = a[t]1,kgW1,k + · · · + a[t]N,kgWN,k","inline":true,"padRight":true},{"text":"for each ","element":"span"},{"text":"t","element":"span"},{"text":". 
","element":"span"},{"text":"Using ","element":"span"},{"href":"#id-44","text":"(4) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-46","text":"(8)","element":"a"},{"text":", it follows that","element":"span"}],[{"style":{"width":"81%"},"width":1299,"height":184,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/8-63.png","element":"img"}],[{"text":"Finally, for all ","element":"span"},{"style":{"height":13.2},"width":177.08,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-0.png","element":"img","alt":" 1 ≤ t ≤ m","inline":true},{"text":", let ","element":"span"},{"style":{"height":23.47},"width":186.92,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-1.png","element":"img","alt":" w(2)j,t = a[t]j,k","inline":true,"padRight":true},{"text":"for each ","element":"span"},{"style":{"height":14},"width":181.92,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-2.png","element":"img","alt":" 1 ≤ j ≤ N","inline":true},{"text":", and let ","element":"span"},{"style":{"height":23.28},"width":254.08,"height":58.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-3.png","element":"img","alt":" w(2)0,t = f [t](x0)","inline":true},{"text":". This gives ","element":"span"},{"style":{"height":18.57},"width":360.16,"height":46.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-4.png","element":"img","alt":"ρσ [t]W = g[t] + f [t](x0)","inline":true},{"text":". 
Therefore, the identity ","element":"span"},{"style":{"height":18.19},"width":341.44,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-5.png","element":"img","alt":" f [t] = f [t]0 + f [t](x0)","inline":true,"padRight":true},{"text":"implies ","element":"span"},{"style":{"height":16.7},"width":313.24,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-6.png","element":"img","alt":" ∥f − ρσW ∥∞,X < ε","inline":true},{"text":".","element":"span"}],[{"text":"Notice that for all ","element":"span"},{"style":{"height":12.4},"width":94.4,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-7.png","element":"img","alt":" δ > 0","inline":true},{"text":", we showed in ","element":"span"},{"href":"#id-45","text":"(6) ","element":"a"},{"text":"that there is a sufficiently large ","element":"span"},{"text":"k ","element":"span"},{"text":"such that ","element":"span"},{"style":{"height":20.4},"width":188.44,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-8.png","element":"img","alt":" a[t]j,k ≥ −2δ","inline":true},{"text":". ","element":"span"},{"text":"A symmetric argument yields ","element":"span"},{"style":{"height":21.17},"width":156.28,"height":52.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-9.png","element":"img","alt":" a[t]j,k ≤ 2δ","inline":true},{"text":". 
Thus, for all ","element":"span"},{"style":{"height":11.6},"width":97.28,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-10.png","element":"img","alt":" λ > 0","inline":true},{"text":", we can choose ","element":"span"},{"text":"W ","element":"span"},{"text":"so that all non-bias ","element":"span"},{"text":"weights in ","element":"span"},{"style":{"height":14.35},"width":83.52,"height":35.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-11.png","element":"img","alt":" W (2)","inline":true,"padRight":true},{"text":"are contained in the interval ","element":"span"},{"style":{"height":16},"width":126.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-12.png","element":"img","alt":" (−λ, λ)","inline":true},{"text":"; this proves assertion ","element":"span"},{"href":"#id-14","text":"(i) ","element":"a"},{"text":"of the theorem.","element":"span"}],[{"text":"Note also that we do not actually require ","element":"span"},{"style":{"height":12.4},"width":104.96,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-13.png","element":"img","alt":" δ > 0","inline":true,"padRight":true},{"text":"to be arbitrarily small. 
Suppose instead that we choose ","element":"span"},{"style":{"height":11.6},"width":101.96,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-14.png","element":"img","alt":" k ∈ N","inline":true,"padRight":true},{"text":"sufficiently large, so that the convex hull of ","element":"span"},{"style":{"height":13.1},"width":41.48,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-15.png","element":"img","alt":" Sk","inline":true,"padRight":true},{"text":"contains ","element":"span"},{"style":{"height":12.7},"width":92.32,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-16.png","element":"img","alt":" 0N−1","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":18.16},"width":109.6,"height":45.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-17.png","element":"img","alt":" �ν(f [t])","inline":true,"padRight":true},{"text":"(for all ","element":"span"},{"text":"t","element":"span"},{"text":"). In this case, observe that our choice of ","element":"span"},{"text":"k ","element":"span"},{"text":"depends only on ","element":"span"},{"text":"f ","element":"span"},{"text":"(via ","element":"span"},{"style":{"height":18.16},"width":109.6,"height":45.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-18.png","element":"img","alt":" �ν(f [t])","inline":true},{"text":") and ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-19.png","element":"img","alt":" σ","inline":true,"padRight":true},{"text":"(via the definition of ","element":"span"},{"style":{"height":16},"width":141.92,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-20.png","element":"img","alt":"{λk}k∈N","inline":true},{"text":"). 
The inequality ","element":"span"},{"href":"#id-45","text":"(7) ","element":"a"},{"text":"still holds for any ","element":"span"},{"style":{"height":11.6},"width":19,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-21.png","element":"img","alt":" δ","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":20.98},"width":445.12,"height":52.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-22.png","element":"img","alt":" b[t]j,k, b′j,k ∈ ( 1N − δ, 1N + δ)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"text":"j, t","element":"span"},{"text":". ","element":"span"},{"text":"Thus, our argument to show ","element":"span"},{"style":{"height":16.71},"width":312.76,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-23.png","element":"img","alt":" ∥f − ρσW ∥∞,X < ε","inline":true,"padRight":true},{"text":"holds verbatim, which proves assertion ","element":"span"},{"href":"#id-14","text":"(ii).","element":"a"}],[{"text":"Proof of Theorem ","element":"span"},{"href":"#id-14","text":"3.2. ","element":"a"},{"text":"Fix some ","element":"span"},{"style":{"height":11.6},"width":91.52,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-24.png","element":"img","alt":" ε > 0","inline":true},{"text":", and consider an arbitrary ","element":"span"},{"style":{"height":16},"width":247.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-25.png","element":"img","alt":" t ∈ {1, . . . , m}","inline":true},{"text":". 
For each integer ","element":"span"},{"style":{"height":13.2},"width":103.04,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-26.png","element":"img","alt":"d ≥ 1","inline":true},{"text":", let ","element":"span"},{"style":{"height":18.77},"width":50.28,"height":46.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-27.png","element":"img","alt":" p[t]d","inline":true,"padRight":true},{"text":"be the best polynomial approximant to ","element":"span"},{"style":{"height":17.36},"width":54.12,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-28.png","element":"img","alt":" f [t]","inline":true,"padRight":true},{"text":"of degree ","element":"span"},{"text":"d","element":"span"},{"text":". By Theorem ","element":"span"},{"href":"#id-47","text":"2.2, ","element":"a"},{"text":"we have ","element":"span"},{"style":{"height":20.21},"width":489.28,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-29.png","element":"img","alt":"∥f [t] − p[t]d ∥∞,X ≤ 6ωf [t]( D2d)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":93.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-30.png","element":"img","alt":" d ≥ 1","inline":true},{"text":", hence it follows from the definition of ","element":"span"},{"style":{"height":13.11},"width":35.64,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-31.png","element":"img","alt":" dε","inline":true,"padRight":true},{"text":"that","element":"span"}],[{"style":{"width":"36%"},"width":582,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-32.png","element":"img"}],[{"text":"Define ","element":"span"},{"style":{"height":20.85},"width":651.68,"height":52.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-33.png","element":"img","alt":" ε′ := ε−max{6ωf 
[t]( D2dε ) : 1 ≤ t ≤ m}","inline":true},{"text":". Note that ","element":"span"},{"style":{"height":11.6},"width":103.04,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-34.png","element":"img","alt":" ε′ > 0","inline":true},{"text":", and ","element":"span"},{"style":{"height":20.11},"width":392.24,"height":50.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-35.png","element":"img","alt":" ∥f [t]−p[t]dε∥∞,X ≤ ε−ε′","inline":true,"padRight":true},{"text":"(for ","element":"span"},{"text":"all ","element":"span"},{"style":{"height":13.2},"width":175.64,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-36.png","element":"img","alt":" 1 ≤ t ≤ m","inline":true},{"text":"). By Theorem ","element":"span"},{"href":"#id-14","text":"3.1, ","element":"a"},{"text":"there exists some ","element":"span"},{"style":{"height":19.63},"width":224.28,"height":49.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-37.png","element":"img","alt":" W ∈ W�n+dεdε �","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":22.61},"width":370.16,"height":56.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-38.png","element":"img","alt":" ∥p[t]dε −ρσ [t]W ∥∞,X < ε′","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":175.64,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-39.png","element":"img","alt":" 1 ≤ t ≤ m","inline":true},{"text":", which implies","element":"span"}],[{"style":{"width":"81%"},"width":1293,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-40.png","element":"img"}],[{"text":"therefore ","element":"span"},{"style":{"height":16.7},"width":323.32,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-41.png","element":"img","alt":" ∥f − ρσW ∥∞,X < 
ε","inline":true},{"text":". Conditions ","element":"span"},{"href":"#id-14","text":"(i) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-14","text":"(ii) ","element":"a"},{"text":"follow from Theorem ","element":"span"},{"href":"#id-14","text":"3.1. ","element":"a"},{"text":"Finally, note that ","element":"span"},{"style":{"height":19.98},"width":287.2,"height":49.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-42.png","element":"img","alt":"ω_{f^{[t]}}(D/(2d)) ∈ O(1/d)","inline":true,"padRight":true},{"text":"(for fixed ","element":"span"},{"text":"D","element":"span"},{"text":"), i.e. ","element":"span"},{"style":{"height":19.5},"width":176.32,"height":48.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-43.png","element":"img","alt":" d_ε ∈ O(1/ε)","inline":true},{"text":", hence","element":"span"},{"style":{"height":23.18},"width":628.96,"height":57.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-44.png","element":"img","alt":"(n+d_ε choose d_ε) = (d_ε+n)(d_ε+n−1)···(d_ε+1)/n! ∈ O(ε^{−n})","inline":true},{"text":".","element":"span"}],[{"text":"Proof of Theorem ","element":"span"},{"href":"#id-14","text":"3.3. ","element":"a"},{"text":"Most of the work has already been done in the proofs of Theorem ","element":"span"},{"href":"#id-14","text":"3.1 ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-14","text":"3.2. 
","element":"a"},{"text":"The key observation is that ","element":"span"},{"text":"det(","element":"span"},{"text":"Q","element":"span"},{"text":"[","element":"span"},{"text":"W","element":"span"},{"text":"]) ","element":"span"},{"text":"is a non-zero polynomial in terms of the weights ","element":"span"},{"text":"W","element":"span"},{"text":", hence ","element":"span"},{"style":{"height":19.42},"width":572,"height":48.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-45.png","element":"img","alt":" {det(Q[W]) ̸= 0 : W ∈ W�n+dd�}","inline":true,"padRight":true},{"text":"is dense in ","element":"span"},{"style":{"height":18.22},"width":116.76,"height":45.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/9-46.png","element":"img","alt":" W�n+dd�","inline":true},{"text":", or equivalently, its complement has Lebesgue measure zero.","element":"span"}]]},{"heading":"6 CONCLUSION AND FURTHER REMARKS","paragraphs":[[{"text":"Theorem ","element":"span"},{"href":"#id-34","text":"5.6 ","element":"a"},{"text":"is rather general, and could potentially be used to prove analogs of the universal approximation theorem for other classes of neural networks, such as convolutional neural networks and recurrent neural networks. In particular, finding a single suitable set of weights (as a representative of the infinitely many possible sets of weights in the given class of neural networks), with the property that its corresponding “non-bias Vandermonde matrix” (see Definition ","element":"span"},{"href":"#id-48","text":"5.5) ","element":"a"},{"text":"is non-singular, would serve as a straightforward criterion for showing that the UAP holds for the given class of neural networks (with certain weight constraints). 
We formulated this criterion to be as general as we could, with the hope that it would be applicable to future classes of “neural-like” networks.","element":"span"}],[{"text":"We believe our algebraic approach could be emulated to eventually yield a unified understanding of how depth, width, constraints on weights, and other architectural choices influence the approximation capabilities of arbitrary neural networks.","element":"span"}],[{"text":"Finally, we end our paper with an open-ended question. The proofs of our results in Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"seem to suggest that non-bias weights and bias weights play very different roles. We could impose very strong restrictions on the non-bias weights and still have the UAP. What about the bias weights?","element":"span"}]]},{"heading":"ACKNOWLEDGMENTS","paragraphs":[[{"text":"This research is supported by the National Research Foundation, Singapore, under its NRFF program (NRFFAI1-2019-0005).","element":"span"}]]},{"heading":"REFERENCES","paragraphs":[[{"id":"id-7","text":"Raman Arora, Amitabh Basu, Poorya Mianjy, and Anirbit Mukherjee. Understanding deep neural ","element":"span"},{"text":"networks with rectified linear units. In International Conference on Learning Representations, 2018. URL ","element":"span"},{"href":"https://openreview.net/forum?id=B1J_rgWRW","text":"https://openreview.net/forum?id=B1J_rgWRW","element":"a"},{"text":".","element":"span"}],[{"id":"id-0","text":"G. Cybenko. ","element":"span"},{"text":"Approximation by superpositions of a sigmoidal function. ","element":"span"},{"text":"Mathematics of Control, Signals and Systems, 2(4):303–314, December 1989. doi: 10.1007/BF02551274. URL ","element":"span"},{"href":"https://doi.org/10.1007/BF02551274","text":"https://doi.org/10.1007/BF02551274.","element":"a"}],[{"id":"id-32","text":"Carlos D’Andrea and Luis Felipe Tabera. 
","element":"span"},{"text":"Tropicalization and irreducibility of generalized ","element":"span"},{"text":"Vandermonde ","element":"span"},{"text":"determinants. ","element":"span"},{"text":"Proc. ","element":"span"},{"text":"Amer. ","element":"span"},{"text":"Math. ","element":"span"},{"text":"Soc., ","element":"span"},{"text":"137(11):3647– 3656, ","element":"span"},{"text":"2009. ","element":"span"},{"text":"ISSN ","element":"span"},{"text":"0002-9939. ","element":"span"},{"text":"doi: ","element":"span"},{"text":"10.1090/S0002-9939-09-09951-1. ","element":"span"},{"text":"URL ","element":"span"},{"href":"https://doi-org.library.sutd.edu.sg:2443/10.1090/S0002-9939-09-09951-1","text":"https://doi-org.library.sutd.edu.sg:2443/10.1090/S0002-9939-09-09951-1.","element":"a"}],[{"id":"id-23","text":"Olivier Delalleau and Yoshua Bengio. ","element":"span"},{"text":"Shallow vs. deep sum-product networks. ","element":"span"},{"text":"In J. ShaweTaylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 24, pp. 666–674. Curran Associates, Inc., 2011. ","element":"span"},{"text":"URL ","element":"span"},{"href":"http://papers.nips.cc/paper/4350-shallow-vs-deep-sum-product-networks.pdf","text":"http://papers.nips.cc/paper/4350-shallow-vs-deep-sum-product-networks.pdf.","element":"a"}],[{"id":"id-56","text":"Ronald A. DeVore, Ralph Howard, and Charles Micchelli. ","element":"span"},{"text":"Optimal nonlinear approximation. Manuscripta Math., 63(4):469–478, 1989. ISSN 0025-2611. doi: 10.1007/BF01171759. URL ","element":"span"},{"href":"https://doi-org.library.sutd.edu.sg:2443/10.1007/BF01171759","text":"https://doi-org.library.sutd.edu.sg:2443/10.1007/BF01171759.","element":"a"}],[{"id":"id-11","text":"David Eisenbud. ","element":"span"},{"text":"Commutative algebra, volume 150 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995. ","element":"span"},{"text":"ISBN 0-387-94268-8; 0-387-94269-6. 
","element":"span"},{"text":"doi: ","element":"span"},{"text":"10.1007/ 978-1-4612-5350-1. URL ","element":"span"},{"href":"http://dx.doi.org/10.1007/978-1-4612-5350-1","text":"http://dx.doi.org/10.1007/978-1-4612-5350-1. ","element":"a"},{"text":"With a view toward algebraic geometry.","element":"span"}],[{"id":"id-24","text":"Ronen Eldan and Ohad Shamir. ","element":"span"},{"text":"The power of depth for feedforward neural networks. ","element":"span"},{"text":"In Vitaly Feldman, Alexander Rakhlin, and Ohad Shamir (eds.), 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pp. 907– 940, Columbia University, New York, New York, USA, 23–26 Jun 2016. PMLR. ","element":"span"},{"text":"URL ","element":"span"},{"href":"http://proceedings.mlr.press/v49/eldan16.html","text":"http://proceedings.mlr.press/v49/eldan16.html.","element":"a"}],[{"id":"id-2","text":"K. Funahashi. On the approximate realization of continuous mappings by neural networks. ","element":"span"},{"text":"Neural Netw., 2(3):183–192, May 1989. ISSN 0893-6080. doi: 10.1016/0893-6080(89)90003-8. URL ","element":"span"},{"href":"http://dx.doi.org/10.1016/0893-6080(89)90003-8","text":"http://dx.doi.org/10.1016/0893-6080(89)90003-8.","element":"a"}],[{"id":"id-22","text":"Boris Hanin. Universal function approximation by deep neural nets with bounded width and relu ","element":"span"},{"text":"activations. 08 2017. Preprint arXiv:1708.02691 [stat.ML].","element":"span"}],[{"id":"id-1","text":"K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approxi- ","element":"span"},{"text":"mators. Neural Netw., 2(5):359–366, July 1989. ISSN 0893-6080. doi: 10.1016/0893-6080(89) 90020-8. URL ","element":"span"},{"href":"http://dx.doi.org/10.1016/0893-6080(89)90020-8","text":"http://dx.doi.org/10.1016/0893-6080(89)90020-8.","element":"a"}],[{"id":"id-3","text":"Kurt Hornik. Approximation capabilities of multilayer feedforward networks. 
","element":"span"},{"text":"Neural Networks, 4(2):251 – 257, 1991. ISSN 0893-6080. doi: https://doi.org/10.1016/0893-6080(91)90009-T. URL ","element":"span"},{"href":"http://www.sciencedirect.com/science/article/pii/089360809190009T","text":"http://www.sciencedirect.com/science/article/pii/089360809190009T.","element":"a"}],[{"id":"id-53","text":"M. I. Kadec. On the distribution of points of maximum deviation in the approximation of continuous ","element":"span"},{"text":"functions by polynomials. Uspehi Mat. Nauk, 15(1 (91)):199–202, 1960. ISSN 0042-1316.","element":"span"}],[{"id":"id-54","text":"M. I. Kadec. On the distribution of points of maximum deviation in the approximation of continuous ","element":"span"},{"text":"functions by polynomials. Amer. Math. Soc. Transl. (2), 26:231–234, 1963. ISSN 0065-9290. doi: 10.1090/trans2/026/09. URL ","element":"span"},{"href":"https://doi-org.library.sutd.edu.sg:2443/10.1090/trans2/026/09","text":"https://doi-org.library.sutd.edu.sg:2443/10.1090/trans2/026/09.","element":"a"}],[{"id":"id-4","text":"Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward net- ","element":"span"},{"text":"works with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861 – 867, 1993. ISSN 0893-6080. doi: https://doi.org/10.1016/S0893-6080(05)80131-5. URL ","element":"span"},{"href":"http://www.sciencedirect.com/science/article/pii/S0893608005801315","text":"http://www.sciencedirect.com/science/article/pii/S0893608005801315.","element":"a"}],[{"id":"id-28","text":"William Judson LeVeque. ","element":"span"},{"text":"Topics in number theory. Vols. 1 and 2. Addison-Wesley Publishing Co., Inc., Reading, Mass., 1956.","element":"span"}],[{"id":"id-6","text":"Shiyu Liang and R. Srikant. Why deep neural networks? ","element":"span"},{"text":"CoRR, abs/1610.04161, 2016. 
URL ","element":"span"},{"href":"http://arxiv.org/abs/1610.04161","text":"http://arxiv.org/abs/1610.04161.","element":"a"}],[{"id":"id-16","text":"Henry W. Lin, Max Tegmark, and David Rolnick. Why does deep and cheap learning work so well? ","element":"span"},{"text":"Journal of Statistical Physics, 168(6):1223–1247, Sep 2017.","element":"span"}],[{"id":"id-21","text":"Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. ","element":"span"},{"text":"The expressive power of neural networks: A view from the width. ","element":"span"},{"text":"In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30, pp. 6231–6239. Curran Associates, Inc., 2017. ","element":"span"},{"text":"URL ","element":"span"},{"href":"http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width.pdf","text":"http://papers.nips.cc/paper/7203-the-expressive-power-of-neural-networks-a-view-from-the-width.pdf.","element":"a"}],[{"id":"id-57","text":"V. ","element":"span"},{"text":"E. ","element":"span"},{"text":"Maiorov ","element":"span"},{"text":"and ","element":"span"},{"text":"R. ","element":"span"},{"text":"Meir. ","element":"span"},{"text":"On ","element":"span"},{"text":"the ","element":"span"},{"text":"near ","element":"span"},{"text":"optimality ","element":"span"},{"text":"of ","element":"span"},{"text":"the ","element":"span"},{"text":"stochastic ","element":"span"},{"text":"approximation ","element":"span"},{"text":"of ","element":"span"},{"text":"smooth ","element":"span"},{"text":"functions ","element":"span"},{"text":"by ","element":"span"},{"text":"neural ","element":"span"},{"text":"networks. ","element":"span"},{"text":"Adv. ","element":"span"},{"text":"Comput. ","element":"span"},{"text":"Math., ","element":"span"},{"text":"13(1): 79–103, ","element":"span"},{"text":"2000. 
","element":"span"},{"text":"ISSN ","element":"span"},{"text":"1019-7168. ","element":"span"},{"text":"doi: ","element":"span"},{"text":"10.1023/A:1018993908478. ","element":"span"},{"text":"URL ","element":"span"},{"href":"https://doi-org.library.sutd.edu.sg:2443/10.1023/A:1018993908478","text":"https://doi-org.library.sutd.edu.sg:2443/10.1023/A:1018993908478.","element":"a"}],[{"id":"id-10","text":"Vitaly Maiorov and Allan Pinkus. Lower bounds for approximation by mlp neural networks. ","element":"span"},{"text":"Neurocomputing, 25(1):81 – 91, 1999. ISSN 0925-2312. doi: https://doi.org/10.1016/S0925-2312(98) 00111-8. URL ","element":"span"},{"href":"http://www.sciencedirect.com/science/article/pii/S0925231298001118","text":"http://www.sciencedirect.com/science/article/pii/S0925231298001118.","element":"a"}],[{"id":"id-8","text":"H. N. Mhaskar. Neural networks for optimal approximation of smooth and analytic functions. ","element":"span"},{"text":"Neural Computation, 8(1):164–177, Jan 1996. doi: 10.1162/neco.1996.8.1.164.","element":"span"}],[{"id":"id-25","text":"Hrushikesh Mhaskar, Qianli Liao, and Tomaso Poggio. When and why are deep networks better than ","element":"span"},{"text":"shallow ones?, 2017. URL ","element":"span"},{"href":"https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14849","text":"https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14849.","element":"a"}],[{"id":"id-26","text":"Guido Mont´ufar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear ","element":"span"},{"text":"regions of deep neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, pp. 2924–2932, Cambridge, MA, USA, 2014. MIT Press. URL ","element":"span"},{"href":"http://dl.acm.org/citation.cfm?id=2969033.2969153","text":"http://dl.acm.org/citation.cfm?id=2969033.2969153.","element":"a"}],[{"id":"id-9","text":"Allan Pinkus. 
Approximation theory of the MLP model in neural networks. ","element":"span"},{"text":"Acta Numerica, 8:143–195, 1999.","element":"span"}],[{"id":"id-19","text":"A. Rahimi and B. Recht. Uniform approximation of functions with random bases. In ","element":"span"},{"text":"2008 46th Annual Allerton Conference on Communication, Control, and Computing, pp. 555–561, Sep. 2008. doi: 10.1109/ALLERTON.2008.4797607.","element":"span"}],[{"id":"id-17","text":"Ali Rahimi and Benjamin Recht. ","element":"span"},{"text":"Random features for large-scale kernel machines. ","element":"span"},{"text":"In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, pp. 1177–1184, USA, 2007. Curran Associates Inc. ISBN 978-1-60560-352-0. URL ","element":"span"},{"href":"http://dl.acm.org/citation.cfm?id=2981562.2981710","text":"http://dl.acm.org/citation.cfm?id=2981562.2981710.","element":"a"}],[{"id":"id-13","text":"Theodore J. Rivlin. ","element":"span"},{"text":"An introduction to the approximation of functions. Dover Publications, Inc., New York, 1981. ISBN 0-486-64069-8. Corrected reprint of the 1969 original, Dover Books on Advanced Mathematics.","element":"span"}],[{"id":"id-50","text":"Richard ","element":"span"},{"text":"P. ","element":"span"},{"text":"Stanley. ","element":"span"},{"text":"Enumerative ","element":"span"},{"text":"combinatorics. ","element":"span"},{"text":"Vol. ","element":"span"},{"text":"2, ","element":"span"},{"text":"volume ","element":"span"},{"text":"62 ","element":"span"},{"text":"of ","element":"span"},{"text":"Cambridge Studies in Advanced Mathematics. ","element":"span"},{"text":"Cambridge University Press, ","element":"span"},{"text":"Cambridge, ","element":"span"},{"text":"1999. ISBN ","element":"span"},{"text":"0-521-56069-1; ","element":"span"},{"text":"0-521-78987-7. ","element":"span"},{"text":"doi: ","element":"span"},{"text":"10.1017/CBO9780511609589. 
","element":"span"},{"text":"URL ","element":"span"},{"href":"https://doi-org.library.sutd.edu.sg:2443/10.1017/CBO9780511609589","text":"https://doi-org.library.sutd.edu.sg:2443/10.1017/CBO9780511609589. ","element":"a"},{"text":"With a foreword by Gian-Carlo Rota and appendix 1 by Sergey Fomin.","element":"span"}],[{"id":"id-18","text":"Yitong Sun, Anna Gilbert, and Ambuj Tewari. On the approximation properties of random ReLU ","element":"span"},{"text":"features. Preprint arXiv:1810.04374v3 [stat.ML], August 2019.","element":"span"}],[{"id":"id-27","text":"Matus Telgarsky. Benefits of depth in neural networks. In Vitaly Feldman, Alexander Rakhlin, and ","element":"span"},{"text":"Ohad Shamir (eds.), 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pp. 1517–1539, Columbia University, New York, New York, USA, 23–26 Jun 2016. PMLR. URL ","element":"span"},{"href":"http://proceedings.mlr.press/v49/telgarsky16.html","text":"http://proceedings.mlr.press/v49/telgarsky16.html.","element":"a"}],[{"id":"id-49","text":"K. Wolsson. Linear dependence of a function set of ","element":"span"},{"text":"m ","element":"span"},{"text":"variables with vanishing generalized Wronskians. Linear Algebra Appl., 117:73–80, 1989. ISSN 0024-3795. doi: 10.1016/0024-3795(89)90548-X. URL ","element":"span"},{"href":"https://doi-org.library.sutd.edu.sg:2443/10.1016/0024-3795(89)90548-X","text":"https://doi-org.library.sutd.edu.sg:2443/10.1016/0024-3795(89)90548-X.","element":"a"}],[{"id":"id-5","text":"Dmitry Yarotsky. Error bounds for approximations with deep ReLU networks. ","element":"span"},{"text":"Neural Networks, 94:103–114, 2017. ISSN 0893-6080. doi: 10.1016/j.neunet.2017.07.002. 
URL ","element":"span"},{"href":"http://www.sciencedirect.com/science/article/pii/S0893608017301545","text":"http://www.sciencedirect.com/science/article/pii/S0893608017301545.","element":"a"}],[{"id":"id-20","text":"Gilad Yehudai and Ohad Shamir. On the power and limitations of random features for understanding ","element":"span"},{"text":"neural networks. Preprint arXiv:1904.00687v2 [stat.ML], June 2019.","element":"span"}],[{"text":"A ","element":"span"},{"text":"G","element":"span"},{"text":"ENERALIZED ","element":"span"},{"text":"W","element":"span"},{"text":"RONSKIANS AND THE PROOF OF ","element":"span"},{"text":"T","element":"span"},{"text":"HEOREM ","element":"span"},{"href":"#id-34","text":"5.6","element":"a"}],[{"style":{"width":"100%"},"width":1585,"height":154,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-0.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":16},"width":337.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-1.png","element":"img","alt":" f1, . . . , fN ∈ P(Rn)","inline":true},{"text":". The generalized Wronskian of ","element":"span"},{"style":{"height":16},"width":207.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-2.png","element":"img","alt":" (f1, . . . , fN)","inline":true,"padRight":true},{"text":"associated to ","element":"span"},{"style":{"height":14.8},"width":242.08,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-3.png","element":"img","alt":" ∆0, . . . , ∆N−1","inline":true}],[{"text":"is defined as the determinant of the matrix ","element":"span"},{"style":{"height":16.7},"width":423.48,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-4.png","element":"img","alt":" M = [∆i−1fj(x)]1≤i,j≤N","inline":true,"padRight":true},{"text":". 
In general, ","element":"span"},{"style":{"height":16},"width":207.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-5.png","element":"img","alt":" (f1, . . . , fN)","inline":true,"padRight":true},{"text":"has","element":"span"}],[{"id":"id-33","text":"multiple generalized Wronskians, corresponding to multiple choices for ","element":"span"},{"style":{"height":14.8},"width":242.08,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-6.png","element":"img","alt":" ∆0, . . . , ∆N−1","inline":true},{"text":".","element":"span"}],[{"text":"A.1 ","element":"span"},{"text":"P","element":"span"},{"text":"ROOF OF ","element":"span"},{"text":"T","element":"span"},{"text":"HEOREM ","element":"span"},{"href":"#id-34","text":"5.6","element":"a"}],[{"text":"For brevity, let ","element":"span"},{"style":{"height":20.24},"width":202.8,"height":50.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-7.png","element":"img","alt":" N = (n+d choose d)","inline":true},{"text":"and let ","element":"span"},{"style":{"height":16},"width":298.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-8.png","element":"img","alt":" x = (x1, . . . , xn)","inline":true},{"text":". Recall that ","element":"span"},{"style":{"height":13.1},"width":275.64,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-9.png","element":"img","alt":" λ1 < · · · < λN","inline":true,"padRight":true},{"text":"are all the","element":"span"}],[{"text":"n","element":"span"},{"text":"-tuples in ","element":"span"},{"style":{"height":18.61},"width":69.8,"height":46.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-10.png","element":"img","alt":" Λn≤d","inline":true,"padRight":true},{"text":"in the colexicographic order. 
For each ","element":"span"},{"style":{"height":14},"width":218.88,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-11.png","element":"img","alt":" 1 ≤ i, k ≤ N","inline":true},{"text":", write ","element":"span"},{"style":{"height":16.7},"width":357.76,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-12.png","element":"img","alt":" λk = (λk,1, . . . , λk,n)","inline":true},{"text":",","element":"span"}],[{"text":"define the differential operator ","element":"span"},{"style":{"height":24.3},"width":520.08,"height":60.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-13.png","element":"img","alt":" ∆_{λ_k} = (∂/∂x_1)^{λ_{k,1}} · · · (∂/∂x_n)^{λ_{k,n}}","inline":true},{"text":", and let ","element":"span"},{"style":{"height":23.18},"width":60.96,"height":57.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-14.png","element":"img","alt":" α^{(i)}_{λ_k}","inline":true,"padRight":true},{"text":"be the coefficient of","element":"span"}],[{"text":"the monomial ","element":"span"},{"style":{"height":16},"width":92.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-15.png","element":"img","alt":" qk(x)","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":16},"width":142.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-16.png","element":"img","alt":" ∆_{λ_i} p(x)","inline":true},{"text":". 
Consider an arbitrary ","element":"span"},{"style":{"height":11.6},"width":128.36,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-17.png","element":"img","alt":" W ∈ U","inline":true},{"text":", and for each ","element":"span"},{"style":{"height":14},"width":195.36,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-18.png","element":"img","alt":" 1 ≤ j ≤ N","inline":true},{"text":", define","element":"span"}],[{"style":{"height":16.7},"width":253.6,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-19.png","element":"img","alt":"fj ∈ P≤d(Rn)","inline":true,"padRight":true},{"text":"by the map ","element":"span"},{"style":{"height":23.47},"width":473.44,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-20.png","element":"img","alt":" x ↦ p(w^{(1)}_{1,j} x_1, . . . , w^{(1)}_{n,j} x_n)","inline":true},{"text":". Note that ","element":"span"},{"style":{"height":16.7},"width":444.16,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-21.png","element":"img","alt":" Fp,0n(W) = (f1, . . . , fN)","inline":true}],[{"text":"by definition. Next, define the matrix ","element":"span"},{"style":{"height":16.7},"width":487.32,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-22.png","element":"img","alt":" MW (x) := [∆ifj(x)]1≤i,j≤N","inline":true,"padRight":true},{"text":", and note that ","element":"span"},{"style":{"height":16},"width":192.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-23.png","element":"img","alt":" det MW (x)","inline":true,"padRight":true},{"text":"is","element":"span"}],[{"text":"the generalized Wronskian of ","element":"span"},{"style":{"height":16},"width":207.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-24.png","element":"img","alt":" (f1, . . . 
, fN)","inline":true,"padRight":true},{"text":"associated to ","element":"span"},{"style":{"height":14.8},"width":199.8,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-25.png","element":"img","alt":" ∆1, . . . , ∆N","inline":true},{"text":". In particular, this generalized","element":"span"}],[{"text":"Wronskian is well-defined, since the definition of the colexicographic order implies that ","element":"span"},{"style":{"height":15.5},"width":188.44,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-26.png","element":"img","alt":" λk,1 +· · ·+","inline":true}],[{"style":{"height":15.51},"width":153.48,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-27.png","element":"img","alt":"λk,n ≤ k","inline":true,"padRight":true},{"text":"for all possible ","element":"span"},{"text":"k","element":"span"},{"text":". Similar to the univariate case, ","element":"span"},{"href":"#id-49","referenceIndex":31,"style":{"height":16},"width":207.04,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-28.png","element":"img","alt":" (f1, . . . , fN)","inline":true,"padRight":true},{"href":"#id-49","referenceIndex":31,"text":"is lin","element":"a"},{"text":"early independent if","element":"span"}],[{"text":"(and only if) its generalized Wronskian is not the zero function ","element":"span"},{"href":"#id-49","referenceIndex":31,"text":"(Wolsson, ","element":"a"},{"href":"#id-49","referenceIndex":31,"text":"1989)","element":"a"},{"text":". 
Thus, to show that","element":"span"}],[{"style":{"height":15.2},"width":192.72,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-29.png","element":"img","alt":"W ∈pUind","inline":true},{"text":", it suffices to show that the evaluation ","element":"span"},{"style":{"height":16},"width":212.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-30.png","element":"img","alt":" det MW (1n)","inline":true,"padRight":true},{"text":"of this generalized Wronskian at","element":"span"}],[{"style":{"height":12.7},"width":120.32,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-31.png","element":"img","alt":"x = 1n","inline":true,"padRight":true},{"text":"gives a non-zero value, where ","element":"span"},{"style":{"height":12.7},"width":43.04,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-32.png","element":"img","alt":" 1n","inline":true,"padRight":true},{"text":"denotes the all-ones vector in ","element":"span"},{"style":{"height":10.8},"width":48.8,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-33.png","element":"img","alt":" Rn","inline":true},{"text":".","element":"span"}],[{"text":"Observe that the ","element":"span"},{"text":"(","element":"span"},{"text":"i, j","element":"span"},{"text":")","element":"span"},{"text":"-th entry of ","element":"span"},{"style":{"height":16},"width":151.36,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-34.png","element":"img","alt":" MW (1n)","inline":true,"padRight":true},{"text":"equals ","element":"span"},{"style":{"height":23.66},"width":366.4,"height":59.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-35.png","element":"img","alt":" (�w(1)j )λi(∆λip)(�w(1)j )","inline":true},{"text":", hence we can check 
that","element":"span"}],[{"style":{"height":16},"width":324.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-36.png","element":"img","alt":"MW (1n) = M ′M ′′","inline":true},{"text":", where ","element":"span"},{"style":{"height":10.8},"width":57.2,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-37.png","element":"img","alt":" M ′","inline":true,"padRight":true},{"text":"is an ","element":"span"},{"text":"N","element":"span"},{"text":"-by-","element":"span"},{"text":"N ","element":"span"},{"text":"matrix whose ","element":"span"},{"text":"(","element":"span"},{"text":"i, j","element":"span"},{"text":")","element":"span"},{"text":"-th entry is given by","element":"span"}],[{"style":{"width":"69%"},"width":1096,"height":180,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-38.png","element":"img"}],[{"text":"It follows from the definition of the colexicographic order that ","element":"span"},{"style":{"height":15.5},"width":123.32,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-39.png","element":"img","alt":" λj − λi","inline":true,"padRight":true},{"text":"necessarily contains at least","element":"span"}],[{"text":"one strictly negative entry whenever ","element":"span"},{"text":"j < i","element":"span"},{"text":", hence we infer that ","element":"span"},{"style":{"height":10.8},"width":57.2,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-40.png","element":"img","alt":" M ′","inline":true,"padRight":true},{"text":"is upper triangular. 
The diagonal","element":"span"}],[{"text":"entries of ","element":"span"},{"style":{"height":10.8},"width":57.2,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-41.png","element":"img","alt":" M ′","inline":true,"padRight":true},{"text":"are ","element":"span"},{"style":{"height":22.42},"width":321.6,"height":56.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-42.png","element":"img","alt":" α(1)0n , α(2)0n , . . . , α(N)0n","inline":true,"padRight":true},{"text":", and note that ","element":"span"},{"style":{"height":22.99},"width":424.32,"height":57.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-43.png","element":"img","alt":" α(i)0n = (λi,1! · · · λi,n!)α(1)λi","inline":true,"padRight":true},{"text":"for each ","element":"span"},{"style":{"height":13.2},"width":175.68,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-44.png","element":"img","alt":" 1 ≤ i ≤ N","inline":true},{"text":",","element":"span"}],[{"text":"where ","element":"span"},{"style":{"height":16.3},"width":209.24,"height":40.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-45.png","element":"img","alt":" λi,1! · · · λi,n!","inline":true,"padRight":true},{"text":"denotes the product of the factorials of the entries of the ","element":"span"},{"text":"n","element":"span"},{"text":"-tuple ","element":"span"},{"style":{"height":13.1},"width":34.04,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-46.png","element":"img","alt":" λi","inline":true},{"text":". In particular,","element":"span"}],[{"style":{"height":16.3},"width":293.12,"height":40.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-47.png","element":"img","alt":"λi,1! · · · λi,n! 
̸= 0","inline":true},{"text":", and ","element":"span"},{"style":{"height":22.99},"width":65.76,"height":57.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-48.png","element":"img","alt":" α(1)λi","inline":true,"padRight":true},{"text":", which is the coefficient of the monomial ","element":"span"},{"style":{"height":16},"width":86.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-49.png","element":"img","alt":" qi(x)","inline":true,"padRight":true},{"text":"in ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":")","element":"span"},{"text":", is non-zero.","element":"span"}],[{"text":"Thus, ","element":"span"},{"style":{"height":16},"width":213.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-50.png","element":"img","alt":" det(M ′) ̸= 0","inline":true},{"text":".","element":"span"}],[{"text":"We have come to the crucial step of our proof. If we can show that ","element":"span"},{"style":{"height":16},"width":472.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-51.png","element":"img","alt":" det(M ′′) = det(Q[W]) ̸= 0","inline":true},{"text":",","element":"span"}],[{"text":"then ","element":"span"},{"style":{"height":16},"width":692.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-52.png","element":"img","alt":" det(MW (1n)) = det(M ′) det(M ′′) ̸= 0","inline":true},{"text":", and hence we can infer that ","element":"span"},{"style":{"height":15.2},"width":200.4,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-53.png","element":"img","alt":" W ∈pUind","inline":true},{"text":". 
This","element":"span"}],[{"text":"means that","element":"span"},{"style":{"height":14.8},"width":92.4,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-54.png","element":"img","alt":"pUind","inline":true,"padRight":true},{"text":"contains the subset ","element":"span"},{"style":{"height":13.2},"width":128.84,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-55.png","element":"img","alt":" U′ ⊆ U","inline":true,"padRight":true},{"text":"consisting of all ","element":"span"},{"text":"W ","element":"span"},{"text":"such that ","element":"span"},{"text":"Q","element":"span"},{"text":"[","element":"span"},{"text":"W","element":"span"},{"text":"] ","element":"span"},{"text":"is non-singular.","element":"span"}],[{"text":"Note that ","element":"span"},{"text":"det(","element":"span"},{"text":"Q","element":"span"},{"text":"[","element":"span"},{"text":"W","element":"span"},{"text":"]) ","element":"span"},{"text":"is a polynomial in terms of the non-bias weights in ","element":"span"},{"style":{"height":14.16},"width":83.52,"height":35.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-56.png","element":"img","alt":" W (1)","inline":true,"padRight":true},{"text":"as its variables,","element":"span"}],[{"text":"so we could write this polynomial as ","element":"span"},{"text":"r ","element":"span"},{"text":"= ","element":"span"},{"text":"r","element":"span"},{"text":"(","element":"span"},{"text":"W","element":"span"},{"text":")","element":"span"},{"text":". 
Consequently, if we can find a single ","element":"span"},{"style":{"height":11.6},"width":133.64,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-57.png","element":"img","alt":" W ∈ U","inline":true}],[{"text":"such that ","element":"span"},{"text":"Q","element":"span"},{"text":"[","element":"span"},{"text":"W","element":"span"},{"text":"] ","element":"span"},{"text":"is non-singular, then ","element":"span"},{"text":"r","element":"span"},{"text":"(","element":"span"},{"text":"W","element":"span"},{"text":") ","element":"span"},{"text":"is not identically zero on ","element":"span"},{"text":"U","element":"span"},{"text":", which then implies that","element":"span"}],[{"style":{"height":16},"width":453.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/12-58.png","element":"img","alt":"U′ = {W ∈ U : r(W) ̸= 0}","inline":true,"padRight":true},{"text":"is dense in ","element":"span"},{"text":"U ","element":"span"},{"text":"(w.r.t. the Euclidean metric).","element":"span"}],[{"id":"id-35","text":"A.2 ","element":"span"},{"text":"P","element":"span"},{"text":"ROOF OF ","element":"span"},{"text":"C","element":"span"},{"text":"OROLLARY ","element":"span"},{"href":"#id-36","text":"5.7","element":"a"}],[{"text":"Let ","element":"span"},{"style":{"height":20.27},"width":201.36,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-0.png","element":"img","alt":" N := (n+d choose d)","inline":true},{"text":". 
By Theorem ","element":"span"},{"href":"#id-34","text":"5.6, ","element":"a"},{"text":"it suffices to show that there exists some ","element":"span"},{"style":{"height":17.28},"width":195.52,"height":43.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-1.png","element":"img","alt":" W ∈ Wn,mN","inline":true,"padRight":true},{"text":"such that","element":"span"}],[{"text":"the non-bias Vandermonde matrix of ","element":"span"},{"text":"W ","element":"span"},{"text":"is non-singular. Consider ","element":"span"},{"style":{"height":17.28},"width":201.76,"height":43.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-2.png","element":"img","alt":" W ∈ Wn,mN","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":23.47},"width":119.32,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-3.png","element":"img","alt":" w(1)i,j =","inline":true}],[{"style":{"height":22.78},"width":196.28,"height":56.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-4.png","element":"img","alt":"(w(1)1,j)(d+1)i","inline":true},{"text":". 
Recall that the monomials in ","element":"span"},{"style":{"height":17.81},"width":89.96,"height":44.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-5.png","element":"img","alt":" Mn≤d","inline":true,"padRight":true},{"text":"are arranged in colexicographic order, i.e.","element":"span"}],[{"style":{"width":"56%"},"width":890,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-6.png","element":"img"}],[{"text":"Thus, there are fixed integers ","element":"span"},{"style":{"height":14.4},"width":442.68,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-7.png","element":"img","alt":" 0 = β1 < β2 < · · · < βN","inline":true},{"text":", such that the ","element":"span"},{"text":"(","element":"span"},{"text":"i, j","element":"span"},{"text":")","element":"span"},{"text":"-th entry of ","element":"span"},{"text":"Q","element":"span"},{"text":"[","element":"span"},{"text":"W","element":"span"},{"text":"] ","element":"span"},{"text":"is","element":"span"}],[{"style":{"height":20.78},"width":132.44,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-8.png","element":"img","alt":"(w(1)1,j)βi","inline":true},{"text":". Such matrices ","element":"span"},{"href":"#id-50","referenceIndex":28,"text":"are well-","element":"a"},{"href":"#id-50","referenceIndex":28,"text":"studie","element":"a"},{"text":"d in algebraic combinatorics, and the determinant of ","element":"span"},{"text":"Q","element":"span"},{"text":"[","element":"span"},{"text":"W","element":"span"},{"text":"] ","element":"span"},{"text":"is","element":"span"}],[{"text":"a Schur polynomial; see ","element":"span"},{"href":"#id-50","referenceIndex":28,"text":"(Stanley, ","element":"a"},{"href":"#id-50","referenceIndex":28,"text":"1999)","element":"a"},{"text":". 
In particular, if we choose positive pairwise distinct values","element":"span"}],[{"text":"for ","element":"span"},{"style":{"height":23.47},"width":69.6,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-9.png","element":"img","alt":" w(1)1,j","inline":true,"padRight":true},{"text":"(for ","element":"span"},{"style":{"height":14},"width":180.96,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-10.png","element":"img","alt":" 1 ≤ j ≤ N","inline":true},{"text":"), then ","element":"span"},{"text":"Q","element":"span"},{"text":"[","element":"span"},{"text":"W","element":"span"},{"text":"] ","element":"span"},{"text":"is non-sin","element":"span"},{"href":"#id-50","referenceIndex":28,"text":"gular, sin","element":"a"},{"href":"#id-50","referenceIndex":28,"text":"ce a S","element":"a"},{"text":"chur polynomial can be expressed as","element":"span"}],[{"id":"id-37","text":"a (non-negative) sum of certain monomials; see ","element":"span"},{"href":"#id-50","referenceIndex":28,"text":"(Stanley, ","element":"a"},{"href":"#id-50","referenceIndex":28,"text":"1999, ","element":"a"},{"text":"Sec. 
7.10) for details.","element":"span"}],[{"text":"B ","element":"span"},{"text":"A","element":"span"},{"text":"N ANALOG OF ","element":"span"},{"text":"K","element":"span"},{"text":"ADEC","element":"span"},{"text":"’","element":"span"},{"text":"S THEOREM AND THE PROOF OF ","element":"span"},{"text":"L","element":"span"},{"text":"EMMA ","element":"span"},{"href":"#id-39","text":"5.9","element":"a"}],[{"text":"Throughout this section, suppose ","element":"span"},{"style":{"height":16},"width":169.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-11.png","element":"img","alt":" σ ∈ C(R)","inline":true,"padRight":true},{"text":"and let ","element":"span"},{"style":{"height":13.2},"width":106.4,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-12.png","element":"img","alt":" d ≥ 1","inline":true,"padRight":true},{"text":"be an integer. We shall use the same","element":"span"}],[{"text":"definitions for ","element":"span"},{"style":{"height":16},"width":308,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-13.png","element":"img","alt":" {λk}k∈N, {Yk}k∈N","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":16},"width":140.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-14.png","element":"img","alt":" {σk}k∈N","inline":true,"padRight":true},{"text":"as given immediately after Remark ","element":"span"},{"href":"#id-51","text":"5.8. 
","element":"a"},{"text":"Our goal","element":"span"}],[{"text":"for this section is to prove Theorem ","element":"span"},{"href":"#id-52","referenceIndex":86,"text":"B.1 ","element":"a"},{"text":"below, so that we can infer Lemma ","element":"span"},{"href":"#id-39","text":"5.9 ","element":"a"},{"text":"as a cons","element":"span"},{"href":"#id-53","referenceIndex":12,"text":"equence","element":"a"}],[{"href":"#id-53","referenceIndex":12,"text":"of Th","element":"a"},{"text":"eorem ","element":"span"},{"href":"#id-52","referenceIndex":86,"text":"B.1. ","element":"a"},{"text":"Note that Theorem ","element":"span"},{"href":"#id-52","referenceIndex":86,"text":"B.1 ","element":"a"},{"text":"is an analog of the well-known Kadec’s theorem ","element":"span"},{"href":"#id-53","referenceIndex":12,"text":"(Kadec,","element":"a"}],[{"href":"#id-53","referenceIndex":12,"text":"1960) ","element":"a"},{"text":"from approximation th","element":"span"},{"href":"#id-54","referenceIndex":13,"text":"eory. T","element":"a"},{"href":"#id-54","referenceIndex":13,"text":"o prov","element":"a"},{"text":"e Theorem ","element":"span"},{"href":"#id-52","referenceIndex":86,"text":"B.1, ","element":"a"},{"text":"we shall essentially follow the proof of","element":"span"}],[{"text":"Kadec’s theorem as given in ","element":"span"},{"href":"#id-54","referenceIndex":13,"text":"(Kadec, ","element":"a"},{"href":"#id-54","referenceIndex":13,"text":"1963)","element":"a"},{"text":".","element":"span"}],[{"text":"We begin with a crucial observation. 
For every best polynomial approximant ","element":"span"},{"style":{"height":9.1},"width":39.56,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-15.png","element":"img","alt":" σk","inline":true,"padRight":true},{"text":"to ","element":"span"},{"style":{"height":16.03},"width":68.76,"height":40.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-16.png","element":"img","alt":" σ|Yk","inline":true,"padRight":true},{"text":"of degree","element":"span"}],[{"text":"d","element":"span"},{"text":", it is known that there are (at least) ","element":"span"},{"text":"d ","element":"span"},{"text":"+ 2 ","element":"span"},{"text":"values","element":"span"}],[{"style":{"width":"38%"},"width":611,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-17.png","element":"img"}],[{"text":"and ","element":"span"},{"href":"#id-13","referenceIndex":27,"text":"some s","element":"a"},{"text":"ign ","element":"span"},{"href":"#id-13","referenceIndex":27,"style":{"height":16},"width":177.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-18.png","element":"img","alt":" δk ∈ {±1}","inline":true},{"text":", such that ","element":"span"},{"style":{"height":21.07},"width":644.8,"height":52.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-19.png","element":"img","alt":" σ(a(k)i ) − σk(a(k)i ) = (−1)iδkEd(σ|Yk)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":227.36,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-20.png","element":"img","alt":" 0 ≤ i ≤ d + 1","inline":true},{"text":";","element":"span"}],[{"text":"see ","element":"span"},{"href":"#id-13","referenceIndex":27,"text":"(Rivlin, ","element":"a"},{"href":"#id-13","referenceIndex":27,"text":"1981, ","element":"a"},{"text":"Thm. 1.7). 
Define","element":"span"}],[{"style":{"width":"54%"},"width":858,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-21.png","element":"img"}],[{"id":"id-52","text":"Theorem B.1. ","element":"span"},{"text":"If ","element":"span"},{"style":{"height":16.03},"width":390.88,"height":40.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-22.png","element":"img","alt":" limk→∞ Ed(σ|Yk) = ∞","inline":true},{"text":", then for any ","element":"span"},{"style":{"height":14.4},"width":95.84,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-23.png","element":"img","alt":" γ > 0","inline":true},{"text":", we have ","element":"span"},{"style":{"height":33.3},"width":295.52,"height":83.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-24.png","element":"img","alt":" lim infk→∞ ∆kλkkγ = 0","inline":true},{"text":".","element":"span"}],[{"text":"Proof. For every ","element":"span"},{"style":{"height":11.6},"width":101.48,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-25.png","element":"img","alt":" k ∈ N","inline":true},{"text":", define the functions ","element":"span"},{"style":{"height":9.1},"width":217.16,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-26.png","element":"img","alt":" ek := σ − σk","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":14.7},"width":543.56,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-27.png","element":"img","alt":" φk+1 := σk − σk+1 = ek+1 − ek","inline":true},{"text":".","element":"span"}],[{"text":"Note that ","element":"span"},{"style":{"height":16},"width":177.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-28.png","element":"img","alt":" ek ∈ C(R)","inline":true,"padRight":true},{"text":"and 
","element":"span"},{"style":{"height":16.7},"width":270.88,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-29.png","element":"img","alt":" φk+1 ∈ P≤d(R)","inline":true},{"text":". Since ","element":"span"},{"style":{"height":20.37},"width":342.88,"height":50.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-30.png","element":"img","alt":" y′k+1 ≤ a(k)i ≤ y′′k+1","inline":true,"padRight":true},{"text":"by assumption, it follows","element":"span"}],[{"text":"from the definition of ","element":"span"},{"style":{"height":10.7},"width":80.32,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-31.png","element":"img","alt":" σk+1","inline":true,"padRight":true},{"text":"that ","element":"span"},{"style":{"height":19.79},"width":704.8,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-32.png","element":"img","alt":" −Ed(σ|Yk+1) ≤ ek+1(a(k)i ) ≤ Ed(σ|Yk+1)","inline":true},{"text":". By the definition of","element":"span"}],[{"style":{"height":18.38},"width":62.88,"height":45.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-33.png","element":"img","alt":"a(k)i","inline":true,"padRight":true},{"text":", we have ","element":"span"},{"style":{"height":18.38},"width":473.44,"height":45.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-34.png","element":"img","alt":" ek(a(k)i ) = (−1)iδkEd(σ|Yk)","inline":true},{"text":". 
Consequently,","element":"span"}],[{"style":{"width":"82%"},"width":1314,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-35.png","element":"img"}],[{"text":"or equivalently, ","element":"span"},{"style":{"height":19.98},"width":1225.6,"height":49.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-36.png","element":"img","alt":" −Ed(σ|Yk) − Ed(σ|Yk+1) ≤ (−1)iδkφk+1(a(k)i ) ≤ Ed(σ|Yk+1) − Ed(σ|Yk)","inline":true},{"text":".","element":"span"}],[{"text":"Since ","element":"span"},{"style":{"height":14.7},"width":176.8,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-37.png","element":"img","alt":" Yk ⊆ Yk+1","inline":true,"padRight":true},{"text":"implies ","element":"span"},{"style":{"height":17.63},"width":392.8,"height":44.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-38.png","element":"img","alt":" Ed(σ|Yk) ≤ Ed(σ|Yk+1)","inline":true},{"text":", it follows that ","element":"span"},{"style":{"height":18.19},"width":310.52,"height":45.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-39.png","element":"img","alt":" a2i−1 ≤ a(k)i ≤ a2i","inline":true,"padRight":true},{"text":"(for each ","element":"span"},{"style":{"height":13.2},"width":61.72,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-40.png","element":"img","alt":" 0 ≤","inline":true}],[{"style":{"height":13.2},"width":154.4,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-41.png","element":"img","alt":"i ≤ d + 1","inline":true},{"text":"), where ","element":"span"},{"style":{"height":9.11},"width":88.96,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-42.png","element":"img","alt":" a2i−1","inline":true,"padRight":true},{"text":"and 
","element":"span"},{"style":{"height":9.11},"width":47.96,"height":22.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-43.png","element":"img","alt":" a2i","inline":true,"padRight":true},{"text":"are the roots of the equation ","element":"span"},{"style":{"height":17.63},"width":597.28,"height":44.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-44.png","element":"img","alt":" |φk+1(y)| = Ed(σ|Yk+1) − Ed(σ|Yk)","inline":true},{"text":".","element":"span"}],[{"text":"If ","element":"span"},{"style":{"height":17.63},"width":396.64,"height":44.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-45.png","element":"img","alt":" Ed(σ|Yk+1) = Ed(σ|Yk)","inline":true},{"text":", then ","element":"span"},{"style":{"height":10.71},"width":179.24,"height":26.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-46.png","element":"img","alt":" σk+1 = σk","inline":true,"padRight":true},{"text":"by definition, so we could set ","element":"span"},{"style":{"height":18.38},"width":226.08,"height":45.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-47.png","element":"img","alt":" a(k+1)i = a(k)i","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"text":"i","element":"span"},{"text":",","element":"span"}],[{"text":"i.e. there is nothing to prove in this case. 
Henceforth, assume ","element":"span"},{"style":{"height":17.63},"width":392.8,"height":44.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-48.png","element":"img","alt":" Ed(σ|Yk+1) ̸= Ed(σ|Yk)","inline":true},{"text":", and consider","element":"span"}],[{"text":"the polynomial function","element":"span"}],[{"style":{"width":"34%"},"width":544,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-49.png","element":"img"}],[{"text":"It then follows from ","element":"span"},{"href":"#id-54","referenceIndex":13,"text":"(Kadec, ","element":"a"},{"href":"#id-54","referenceIndex":13,"text":"1963, ","element":"a"},{"text":"Lem. 2) that","element":"span"}],[{"style":{"width":"81%"},"width":1299,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-50.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10.8},"width":19,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-51.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"is an arbitrary real number satisfying ","element":"span"},{"style":{"height":19.5},"width":166.72,"height":48.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/13-52.png","element":"img","alt":" 0 < θ < 12","inline":true},{"text":".","element":"span"}],[{"style":{"width":"15%"},"width":238,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-0.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":22.77},"width":339.04,"height":56.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-1.png","element":"img","alt":" limk→∞ Ed(σ|Yk) = ∞","inline":true,"padRight":true},{"text":"by assumption, the infinite product","element":"span"}],[{"style":{"width":"27%"},"width":439,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-2.png","element":"img"}],[{"text":"the 
series","element":"span"}],[{"style":{"width":"83%"},"width":1316,"height":239,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-3.png","element":"img"}],[{"text":"hence","element":"span"}],[{"style":{"width":"85%"},"width":1359,"height":67,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-4.png","element":"img"}],[{"text":"the convergent series","element":"span"}],[{"style":{"width":"39%"},"width":625,"height":148,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-5.png","element":"img"}],[{"text":"Therefore, the assertion follows by letting ","element":"span"},{"style":{"height":19.51},"width":138.44,"height":48.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-6.png","element":"img","alt":" γ = 1+τD","inline":true,"padRight":true},{"text":".","element":"span"}],[{"id":"id-55","style":{"width":"4%"},"width":79,"height":18,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-7.png","element":"img"}],[{"text":"Proof of Lemma ","element":"span"},{"href":"#id-39","text":"5.9. ","element":"a"},{"text":"Fix ","element":"span"},{"style":{"height":11.6},"width":104,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-8.png","element":"img","alt":" ε > 0","inline":true},{"text":". 
By Theorem ","element":"span"},{"href":"#id-52","referenceIndex":86,"text":"B.1, ","element":"a"},{"text":"we have ","element":"span"},{"text":"lim inf","element":"span"}],[{"text":"Thus, by the definition of ","element":"span"},{"text":"lim inf","element":"span"},{"text":", there exists a subsequence ","element":"span"},{"style":{"height":16.03},"width":128.48,"height":40.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-9.png","element":"img","alt":" {k′t}t∈N","inline":true,"padRight":true},{"text":"of ","element":"span"},{"text":"N ","element":"span"},{"text":"such that","element":"span"}],[{"style":{"width":"14%"},"width":222,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-10.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":11.6},"width":92.36,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-11.png","element":"img","alt":" t ∈ N","inline":true,"padRight":true},{"text":"(given any ","element":"span"},{"style":{"height":14.4},"width":95.84,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-12.png","element":"img","alt":" γ > 0","inline":true},{"text":"). 
Since ","element":"span"},{"style":{"height":13.1},"width":40.04,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-13.png","element":"img","alt":" λk","inline":true,"padRight":true},{"text":"is at least ","element":"span"},{"style":{"height":16},"width":102.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-14.png","element":"img","alt":" Ω(kγ)","inline":true,"padRight":true},{"text":"for some ","element":"span"},{"style":{"height":14.4},"width":95.84,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-15.png","element":"img","alt":" γ > 0","inline":true},{"text":", we can use this particular","element":"span"}],[{"style":{"width":"100%"},"width":1585,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-16.png","element":"img"}],[{"style":{"height":16},"width":159.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-17.png","element":"img","alt":"|∆kt| < ε","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":11.6},"width":91.88,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-18.png","element":"img","alt":" t ∈ N","inline":true},{"text":". 
Since ","element":"span"},{"style":{"height":13.2},"width":93.92,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-19.png","element":"img","alt":" d ≥ 2","inline":true,"padRight":true},{"text":"by assumption, it then follows that","element":"span"}],[{"style":{"width":"77%"},"width":1226,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-20.png","element":"img"}],[{"text":"Now ","element":"span"},{"style":{"height":10.64},"width":118.52,"height":26.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-21.png","element":"img","alt":" σ − σkt","inline":true,"padRight":true},{"text":"is continuous, so by the definition of ","element":"span"},{"style":{"height":21.07},"width":75.84,"height":52.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-22.png","element":"img","alt":" a(kt)i","inline":true,"padRight":true},{"text":", there is some ","element":"span"},{"style":{"height":20.88},"width":312,"height":52.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-23.png","element":"img","alt":" a(kt)1 < ykt < a(kt)2","inline":true,"padRight":true},{"text":"such that","element":"span"}],[{"style":{"height":16},"width":298.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-24.png","element":"img","alt":"σ(ykt) = σkt(ykt)","inline":true},{"text":". 
From ","element":"span"},{"href":"#id-55","text":"(10)","element":"a"},{"text":", we thus infer that","element":"span"},{"style":{"height":28.18},"width":550.84,"height":70.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-25.png","element":"img","alt":"min{|ykt−y′kt|,|ykt−y′′kt|}/λkt > 1/(d+1) − ε","inline":true,"padRight":true},{"text":"as desired.","element":"span"}],[{"text":"C ","element":"span"},{"text":"P","element":"span"},{"text":"ROOFS OF REMAINING LEMMAS","element":"span"}],[{"text":"C.1 ","element":"span"},{"text":"P","element":"span"},{"text":"ROOF OF ","element":"span"},{"text":"L","element":"span"},{"text":"EMMA ","element":"span"},{"href":"#id-40","text":"5.10","element":"a"}],[{"text":"Theorem ","element":"span"},{"href":"#id-47","text":"2.2 ","element":"a"},{"text":"gives ","element":"span"},{"style":{"height":21.42},"width":669.76,"height":53.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-26.png","element":"img","alt":" ∥σk−σ∥∞,Yk = Ed(σ|Yk) ≤ 6ωσ|Yk(λk/(2d))","inline":true},{"text":". Recall that any modulus of continuity","element":"span"}],[{"style":{"height":11.5},"width":41.96,"height":28.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-27.png","element":"img","alt":"ωf","inline":true,"padRight":true},{"text":"is subadditive (i.e. ","element":"span"},{"style":{"height":16.7},"width":471.04,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-28.png","element":"img","alt":" ωf(x + y) ≤ ωf(x) + ωf(y)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"text":"x, y","element":"span"},{"text":"); see ","element":"span"},{"href":"#id-13","referenceIndex":27,"text":"(Rivlin","element":"a"},{"text":", ","element":"span"},{"href":"#id-13","referenceIndex":27,"text":"1981, ","element":"a"},{"text":"Chap. 1). 
Thus","element":"span"}],[{"text":"for fixed ","element":"span"},{"text":"d","element":"span"},{"text":", we have ","element":"span"},{"style":{"height":21.62},"width":324.16,"height":54.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-29.png","element":"img","alt":" ωσ|Yk(λk/(2d)) ∈ O(λk)","inline":true},{"text":", which implies","element":"span"},{"style":{"height":20.82},"width":538.72,"height":52.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-30.png","element":"img","alt":"(k ↦ ∥σk − σ∥∞,Yk) ∈ o(λk^(1+γ))","inline":true},{"text":".","element":"span"}],[{"text":"C.2 ","element":"span"},{"text":"P","element":"span"},{"text":"ROOF OF ","element":"span"},{"text":"L","element":"span"},{"text":"EMMA ","element":"span"},{"href":"#id-43","text":"5.11","element":"a"}],[{"text":"Our proof of Lemma ","element":"span"},{"href":"#id-43","text":"5.11 ","element":"a"},{"text":"is a straightforward application of both the Cayley–Menger determinant","element":"span"}],[{"text":"formula and the Leibniz determinant formula. 
For each ","element":"span"},{"style":{"height":13.2},"width":175.68,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-31.png","element":"img","alt":" 0 ≤ i ≤ N","inline":true},{"text":", let ","element":"span"},{"style":{"height":16},"width":384.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-32.png","element":"img","alt":"�Si(λ) := S(λ)\\{pi(λ)}","inline":true},{"text":", and","element":"span"}],[{"text":"let ","element":"span"},{"style":{"height":16},"width":100.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-33.png","element":"img","alt":"�∆i(λ)","inline":true,"padRight":true},{"text":"be the convex hull of ","element":"span"},{"style":{"height":16},"width":91.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-34.png","element":"img","alt":"�Si(λ)","inline":true},{"text":". Let ","element":"span"},{"style":{"height":16},"width":146.08,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-35.png","element":"img","alt":" V(∆(λ))","inline":true,"padRight":true},{"text":"(resp. ","element":"span"},{"style":{"height":16},"width":159.52,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-36.png","element":"img","alt":" V(�∆i(λ))","inline":true},{"text":") denote the ","element":"span"},{"text":"N","element":"span"},{"text":"-dimensional","element":"span"}],[{"text":"(resp. ","element":"span"},{"style":{"height":16},"width":144.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-37.png","element":"img","alt":" (N − 1)","inline":true},{"text":"-dimensional) volume of ","element":"span"},{"style":{"height":16},"width":83.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-38.png","element":"img","alt":" ∆(λ)","inline":true,"padRight":true},{"text":"(resp. 
","element":"span"},{"style":{"height":16},"width":100.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-39.png","element":"img","alt":"�∆i(λ)","inline":true},{"text":"). Define the ","element":"span"},{"text":"(","element":"span"},{"text":"N ","element":"span"},{"text":"+ 2)","element":"span"},{"text":"-by-","element":"span"},{"text":"(","element":"span"},{"text":"N ","element":"span"},{"text":"+ 2)","element":"span"}],[{"text":"matrix ","element":"span"},{"style":{"height":16.7},"width":472,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-40.png","element":"img","alt":" M(λ) = [Mi,j(λ)]0≤i,j≤N+1","inline":true,"padRight":true},{"text":"as follows: ","element":"span"},{"style":{"height":18.06},"width":460.96,"height":45.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-41.png","element":"img","alt":" Mi,j(λ) = ∥pi(λ)− pj(λ)∥22","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":14},"width":223.64,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-42.png","element":"img","alt":" 0 ≤ i, j ≤ N;","inline":true}],[{"style":{"height":16.71},"width":494.72,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-43.png","element":"img","alt":"MN+1,i(λ) = Mi,N+1(λ) = 1","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":175.68,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-44.png","element":"img","alt":" 0 ≤ i ≤ N","inline":true},{"text":"; and ","element":"span"},{"style":{"height":16.71},"width":314.72,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-45.png","element":"img","alt":" MN+1,N+1(λ) = 0","inline":true},{"text":".","element":"span"}],[{"text":"The Cayley–Menger determinant formula gives 
","element":"span"},{"style":{"height":26.64},"width":570.4,"height":66.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-46.png","element":"img","alt":" [V(∆(λ))]2 = (−1)N+1(N!)22N det(M(λ))","inline":true},{"text":". Analogously,","element":"span"}],[{"text":"if we let ","element":"span"},{"style":{"height":16},"width":108.64,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-47.png","element":"img","alt":" M ′(λ)","inline":true,"padRight":true},{"text":"be the square submatrix of ","element":"span"},{"style":{"height":16},"width":97.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/14-48.png","element":"img","alt":" M(λ)","inline":true,"padRight":true},{"text":"obtained by deleting the first row and column from","element":"span"}],[{"style":{"height":16},"width":81.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-0.png","element":"img","alt":"M(λ","inline":true},{"text":")","element":"span"},{"text":", then ","element":"span"},{"text":"[","element":"span"},{"style":{"height":16},"width":132.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-1.png","element":"img","alt":"V(�∆0(λ","inline":true},{"text":"))]","element":"span"},{"style":{"height":7.6},"width":16,"height":19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-2.png","element":"img","alt":"2","inline":true,"padRight":true},{"text":"= ","element":"span"},{"style":{"height":26.45},"width":218.48,"height":66.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-3.png","element":"img","alt":"(−1)N((N−1)!)22N−1","inline":true,"padRight":true},{"text":"det(","element":"span"},{"style":{"height":16},"width":92.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-4.png","element":"img","alt":"M ′(λ","inline":true},{"text":"))","element":"span"},{"text":". 
Now, ","element":"span"},{"text":"V","element":"span"},{"text":"(∆(","element":"span"},{"style":{"height":10.8},"width":23,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-5.png","element":"img","alt":"λ","inline":true},{"text":")) = ","element":"span"},{"style":{"height":19.31},"width":275.96,"height":48.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-6.png","element":"img","alt":" 1N V(�∆0(λ))h0(λ","inline":true},{"text":")","element":"span"},{"text":", so","element":"span"}],[{"style":{"height":37.7},"width":457.88,"height":94.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-7.png","element":"img","alt":"[h0(λ)]2 = −12Ndet(M(λ))det M ′(λ) .","inline":true,"padRight":true},{"text":"(11) Without loss of generality, assume that ","element":"span"},{"style":{"height":12.8},"width":224.59,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-8.png","element":"img","alt":" r0 ≥ r1 ≥ . . .","inline":true,"padRight":true},{"text":". Also, for any integer ","element":"span"},{"style":{"height":13.2},"width":95.36,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-9.png","element":"img","alt":" k ≥ 0","inline":true},{"text":", let ","element":"span"},{"style":{"height":13.51},"width":50.12,"height":33.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-10.png","element":"img","alt":" Sk","inline":true,"padRight":true},{"text":"be the set of all permutations on ","element":"span"},{"text":"{","element":"span"},{"text":"0","element":"span"},{"text":", . . . 
, k","element":"span"},{"text":"}","element":"span"},{"text":", and let ","element":"span"},{"style":{"height":15.81},"width":50.12,"height":39.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-11.png","element":"img","alt":" S′k","inline":true,"padRight":true},{"text":"be the subset of ","element":"span"},{"style":{"height":13.51},"width":50.12,"height":33.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-12.png","element":"img","alt":" Sk","inline":true,"padRight":true},{"text":"consisting of all permutations that ","element":"span"},{"text":"are not derangements. (Recall that ","element":"span"},{"style":{"height":13.5},"width":125,"height":33.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-13.png","element":"img","alt":" τ ∈ Sk","inline":true,"padRight":true},{"text":"is called a derangement if ","element":"span"},{"style":{"height":16},"width":138.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-14.png","element":"img","alt":" τ(i) ̸= i","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":169.8,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-15.png","element":"img","alt":" 0 ≤ i ≤ k","inline":true},{"text":".) 
The diagonal entries of ","element":"span"},{"style":{"height":16},"width":97.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-16.png","element":"img","alt":" M(λ)","inline":true,"padRight":true},{"text":"are all zeros, so by the Leibniz determinant formula, we get","element":"span"}],[{"style":{"width":"46%"},"width":736,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-17.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16},"width":110.56,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-18.png","element":"img","alt":" sgn(τ)","inline":true,"padRight":true},{"text":"denotes the sign of the permutation ","element":"span"},{"style":{"height":6.8},"width":21,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-19.png","element":"img","alt":" τ","inline":true},{"text":". Note that ","element":"span"},{"style":{"height":18.86},"width":452.32,"height":47.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-20.png","element":"img","alt":" Mi,j(λ) ∈ Θ(λ2 max{ri,rj})","inline":true,"padRight":true},{"text":"for all","element":"span"}],[{"style":{"height":14},"width":217.44,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-21.png","element":"img","alt":"0 ≤ i, j ≤ N","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":15.2},"width":86.12,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-22.png","element":"img","alt":" i ̸= j","inline":true},{"text":". 
(Here, ","element":"span"},{"style":{"height":10.8},"width":31,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-23.png","element":"img","alt":" Θ","inline":true,"padRight":true},{"text":"refers to ","element":"span"},{"style":{"height":10.8},"width":31,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-24.png","element":"img","alt":" Θ","inline":true},{"text":"-complexity.) Consequently, using the fact that","element":"span"}],[{"style":{"height":16.7},"width":440.48,"height":41.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-25.png","element":"img","alt":"Mi,N+1(λ) = MN+1,i = 1","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.2},"width":175.68,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-26.png","element":"img","alt":" 0 ≤ i ≤ N","inline":true},{"text":", we get that ","element":"span"},{"style":{"height":17.36},"width":385.6,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-27.png","element":"img","alt":" det(M(λ)) ∈ Θ(λ2RN )","inline":true},{"text":", where","element":"span"}],[{"style":{"width":"100%"},"width":1585,"height":428,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-28.png","element":"img"}],[{"text":"C.3 ","element":"span"},{"text":"P","element":"span"},{"text":"ROOF OF ","element":"span"},{"text":"L","element":"span"},{"text":"EMMA ","element":"span"},{"href":"#id-38","text":"5.12","element":"a"}],[{"text":"Consider any open neighborhood ","element":"span"},{"text":"U ","element":"span"},{"text":"of ","element":"span"},{"style":{"height":12.7},"width":55.04,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-29.png","element":"img","alt":" 0M","inline":true},{"text":". 
Since ","element":"span"},{"style":{"height":10},"width":26,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-30.png","element":"img","alt":" ϕ","inline":true,"padRight":true},{"text":"is open and ","element":"span"},{"style":{"height":16},"width":228.6,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-31.png","element":"img","alt":" ϕ(0M) = 0N","inline":true},{"text":", the image ","element":"span"},{"style":{"height":16},"width":88.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-32.png","element":"img","alt":" ϕ(U)","inline":true}],[{"text":"must contain an open neighborhood of ","element":"span"},{"style":{"height":12.7},"width":50.04,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-33.png","element":"img","alt":" 0N","inline":true},{"text":". Thus for any ","element":"span"},{"style":{"height":11.6},"width":91.52,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-34.png","element":"img","alt":" ε > 0","inline":true},{"text":", we can always choose ","element":"span"},{"text":"N ","element":"span"},{"text":"+ 1 ","element":"span"},{"text":"points","element":"span"}],[{"style":{"height":10},"width":190.2,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-35.png","element":"img","alt":"w0, . . . , wN","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":17.36},"width":185.6,"height":43.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-36.png","element":"img","alt":" BMε \\{0M}","inline":true},{"text":", such that the convex hull of ","element":"span"},{"style":{"height":16},"width":347.84,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-37.png","element":"img","alt":" {ϕ(w0), . . . 
, ϕ(wN)}","inline":true,"padRight":true},{"text":"contains the point","element":"span"}],[{"style":{"height":12.71},"width":50.04,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-38.png","element":"img","alt":"0N","inline":true},{"text":". Since ","element":"span"},{"style":{"height":16},"width":261.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-39.png","element":"img","alt":" ϕ(λx) ≥ λϕ(x)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":15.76},"width":257.6,"height":39.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-40.png","element":"img","alt":" x ∈ RM, λ > 0","inline":true},{"text":", and since ","element":"span"},{"style":{"height":10},"width":26,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-41.png","element":"img","alt":" ϕ","inline":true,"padRight":true},{"text":"is continuous, it then follows from","element":"span"}],[{"text":"definition that for every ","element":"span"},{"style":{"height":11.6},"width":100.04,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-42.png","element":"img","alt":" k ∈ N","inline":true},{"text":", we can choose ","element":"span"},{"text":"N","element":"span"},{"text":"+1 ","element":"span"},{"text":"points ","element":"span"},{"style":{"height":18.57},"width":220.8,"height":46.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-43.png","element":"img","alt":" u(k)0 , . . . 
, u(k)N","inline":true,"padRight":true},{"text":"in ","element":"span"},{"style":{"height":13.1},"width":44.36,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-44.png","element":"img","alt":" Uk","inline":true},{"text":", such that the convex","element":"span"}],[{"text":"hull of ","element":"span"},{"style":{"height":18.77},"width":489.44,"height":46.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-45.png","element":"img","alt":" U ′k := {ϕ(u(k)0 ), . . . , ϕ(u(k)N )}","inline":true,"padRight":true},{"text":"contains ","element":"span"},{"style":{"height":12.7},"width":50.04,"height":31.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-46.png","element":"img","alt":" 0N","inline":true},{"text":". Define ","element":"span"},{"style":{"height":19.69},"width":574.4,"height":49.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-47.png","element":"img","alt":" rk := sup{r > 0 : BNr ⊆ ϕ(Bmλk)}","inline":true,"padRight":true},{"text":"for","element":"span"}],[{"text":"each ","element":"span"},{"style":{"height":11.6},"width":99.56,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-48.png","element":"img","alt":" k ∈ N","inline":true},{"text":", and note also that ","element":"span"},{"style":{"height":13.5},"width":275.68,"height":33.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-49.png","element":"img","alt":" limk→∞ rk = ∞","inline":true},{"text":". 
Thus, given a ball ","element":"span"},{"style":{"height":17.39},"width":59.16,"height":43.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-50.png","element":"img","alt":" BNr","inline":true,"padRight":true},{"text":"of any desired radius, there","element":"span"}],[{"text":"is some (sufficiently large) ","element":"span"},{"text":"k ","element":"span"},{"text":"such that the convex hull of ","element":"span"},{"style":{"height":15.41},"width":45.68,"height":38.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-51.png","element":"img","alt":" U ′k","inline":true,"padRight":true},{"text":"contains ","element":"span"},{"style":{"height":17.39},"width":59.16,"height":43.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-52.png","element":"img","alt":" BNr","inline":true,"padRight":true},{"text":".","element":"span"}],[{"text":"Now, since ","element":"span"},{"style":{"height":20.79},"width":334.28,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-53.png","element":"img","alt":" θλk < ∥u(k)j ∥2 ≤ λk","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":20.79},"width":349.12,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-54.png","element":"img","alt":" ϕ(λu(k)j ) ≥ λϕ(u(k)j )","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":14},"width":297.44,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-55.png","element":"img","alt":" 0 ≤ j ≤ N, λ > 0","inline":true},{"text":", we infer that","element":"span"}],[{"text":"none of the points ","element":"span"},{"style":{"height":18.58},"width":337.6,"height":46.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-56.png","element":"img","alt":" ϕ(u(k)0 ), . . . 
, ϕ(u(k)N )","inline":true,"padRight":true},{"text":"are contained in the ball ","element":"span"},{"style":{"height":19.7},"width":76.44,"height":49.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-57.png","element":"img","alt":" BNθrk","inline":true},{"text":". Consequently, as ","element":"span"},{"style":{"height":11.2},"width":129.76,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-58.png","element":"img","alt":" k → ∞","inline":true},{"text":",","element":"span"}],[{"text":"we have ","element":"span"},{"style":{"height":13.1},"width":159.04,"height":32.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-59.png","element":"img","alt":" θrk → ∞","inline":true},{"text":", and therefore the barycentric coordinate vector ","element":"span"},{"style":{"height":16},"width":202.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-60.png","element":"img","alt":" (b0, . . . , bN)","inline":true,"padRight":true},{"text":"(w.r.t. ","element":"span"},{"style":{"height":15.41},"width":45.68,"height":38.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-61.png","element":"img","alt":" U ′k","inline":true},{"text":") of every","element":"span"}],[{"text":"point in the fixed ball ","element":"span"},{"style":{"height":17.39},"width":59.16,"height":43.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-62.png","element":"img","alt":" BNτ","inline":true,"padRight":true},{"text":"would converge to ","element":"span"},{"style":{"height":19.5},"width":195.52,"height":48.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-63.png","element":"img","alt":" ( 1N , . . . , 1N )","inline":true,"padRight":true},{"text":"(which is the barycentric coordinate vector","element":"span"}],[{"text":"of the barycenter w.r.t. 
","element":"span"},{"style":{"height":15.41},"width":45.68,"height":38.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-64.png","element":"img","alt":" U ′k","inline":true},{"text":"); this proves our assertion.","element":"span"}],[{"text":"D ","element":"span"},{"text":"C","element":"span"},{"text":"ONJECTURED OPTIMALITY OF UPPER BOUND ","element":"span"},{"style":{"height":19.6},"width":201.24,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-65.png","element":"img","alt":" O(ε−n) IN","inline":true,"padRight":true},{"text":"T","element":"span"},{"text":"HEOREM ","element":"span"},{"href":"#id-14","text":"3.2","element":"a"}],[{"text":"It was conjectured by ","element":"span"},{"href":"#id-8","referenceIndex":21,"text":"Mhaskar ","element":"a"},{"href":"#id-8","referenceIndex":21,"text":"(1996) ","element":"a"},{"text":"that there exists some smooth non-polynomial function ","element":"span"},{"style":{"height":6.8},"width":23,"height":17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-66.png","element":"img","alt":" σ","inline":true},{"text":",","element":"span"}],[{"text":"such that at least ","element":"span"},{"style":{"height":16},"width":125.44,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-67.png","element":"img","alt":" Ω(ε−n)","inline":true,"padRight":true},{"text":"hidden units are required to uniformly approximate every function in the","element":"span"}],[{"text":"class ","element":"span"},{"text":"S ","element":"span"},{"text":"of ","element":"span"},{"style":{"height":13.36},"width":47.2,"height":33.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-68.png","element":"img","alt":" C1","inline":true,"padRight":true},{"text":"functions with bounded Sobolev norm. 
As evidence that this conjecture is true, a","element":"span"}],[{"text":"heuristic argument ","element":"span"},{"text":"was provided in ","element":"span"},{"href":"#id-8","referenceIndex":21,"text":"(Mhaskar, ","element":"a"},{"href":"#id-8","referenceIndex":21,"text":"1996)","element":"a"},{"text":", which uses a result by ","element":"span"},{"href":"#id-56","referenceIndex":5,"text":"DeVore et al. ","element":"a"},{"href":"#id-56","referenceIndex":5,"text":"(1989)","element":"a"},{"text":";","element":"span"}],[{"text":"cf. ","element":"span"},{"href":"#id-9","referenceIndex":24,"text":"(Pinkus, ","element":"a"},{"href":"#id-9","referenceIndex":24,"text":"1999, ","element":"a"},{"text":"Thm. 6.5). To the best of our knowledge, this conjecture remains open. If","element":"span"}],[{"text":"this conjecture is indeed true, then our upper bound ","element":"span"},{"style":{"height":16},"width":129.76,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-69.png","element":"img","alt":" O(ε−n)","inline":true,"padRight":true},{"text":"in Theorem ","element":"span"},{"href":"#id-14","text":"3.2 ","element":"a"},{"text":"is optimal for general","element":"span"}],[{"text":"continuous non-polynomial activation functions.","element":"span"}],[{"text":"For specific activation functions, such as the logistic sigmoid function, or any polynomial spline","element":"span"}],[{"text":"function of fixed degree with finitely many knots (e.g. 
the ReLU function), it is known that the","element":"span"}],[{"text":"minimum number ","element":"span"},{"text":"N ","element":"span"},{"text":"of hidden units required to uniformly approximate ","element":"span"},{"text":"every function in ","element":"span"},{"text":"S ","element":"span"},{"text":"must","element":"span"}],[{"text":"satisfy ","element":"span"},{"style":{"height":16},"width":342.4,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.06505/images/15-70.png","element":"img","alt":" (N log N) ∈ Ω(ε−n)","inline":true,"padRight":true},{"href":"#id-57","referenceIndex":19,"text":"(Maiorov & Meir, ","element":"a"},{"href":"#id-57","referenceIndex":19,"text":"2000)","element":"a"},{"text":"; cf. ","element":"span"},{"href":"#id-9","referenceIndex":24,"text":"(Pinkus, ","element":"a"},{"href":"#id-9","referenceIndex":24,"text":"1999, ","element":"a"},{"text":"Thm. 6.7). Hence there is","element":"span"}],[{"text":"still a gap between the lower and upper bounds for ","element":"span"},{"text":"N ","element":"span"},{"text":"in these specific cases. It would be interesting","element":"span"}],[{"text":"to find optimal bounds for these cases.","element":"span"}]]}],"_version":"3.3.2"},"paperNode":"$1b:props:children:props:children:0:props:product"}]]]}]}]