35:[["$","audio",null,{"id":"tts"}],["$","$L3a",null,{"paperID":"1611.01146","publisher":"arxiv","paperJSON":{"title":"Finding Approximate Local Minima Faster than Gradient Descent","paperID":"1611.01146","avgLineHeight":13.56,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"We design a non-convex second-order optimization algorithm that is guaranteed to return an ","element":"span"},{"style":{"fontStyle":"italic"},"text":"approximate ","element":"span"},{"text":"local minimum in time which scales linearly in the underlying dimension and the number of training examples. The time complexity of our algorithm to find an approximate local minimum is even faster than that of gradient descent to find a critical point. Our algorithm applies to a general class of optimization problems including training a neural network and other non-convex objectives arising in machine learning.","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"Finding a global minimizer of a non-convex optimization problem is NP-hard. Thus, the standard goal of efficient non-convex optimization algorithms is instead to find a local minimum. ","element":"span"},{"text":"This problem has become increasingly important as the state-of-the-art in machine learning is attained by non-convex models, many of which are variants of deep neural networks. Experiments in ","element":"span"},{"href":"#id-0","referenceIndex":10,"text":"[10, ","element":"a"},{"href":"#id-1","referenceIndex":11,"text":"11, ","element":"a"},{"href":"#id-2","referenceIndex":21,"text":"21] ","element":"a"},{"text":"suggest that fast convergence to a local minimum is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"sufficient ","element":"span"},{"text":"for training neural nets, while convergence to critical points (points with vanishing gradients) is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"not","element":"span"},{"text":". 
Theoretical works have also affirmed the same phenomenon for other machine learning problems (see ","element":"span"},{"href":"#id-3","referenceIndex":5,"text":"[5, ","element":"a"},{"href":"#id-4","referenceIndex":6,"text":"6, ","element":"a"},{"href":"#id-5","referenceIndex":18,"text":"18, ","element":"a"},{"href":"#id-6","referenceIndex":19,"text":"19] ","element":"a"},{"text":"and the references therein).","element":"span"}],[{"text":"In this paper we give a provable linear-time algorithm for finding an ","element":"span"},{"style":{"fontStyle":"italic"},"text":"approximate ","element":"span"},{"text":"local minimum in smooth non-convex optimization. It applies to a general setting of machine learning optimization, and in particular to the optimization problem of training deep neural networks. Furthermore, the running time bound of our algorithm is the fastest known even for the more lenient task of computing a point with vanishing gradient (called a critical point), for a wide range of parameters.","element":"span"}],[{"text":"Formally, the problem of unconstrained mathematical optimization is stated in general terms as that of finding the minimum value that a function attains over Euclidean space, i.e.","element":"span"}],[{"style":{"width":"55%"},"width":1033,"height":69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-0.png","element":"img"}],[{"text":"If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is convex, the above formulation is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"convex optimization ","element":"span"},{"text":"and is solvable in (randomized) polynomial time even if only a valuation oracle to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is provided. A crucial property of convex functions is that “local optimality implies global optimality”, allowing for greedy algorithms to reach the global optimum efficiently. 
Unfortunately, this is no longer the case if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is non-convex; indeed, even a degree-four polynomial can be NP-hard to optimize ","element":"span"},{"href":"#id-7","referenceIndex":23,"text":"[23]","element":"a"},{"text":", or even just to check whether a point is not a local minimum ","element":"span"},{"href":"#id-8","referenceIndex":25,"text":"[25]","element":"a"},{"text":". Thus, for non-convex optimization one has to settle for the more modest goal of reaching approximate local optimality efficiently.","element":"span"}],[{"text":"Of particular interest to machine learning is the optimization of functions ","element":"span"},{"style":{"height":19.14},"width":213.33,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-1.png","element":"img","alt":" f : R^d → R","inline":true,"padRight":true},{"text":"of the finite-sum form","element":"span"}],[{"style":{"width":"59%"},"width":1123,"height":119,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-2.png","element":"img"}],[{"text":"Such functions arise when minimizing loss over a training set, where each example ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"in the set corresponds to one loss function ","element":"span"},{"style":{"height":16.4},"width":33.38,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-3.png","element":"img","alt":" fi","inline":true,"padRight":true},{"text":"in the summation.","element":"span"}],[{"text":"We say that the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is second-order smooth if it has a Lipschitz continuous gradient and a Lipschitz continuous Hessian. 
","element":"span"},{"text":"We say that a point ","element":"span"},{"style":{"height":12},"width":182.21,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-4.png","element":"img","alt":" x is an ε","inline":true},{"text":"-approximate local minimum if it satisfies (following the tradition of ","element":"span"},{"href":"#id-9","referenceIndex":28,"text":"[28]","element":"a"},{"text":"):","element":"span"}],[{"style":{"width":"39%"},"width":735,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-5.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":78.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-6.png","element":"img","alt":" ∥ · ∥","inline":true,"padRight":true},{"text":"denotes the Euclidean norm of a vector. We say that a point ","element":"span"},{"style":{"height":12},"width":172.63,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-7.png","element":"img","alt":" x is an ε","inline":true},{"text":"-critical point if it satisfies the gradient condition above, but not necessarily the second-order condition. Critical points include saddle points in addition to local minima. We remark that ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-8.png","element":"img","alt":" ε","inline":true},{"text":"-approximate local minima (even with ","element":"span"},{"style":{"height":17.6},"width":1503.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-9.png","element":"img","alt":" ε = 0) are not necessarily close to any local minimum, neither in domain nor in","inline":true,"padRight":true},{"text":"function value. 
However, if we assume in addition the function satisfies the (robust) strict-saddle property ","element":"span"},{"href":"#id-10","referenceIndex":15,"text":"[15, ","element":"a"},{"href":"#id-11","referenceIndex":24,"text":"24] ","element":"a"},{"text":"(see Section ","element":"span"},{"text":"2 ","element":"span"},{"text":"for the precise definition), then an ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-10.png","element":"img","alt":" ε","inline":true},{"text":"-approximate local minimum is guaranteed to be close to a local minimum for sufficiently small ","element":"span"},{"style":{"height":8.4},"width":32.36,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-11.png","element":"img","alt":" ε.","inline":true}],[{"text":"Our main theorem below states the time required for the proposed algorithm ","element":"span"},{"href":"#id-12","text":"FastCubic ","element":"a"},{"text":"to find an ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/1-12.png","element":"img","alt":" ε","inline":true},{"text":"-approximate local minimum for second-order smooth functions.","element":"span"}],[{"style":{"width":"100%"},"width":1890,"height":401,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-0.png","element":"img"}],[{"text":"The full statement of ","element":"span"},{"href":"#id-13","text":"Theorem 1 ","element":"a"},{"text":"can be found in ","element":"span"},{"text":"Section 2.","element":"span"}],[{"text":"Hessian-vector products can be computed in linear time —meaning ","element":"span"},{"style":{"height":18.44},"width":457.69,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-1.png","element":"img","alt":" Th,1 = O(d) and Th 
=","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"nd","element":"span"},{"text":")— for many machine learning problems such as generalized linear models and training neural networks ","element":"span"},{"href":"#id-14","referenceIndex":1,"text":"[1, ","element":"a"},{"href":"#id-15","referenceIndex":29,"text":"29]","element":"a"},{"text":". We explain this more generally in ","element":"span"},{"href":"#id-16","text":"Appendix A. ","element":"a"},{"text":"Therefore,","element":"span"}],[{"style":{"width":"100%"},"width":1890,"height":277,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-2.png","element":"img"}],[{"text":"Another important aspect of our algorithm is that even in terms of just reaching an ","element":"span"},{"style":{"height":12.8},"width":165.86,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-3.png","element":"img","alt":" ε-critical","inline":true,"padRight":true},{"text":"point, i.e. a point that satisfies ","element":"span"},{"style":{"height":17.6},"width":257.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-4.png","element":"img","alt":" ∥∇f(x)∥ ≤ ε","inline":true,"padRight":true},{"text":"without any second-order guarantee, ","element":"span"},{"href":"#id-12","text":"FastCubic ","element":"a"},{"text":"is faster than all previous results (see ","element":"span"},{"href":"#id-17","text":"Table 1 ","element":"a"},{"text":"for a comparison).","element":"span"}],[{"text":"The fastest methods to find critical points for a smooth non-convex function are gradient descent and its extensions, jointly known as first-order methods. 
These methods are extremely efficient in terms of per-iteration complexity; however, they necessarily suffer from a 1","element":"span"},{"style":{"height":19.14},"width":59.18,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-5.png","element":"img","alt":"/ε2 ","inline":true,"padRight":true},{"text":"convergence rate ","element":"span"},{"href":"#id-18","referenceIndex":27,"text":"[27]","element":"a"},{"text":". To the best of our knowledge, among previous results only higher-order methods seem capable of breaking this 1","element":"span"},{"style":{"height":19.14},"width":59.19,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-6.png","element":"img","alt":"/ε2 ","inline":true,"padRight":true},{"text":"bottleneck ","element":"span"},{"href":"#id-9","referenceIndex":28,"text":"[28]","element":"a"},{"text":". For certain ranges of parameters, our ","element":"span"},{"href":"#id-12","text":"FastCubic ","element":"a"},{"text":"finds local minima even faster than first-order methods, even though those methods find only critical points. This is depicted in ","element":"span"},{"href":"#id-17","text":"Table 1.","element":"a"}],[{"id":"id-17","style":{"width":"75%"},"width":1848,"height":685,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-7.png","element":"img"}],[{"text":"Table 1: Comparison of known methods.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"56%"},"width":1377,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/2-8.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"1.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Related Work","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Methods that Provably Reach Critical Points. ","element":"span"},{"text":"Recall that only a gradient oracle is needed to reach a critical point. 
The most commonly used algorithm in practice for training non-convex learning machines such as deep neural networks is stochastic gradient descent (SGD), also known as stochastic approximation ","element":"span"},{"href":"#id-19","referenceIndex":30,"text":"[30] ","element":"a"},{"text":"and its derivatives. Some enhancements widely used in practice are based on Nesterov’s acceleration ","element":"span"},{"href":"#id-20","referenceIndex":26,"text":"[26] ","element":"a"},{"text":"and adaptive regularization ","element":"span"},{"href":"#id-21","referenceIndex":12,"text":"[12]","element":"a"},{"text":". The variance reduction technique, introduced in ","element":"span"},{"href":"#id-22","referenceIndex":32,"text":"[32]","element":"a"},{"text":", was extremely successful in convex optimization, but a non-convex counterpart with theoretical benefits was introduced only recently ","element":"span"},{"href":"#id-23","referenceIndex":2,"text":"[2]","element":"a"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Methods that Provably Reach Local Minima. ","element":"span"},{"text":"The recent work of Ge ","element":"span"},{"style":{"fontStyle":"italic"},"text":"et al. ","element":"span"},{"href":"#id-24","referenceIndex":17,"text":"[17] ","element":"a"},{"text":"showed that a noise-injected version of SGD in fact converges to local minima instead of critical points, as long as the underlying non-convex function is strict-saddle. Their theoretical running time is a large polynomial in the dimension and is not competitive with our method (see ","element":"span"},{"href":"#id-17","text":"Table 1)","element":"a"},{"text":".","element":"span"}],[{"text":"The work of Lee et al. ","element":"span"},{"href":"#id-11","referenceIndex":24,"text":"[24] ","element":"a"},{"text":"shows that gradient descent, starting from a random point, almost surely converges to a local minimum of a strict-saddle function. 
The rates of convergence and precise step-sizes that are required are, however, yet unknown.","element":"span"}],[{"text":"If second-order information (i.e., the Hessian oracle) is provided, the cubic-regularization method of Nesterov and Polyak ","element":"span"},{"href":"#id-9","referenceIndex":28,"text":"[28] ","element":"a"},{"text":"converges in ","element":"span"},{"style":{"height":22.9},"width":116.82,"height":57.25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-0.png","element":"img","alt":" O( 1ε3/2","inline":true,"padRight":true},{"text":") iterations. However, each iteration of Nesterov- ","element":"span"},{"text":"Polyak requires solving a cubic function which, in general, takes time super-linear in the input representation.","element":"span"}],[{"text":"One natural direction is to apply an approximate trust region solver, such as the linear-time solver of ","element":"span"},{"href":"#id-25","referenceIndex":22,"text":"[22]","element":"a"},{"text":", to approximately solve the cubic regularization subroutine of Nesterov-Polyak. However, the approximation needed by a naive calculation makes this approach even slower than vanilla gradient descent. Our main challenge is to obtain approximate second-order local-minima and simultaneously improve upon gradient descent.","element":"span"}],[{"text":"Independently of this paper and concurrently","element":"span"},{"style":{"height":17.94},"width":298.03,"height":44.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-1.png","element":"img","alt":"1, Carmon et al.","inline":true,"padRight":true},{"href":"#id-26","referenceIndex":7,"text":"[7] ","element":"a"},{"text":"develop an accelerated gradient descent method that achieves the same running time for finding an approximate local minimum as in our paper. 
Remarkably, the same running time is obtained via a very different technique.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Our Techniques","element":"span"}],[{"text":"Our algorithm is based on the cubic regularization method of Nesterov and Polyak ","element":"span"},{"href":"#id-27","referenceIndex":8,"text":"[8, ","element":"a"},{"href":"#id-28","referenceIndex":9,"text":"9, ","element":"a"},{"href":"#id-9","referenceIndex":28,"text":"28]","element":"a"},{"text":". At a high level, cubic regularization states that if we can minimize a cubic function ","element":"span"},{"style":{"height":17.6},"width":286.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-2.png","element":"img","alt":" m(h) ≜ g⊤h +","inline":true}],[{"style":{"height":21.7},"width":308.96,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-3.png","element":"img","alt":"2h⊤Hh + L6 ∥h∥3","inline":true,"padRight":true},{"text":"exactly, where ","element":"span"},{"style":{"height":19.14},"width":606.38,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-4.png","element":"img","alt":" g = ∇f(x), H = ∇2f(x), and L","inline":true,"padRight":true},{"text":"is the second-order smoothness of ","element":"span"},{"text":"the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":", then we can iteratively perform updates ","element":"span"},{"style":{"height":14},"width":209.75,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-5.png","element":"img","alt":" x′ ← x + h","inline":true},{"text":", and this algorithm converges to an ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-6.png","element":"img","alt":" ε","inline":true},{"text":"-approximate local minimum in 
","element":"span"},{"style":{"height":20.34},"width":166.4,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-7.png","element":"img","alt":" O(1/ε3/2","inline":true},{"text":") iterations. ","element":"span"},{"text":"Unfortunately, solving this cubic minimization problem exactly, to the best of our knowledge, requires a running time of ","element":"span"},{"style":{"height":17.6},"width":115.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-8.png","element":"img","alt":" O(dω)","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":8.4},"width":27,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-9.png","element":"img","alt":" ω","inline":true,"padRight":true},{"text":"is the matrix multiplication constant. Getting around this requires five observations.","element":"span"}],[{"text":"The ","element":"span"},{"style":{"fontStyle":"italic"},"text":"first observation ","element":"span"},{"text":"is that, minimizing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":") up to a constant multiplicative approximation (plus a few other constraints) is sufficient for showing an iteration complexity of ","element":"span"},{"style":{"height":20.34},"width":314.16,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-10.png","element":"img","alt":" O(1/ε3/2).2 The","inline":true,"padRight":true},{"text":"proof techniques to show this observation are based on extending Nesterov and Polyak.","element":"span"}],[{"text":"The ","element":"span"},{"style":{"fontStyle":"italic"},"text":"second observation ","element":"span"},{"text":"is that the minimizer 
","element":"span"},{"style":{"height":17.6},"width":181.31,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-11.png","element":"img","alt":" h∗ of m(h","inline":true},{"text":") must be of the form ","element":"span"},{"style":{"height":18.74},"width":364.68,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-12.png","element":"img","alt":" h∗ = (H+λ∗I)+g+","inline":true},{"style":{"height":15.6},"width":273.94,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-13.png","element":"img","alt":"v, where λ∗ ≥","inline":true,"padRight":true},{"text":"0 is some constant satisfying ","element":"span"},{"style":{"height":15.6},"width":382.4,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-14.png","element":"img","alt":" H + λ∗I ⪰ 0, and v","inline":true,"padRight":true},{"text":"is the smallest eigenvector of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":9.2},"width":26,"height":23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/3-15.png","element":"img","alt":"+ ","inline":true,"padRight":true},{"text":"denotes the pseudo-inverse of a matrix. 
This can be viewed as moving in a mixture direction between choosing ","element":"span"},{"style":{"height":12.8},"width":115.06,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-0.png","element":"img","alt":" h ← v","inline":true},{"text":", and choosing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"to follow a shifted Newton’s direction ","element":"span"},{"style":{"height":18.74},"width":347.25,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-1.png","element":"img","alt":" h ← (H + λ∗I)+g.","inline":true,"padRight":true},{"text":"Intuitively, we wish to reduce both the computation of (","element":"span"},{"style":{"height":18.74},"width":315.23,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-2.png","element":"img","alt":"H+λ∗I)+g and v","inline":true,"padRight":true},{"text":"to Hessian-vector products.","element":"span"}],[{"text":"The first task of computing (","element":"span"},{"style":{"height":18.74},"width":223.4,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-3.png","element":"img","alt":"H + λ∗I)+g","inline":true,"padRight":true},{"text":"can be slow, and even if ","element":"span"},{"style":{"height":14},"width":157.03,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-4.png","element":"img","alt":" H + λ∗I","inline":true,"padRight":true},{"text":"is strictly positive-definite, computing it has a complexity depending on the (possibly huge) condition number of ","element":"span"},{"href":"#id-29","referenceIndex":34,"style":{"height":17.6},"width":665.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-5.png","element":"img","alt":"H+λ∗I [34]. 
The third observation","inline":true,"padRight":true},{"text":"is that it suffices to pick some ","element":"span"},{"style":{"height":13.2},"width":137.33,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-6.png","element":"img","alt":" λ′ > λ∗ ","inline":true,"padRight":true},{"text":"so both (1) the condition number of ","element":"span"},{"style":{"height":14},"width":140.9,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-7.png","element":"img","alt":" H+λ′I","inline":true,"padRight":true},{"text":"is small and (2) the vectors (","element":"span"},{"style":{"height":19.14},"width":570.28,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-8.png","element":"img","alt":"H+λ∗I)−1g and (H+λ′I)−1g","inline":true,"padRight":true},{"text":"are close. This relies on the structure of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":").","element":"span"}],[{"text":"The second task of computing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"has a complexity depending on 1","element":"span"},{"style":{"height":20.08},"width":247.64,"height":50.21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-9.png","element":"img","alt":"/√δ where δ","inline":true,"padRight":true},{"text":"is the target additive error ","element":"span"},{"href":"#id-30","referenceIndex":13,"text":"[13, ","element":"a"},{"href":"#id-31","referenceIndex":14,"text":"14]","element":"a"},{"text":". 
The ","element":"span"},{"style":{"fontStyle":"italic"},"text":"fourth observation ","element":"span"},{"text":"is that the choice ","element":"span"},{"style":{"height":17.6},"width":136.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-10.png","element":"img","alt":" δ = √ε","inline":true,"padRight":true},{"text":"suffices for the outer loop of cubic regularization to make sufficient progress. This reduces the complexity of computing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":".","element":"span"}],[{"text":"Finally, finding the correct value ","element":"span"},{"style":{"height":12.8},"width":42.47,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-11.png","element":"img","alt":" λ∗ ","inline":true,"padRight":true},{"text":"itself is as hard as minimizing ","element":"span"},{"style":{"height":17.6},"width":508.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-12.png","element":"img","alt":" mt(h). The fifth step is to","inline":true,"padRight":true},{"text":"design an iterative scheme that makes only a logarithmic number of guesses on ","element":"span"},{"style":{"height":12.8},"width":42.46,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-13.png","element":"img","alt":" λ∗","inline":true},{"text":". 
This procedure either finds the correct one (via binary search), or finds an approximate one, ","element":"span"},{"style":{"height":12.8},"width":41.46,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-14.png","element":"img","alt":" λ′","inline":true},{"text":", but satisfying (","element":"span"},{"style":{"height":19.14},"width":588.79,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-15.png","element":"img","alt":"H + λ∗I)−1g and (H + λ′I)−1g","inline":true,"padRight":true},{"text":"being sufficiently close.","element":"span"}],[{"text":"Putting all the observations together, and balancing all the parameters, we can obtain a cubic minimization subroutine (see ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"in ","element":"span"},{"href":"#id-32","text":"Algorithm 2) ","element":"a"},{"text":"that runs in time ","element":"span"},{"style":{"height":20.34},"width":379.53,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-16.png","element":"img","alt":" O(nd + n3/4d/ε1/4).","inline":true}]]},{"heading":"2 Preliminaries and Main Theorem","paragraphs":[[{"text":"We use ","element":"span"},{"style":{"height":17.6},"width":78.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-17.png","element":"img","alt":" ∥ · ∥","inline":true,"padRight":true},{"text":"to denote the Euclidean norm of a vector and the spectral norm of a matrix. 
For a symmetric matrix ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M ","element":"span"},{"text":"we denote by ","element":"span"},{"style":{"height":17.6},"width":413.78,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-18.png","element":"img","alt":" λmax(M) and λmin(M","inline":true},{"text":") respectively the maximum and minimum eigenvalues of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M","element":"span"},{"text":". We denote by ","element":"span"},{"style":{"height":14.8},"width":387.82,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-19.png","element":"img","alt":" A ⪰ B that A − B","inline":true,"padRight":true},{"text":"is positive semidefinite (PSD). For a PSD matrix ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M","element":"span"},{"text":", we denote by ","element":"span"},{"style":{"height":14.33},"width":73.66,"height":35.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-20.png","element":"img","alt":" M+ ","inline":true,"padRight":true},{"text":"its pseudo-inverse if ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M ","element":"span"},{"text":"is not strictly positive definite.","element":"span"}],[{"text":"We make the following Lipschitz continuity assumptions for the gradient and Hessian of the target function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". Namely, there exist ","element":"span"},{"style":{"height":15.2},"width":143.9,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-21.png","element":"img","alt":" L2, L >","inline":true,"padRight":true},{"text":"0 such that","element":"span"}],[{"id":"id-62","style":{"width":"87%"},"width":1647,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-22.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Definition 2.1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"We assume the following complexity parameters on the access to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":":","element":"span"}],[{"style":{"width":"80%"},"width":1505,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-23.png","element":"img"}],[{"id":"id-36","style":{"fontWeight":"bold"},"text":"Definition 2.2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"We say that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is of finite-sum form if ","element":"span"},{"style":{"height":21.3},"width":746.94,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-24.png","element":"img","alt":" f = 1n�ni=1 fi(x) and ∥∇2fi(x)∥ ≤ L2","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"for each ","element":"span"},{"style":{"height":17.6},"width":122.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-25.png","element":"img","alt":" i ∈ [n]","inline":true},{"style":{"fontStyle":"italic"},"text":". 
In this case, we define ","element":"span"},{"style":{"height":17.24},"width":75.05,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-26.png","element":"img","alt":" Th,1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"to be the time complexity to compute","element":"span"},{"style":{"height":20.8},"width":286.19,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-27.png","element":"img","alt":"(∇2fi(x))v for","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"arbitrary ","element":"span"},{"style":{"height":19.54},"width":403.67,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-28.png","element":"img","alt":" x, v ∈ Rd and i ∈ [n].","inline":true}],[{"text":"Next we define strict-saddle functions, for which an ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-29.png","element":"img","alt":" ε","inline":true},{"text":"-approximate local minimum is almost equivalent to a local minimum ","element":"span"},{"href":"#id-10","referenceIndex":15,"text":"[15, ","element":"a"},{"href":"#id-11","referenceIndex":24,"text":"24]","element":"a"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 2.3 ","element":"span"},{"text":"(strict saddle)","element":"span"},{"style":{"height":19.54},"width":478.95,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-30.png","element":"img","alt":". Suppose f(·) : Rd → R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is twice differentiable. 
For ","element":"span"},{"style":{"height":16.4},"width":278.54,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-31.png","element":"img","alt":" α, β, γ ≥ 0, we","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"say ","element":"span"},{"style":{"height":17.6},"width":237.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-32.png","element":"img","alt":" f is (α, β, γ)","inline":true},{"style":{"fontStyle":"italic"},"text":"-strict saddle if every ","element":"span"},{"style":{"height":15.94},"width":127.84,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-33.png","element":"img","alt":" x ∈ Rd ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfies at least one of the following three conditions:","element":"span"}],[{"style":{"width":"79%"},"width":1484,"height":141,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-34.png","element":"img"}],[{"text":"We see that if a function is (","element":"span"},{"style":{"height":16.4},"width":117.95,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-35.png","element":"img","alt":"α, β, γ","inline":true},{"text":")-strict saddle, then for ","element":"span"},{"style":{"height":19.14},"width":398.53,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-36.png","element":"img","alt":" ε < min{α, β2} an ε","inline":true},{"text":"-approximate local minimum is ","element":"span"},{"style":{"height":11.6},"width":24,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/4-37.png","element":"img","alt":" γ","inline":true},{"text":"-close to some local 
minimum.","element":"span"}],[{"id":"id-12","style":{"width":"100%"},"width":1874,"height":626,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-0.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"2.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Main Results","element":"span"}],[{"text":"The finite-sum setting captures much of supervised learning, including Neural Networks and Generalized Linear Models. The main theorem which we show in our paper is as follows:","element":"span"}],[{"id":"id-13","style":{"width":"100%"},"width":1890,"height":502,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-1.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Two Known Subroutines. ","element":"span"},{"text":"Our running time of ","element":"span"},{"href":"#id-12","text":"FastCubic ","element":"a"},{"text":"relies on the following recent results for approximate matrix inverse and approximate PCA:","element":"span"}],[{"id":"id-93","style":{"fontWeight":"bold"},"text":"Theorem 2.4 ","element":"span"},{"text":"(Approximate Matrix Inverse)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose matrix ","element":"span"},{"style":{"height":19.54},"width":656.72,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-2.png","element":"img","alt":" M ∈ Rd×d satisfies ∥M∥ ≤ L2 and","inline":true},{"style":{"height":14.8},"width":243.79,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-3.png","element":"img","alt":"λI + M ⪰ δI","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"for constants ","element":"span"},{"style":{"height":22.01},"width":492.74,"height":55.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-4.png","element":"img","alt":" λ, δ, L2 > 0. 
Let κ ≜ (λ+L2)/δ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":". Then we can compute a vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"style":{"fontStyle":"italic"},"text":"satisfying","element":"span"}],[{"id":"id-35","style":{"width":"63%"},"width":1189,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"using accelerated gradient descent (AGD) in ","element":"span"},{"style":{"height":21.79},"width":317.46,"height":54.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-6.png","element":"img","alt":" O(κ^{1/2} log(κ/ε))","inline":true},{"style":{"fontStyle":"italic"},"text":"iterations, each requiring ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"time plus the time needed to multiply ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M ","element":"span"},{"style":{"fontStyle":"italic"},"text":"by a vector.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Moreover, suppose ","element":"span"},{"style":{"height":21.3},"width":318.08,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-7.png","element":"img","alt":" M = (1/n) Σ_{i=1}^{n} Mi","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"where each ","element":"span"},{"style":{"height":14.62},"width":59.66,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-8.png","element":"img","alt":" Mi","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is symmetric and satisfies 
","element":"span"},{"style":{"height":17.6},"width":293.1,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-9.png","element":"img","alt":" ∥Mi∥ ≤ L2. If","inline":true},{"style":{"height":15.02},"width":80.19,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-10.png","element":"img","alt":"Mib","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"can be computed in time ","element":"span"},{"style":{"height":17.6},"width":293.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-11.png","element":"img","alt":" O(d′) for each i","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b","element":"span"},{"style":{"fontStyle":"italic"},"text":", then accelerated SVRG ","element":"span"},{"href":"#id-33","referenceIndex":4,"style":{"fontStyle":"italic"},"text":"[4, ","element":"a"},{"href":"#id-34","referenceIndex":33,"style":{"fontStyle":"italic"},"text":"33] ","element":"a"},{"style":{"fontStyle":"italic"},"text":"computes a vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x ","element":"span"},{"style":{"fontStyle":"italic"},"text":"that satisfies equation ","element":"span"},{"href":"#id-35","style":{"fontStyle":"italic"},"text":"(2.2) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"in time ","element":"span"},{"style":{"height":21.79},"width":703.2,"height":54.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-12.png","element":"img","alt":" O(max{n, n^{3/4}κ^{1/2}} · d′ · log^2(κ/ε)).","inline":true}],[{"style":{"width":"91%"},"width":1717,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-13.png","element":"img"}],[{"text":"Above, the SVRG-based running time is used only for our finite-sum case in ","element":"span"},{"href":"#id-36","text":"Definition 
2.2.","element":"a"}],[{"id":"id-94","style":{"fontWeight":"bold"},"text":"Theorem 2.5 ","element":"span"},{"text":"(AppxPCA ","element":"span"},{"href":"#id-37","referenceIndex":3,"text":"[3, ","element":"a"},{"href":"#id-30","referenceIndex":13,"text":"13, ","element":"a"},{"href":"#id-31","referenceIndex":14,"text":"14]","element":"a"},{"text":")","element":"span"},{"style":{"height":15.94},"width":344.75,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-14.png","element":"img","alt":". Let M ∈ Rd×d ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a symmetric matrix with eigenvalues ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":15.24},"width":382.94,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-15.png","element":"img","alt":" ≥ λ1 ≥ · · · ≥ λd ≥ 0","inline":true},{"style":{"fontStyle":"italic"},"text":". With probability at least ","element":"span"},{"text":"1","element":"span"},{"style":{"height":11.6},"width":61.8,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-16.png","element":"img","alt":"−p","inline":true},{"style":{"fontStyle":"italic"},"text":", AppxPCA produces a unit vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"w ","element":"span"},{"style":{"fontStyle":"italic"},"text":"satisfying ","element":"span"},{"style":{"height":17.6},"width":691.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-17.png","element":"img","alt":"w⊤Mw ≥ (1 − δ×)(1 − ε)λmax(M) .","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"The total running time is ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.6},"width":421.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-18.png","element":"img","alt":"O(Tinverse(1/δ×, εδ×)).","inline":true}]]},{"heading":"3 Our Fast Cubic Regularization 
Algorithm","paragraphs":[[{"text":"Recall that the cubic regularization method of Nesterov and Polyak ","element":"span"},{"href":"#id-9","referenceIndex":28,"text":"[28] ","element":"a"},{"text":"studies the following upper bound on the change in objective value as we move from a point ","element":"span"},{"style":{"height":15.02},"width":231.06,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/5-19.png","element":"img","alt":" xt to xt + h","inline":true},{"text":": (it follows simply from the Taylor series truncated to the third order)","element":"span"}],[{"id":"id-42","style":{"width":"89%"},"width":1677,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-0.png","element":"img"}],[{"text":"Denote by ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-1.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"an arbitrary minimizer of ","element":"span"},{"style":{"height":17.6},"width":94.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-2.png","element":"img","alt":" mt(h","inline":true},{"text":"). We propose in this paper a subroutine ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"that minimizes ","element":"span"},{"style":{"height":17.6},"width":94.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-3.png","element":"img","alt":" mt(h","inline":true},{"text":") approximately. Note that ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"returns two vectors ","element":"span"},{"style":{"height":15.02},"width":308.41,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-4.png","element":"img","alt":" v and vmin. 
We","inline":true,"padRight":true},{"text":"then choose ","element":"span"},{"style":{"height":12.8},"width":41.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-5.png","element":"img","alt":" h′ ","inline":true,"padRight":true},{"text":"to be either ","element":"span"},{"style":{"height":22.3},"width":180.16,"height":55.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-6.png","element":"img","alt":" v or λvmin2L ","inline":true,"padRight":true},{"text":", whichever gives a smaller value for ","element":"span"},{"style":{"height":17.6},"width":123.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-7.png","element":"img","alt":" mt(h).","inline":true}],[{"text":"Before discussing the details of ","element":"span"},{"href":"#id-32","text":"FastCubicMin","element":"a"},{"text":", let us first state a main theorem for ","element":"span"},{"href":"#id-32","style":{"height":15.14},"width":280.28,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-8.png","element":"img","alt":" FastCubicMin:3","inline":true}],[{"id":"id-38","style":{"width":"100%"},"width":1890,"height":706,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-9.png","element":"img"}],[{"text":"Above, the first guarantee promises that we are either done (because ","element":"span"},{"style":{"height":17.6},"width":111.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-10.png","element":"img","alt":" mt(h∗","inline":true},{"text":") is close to zero), or we obtain a 1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"3000 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"multiplicative ","element":"span"},{"text":"approximation to 
","element":"span"},{"style":{"height":17.6},"width":111.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-11.png","element":"img","alt":" mt(h∗","inline":true},{"text":"). Our second guarantee in ","element":"span"},{"href":"#id-38","text":"Theorem 2 ","element":"a"},{"text":"promises that when we are done (because ","element":"span"},{"style":{"height":17.6},"width":111.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-12.png","element":"img","alt":" mt(h∗","inline":true},{"text":") is close to zero), the output vector ","element":"span"},{"style":{"height":12.8},"width":252.32,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-13.png","element":"img","alt":" h′ and h∗ are","inline":true,"padRight":true},{"text":"roughly similar in Euclidean norm and have a small gradient ","element":"span"},{"style":{"height":17.6},"width":203.07,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-14.png","element":"img","alt":" ∥∇mt(h′)∥","inline":true},{"text":". Our third guarantee gives the time complexity of ","element":"span"},{"href":"#id-32","text":"FastCubicMin","element":"a"},{"text":".","element":"span"}],[{"text":"Now, our final algorithm ","element":"span"},{"href":"#id-12","text":"FastCubic ","element":"a"},{"text":"for finding the ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-15.png","element":"img","alt":" ε","inline":true},{"text":"-approximate local minimum of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") is included in ","element":"span"},{"href":"#id-12","text":"Algorithm 1. 
","element":"a"},{"text":"It simply iteratively calls ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"to find an approximate minimizer, and it then stops whenever ","element":"span"},{"style":{"height":28.81},"width":285.72,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-16.png","element":"img","alt":" mt(h′) > − ε3/2c√L ","inline":true,"padRight":true},{"text":"for some large constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Roadmap. ","element":"span"},{"text":"In ","element":"span"},{"text":"Section 4 ","element":"span"},{"text":"we show why ","element":"span"},{"href":"#id-38","text":"Theorem 2 ","element":"a"},{"text":"implies ","element":"span"},{"href":"#id-13","text":"Theorem 1. ","element":"a"},{"text":"All the remaining sections are for the purpose of proving ","element":"span"},{"href":"#id-38","text":"Theorem 2. ","element":"a"},{"text":"Because our ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"is very technical, instead of stating what the algorithm is right away, we decide to take a different path. In ","element":"span"},{"text":"Section 5, ","element":"span"},{"text":"we first state a lemma characterizing “what ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-17.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"looks like”. 
In ","element":"span"},{"href":"#id-39","text":"Section 6, ","element":"a"},{"text":"we provide a set of sufficient conditions which “look similar” to the characterization of ","element":"span"},{"style":{"height":12.8},"width":42.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-18.png","element":"img","alt":" h∗","inline":true},{"text":", and show that as long as these conditions are met, ","element":"span"},{"href":"#id-38","text":"Theorem 2-a ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-38","text":"2-b ","element":"a"},{"text":"follow easily. Finally, in ","element":"span"},{"text":"Section 7, ","element":"span"},{"text":"we state ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"and explain why it satisfies these sufficient conditions and why it runs in the aforementioned time.","element":"span"}]]},{"heading":"4 Theorem 2 implies Theorem 1","paragraphs":[[{"text":"In this section, we show that Theorem ","element":"span"},{"href":"#id-38","text":"2 ","element":"a"},{"text":"implies Theorem ","element":"span"},{"href":"#id-13","text":"1. ","element":"a"},{"text":"It relies on the following lemma (proved in ","element":"span"},{"href":"#id-40","text":"Appendix B) ","element":"a"},{"text":"regarding the sufficient condition for us to reach an ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/6-19.png","element":"img","alt":" ε","inline":true},{"text":"-approximate local minimum.","element":"span"}],[{"id":"id-41","style":{"height":28.81},"width":815.92,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-0.png","element":"img","alt":"Lemma 4.1. 
If mt(h∗) ≥ −ε^{3/2}/(800√L) and h′","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an approximate minimizer of ","element":"span"},{"style":{"height":17.6},"width":111.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-1.png","element":"img","alt":" mt(h)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying ","element":"span"},{"style":{"height":28.63},"width":876.46,"height":71.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-2.png","element":"img","alt":"∥h′∥ ≤ ∥h∗∥ + √ε/(4√L) and ∥∇mt(h′)∥ ≤ ε/2,","inline":true}],[{"style":{"fontStyle":"italic"},"text":"then we have that ","element":"span"},{"style":{"height":19.99},"width":1001.66,"height":49.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-3.png","element":"img","alt":" ∥∇f(xt + h′)∥ ≤ ε and λmin(∇2f(xt + h′)) ≥ −√Lε.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-13","style":{"fontStyle":"italic"},"text":"Theorem 1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"from ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"Theorem 2. ","element":"a"},{"text":"When ","element":"span"},{"href":"#id-12","text":"FastCubic ","element":"a"},{"text":"terminates, we have ","element":"span"},{"style":{"height":28.81},"width":429.28,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-4.png","element":"img","alt":" mt(h′) > −ε^{3/2}/(c√L);","inline":true,"padRight":true},{"text":"therefore, it satisfies ","element":"span"},{"style":{"height":28.81},"width":329.63,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-5.png","element":"img","alt":" mt(h∗) ≥ −ε^{3/2}/(800√L) ","inline":true,"padRight":true},{"text":"according to ","element":"span"},{"href":"#id-38","text":"Theorem 2-a. 
","element":"a"},{"text":"Combining this with ","element":"span"},{"href":"#id-38","text":"Theorem 2-b ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-41","text":"Lemma 4.1, ","element":"a"},{"text":"we conclude that in the last iteration of ","element":"span"},{"href":"#id-12","text":"FastCubic","element":"a"},{"text":", our output satisfies ","element":"span"},{"style":{"height":17.6},"width":295.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-6.png","element":"img","alt":" ∥∇f(xt+h′)∥ ≤","inline":true},{"style":{"height":19.98},"width":650.96,"height":49.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-7.png","element":"img","alt":"ε and λmin(∇2f(xt +h′)) ≥ −√Lε","inline":true},{"text":". This finishes the proof with respect to the accuracy conditions.","element":"span"}],[{"text":"As for the running time, in every iteration except for the last one, ","element":"span"},{"href":"#id-12","style":{"height":17.6},"width":524.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-8.png","element":"img","alt":" FastCubic satisfies mt(h′) ≤","inline":true},{"style":{"height":28.81},"width":203.59,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-9.png","element":"img","alt":"−Ω(ε^{3/2}/√L)","inline":true},{"text":". 
Therefore, by ","element":"span"},{"href":"#id-42","text":"(3.1), ","element":"a"},{"text":"we must have decreased the objective by at least Ω","element":"span"},{"style":{"height":28.81},"width":275.22,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-10.png","element":"img","alt":"(ε^{3/2}/√L) in this","inline":true,"padRight":true},{"text":"round, and this cannot happen for more than ","element":"span"},{"style":{"height":27.84},"width":307.01,"height":69.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-11.png","element":"img","alt":" O((f(x0)−f∗)√L/ε^{3/2})","inline":true},{"text":"iterations. The final running time of ","element":"span"},{"href":"#id-12","text":"FastCubic ","element":"a"},{"text":"follows from this bound together with ","element":"span"},{"href":"#id-38","text":"Theorem 2-c.","element":"a"}],[{"text":"Therefore, in the rest of the paper it suffices to study ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"and prove ","element":"span"},{"href":"#id-38","text":"Theorem 2.","element":"a"}]]},{"heading":"5 Characterization Lemma of the Minimizer h∗","paragraphs":[[{"text":"For notational simplicity in this and the subsequent sections we focus on the following problem:","element":"span"}],[{"style":{"width":"61%"},"width":1147,"height":163,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-12.png","element":"img"}],[{"text":"Recall from Section 3 that we have denoted by ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-13.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"an arbitrary minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"). 
We have the following lemma which characterizes ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-14.png","element":"img","alt":" h∗","inline":true},{"text":": (a variant of this lemma has appeared in ","element":"span"},{"href":"#id-27","referenceIndex":8,"text":"[8]","element":"a"},{"text":", and we prove it in the appendix for the sake of completeness)","element":"span"}],[{"id":"id-43","style":{"height":12.8},"width":515.92,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-15.png","element":"img","alt":"Lemma 5.1. We have h∗ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"if and only if there exists ","element":"span"},{"style":{"height":14.8},"width":315.49,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-16.png","element":"img","alt":" λ∗ ≥ 0 such that","inline":true}],[{"style":{"width":"58%"},"width":1098,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-17.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"The objective value in this case is given by","element":"span"}],[{"style":{"width":"43%"},"width":824,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-18.png","element":"img"}],[{"text":"The following corollary comes from ","element":"span"},{"href":"#id-43","text":"Lemma 5.1 ","element":"a"},{"text":"and its proof:","element":"span"}],[{"id":"id-44","style":{"height":16},"width":582.07,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-19.png","element":"img","alt":"Corollary 5.2. 
The value λ∗ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"Lemma 5.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"is unique, and for every ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-20.png","element":"img","alt":" λ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying ","element":"span"},{"style":{"height":15.6},"width":292.83,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-21.png","element":"img","alt":" H + λI ≻ 0, we","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"have","element":"span"}],[{"style":{"width":"77%"},"width":1444,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-22.png","element":"img"}],[{"text":"In the above characterization, we have a crude upper bound on ","element":"span"},{"style":{"height":12.8},"width":56.4,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-23.png","element":"img","alt":" λ∗:","inline":true}],[{"id":"id-53","style":{"height":21.59},"width":1360.17,"height":53.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-24.png","element":"img","alt":"Proposition 5.3. We have λ∗ ≤ B ≜ max{2L2 + √(L∥g∥), 1} with λ∗ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"defined in ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"Lemma 5.1.","element":"a"}],[{"style":{"height":27.65},"width":1260.25,"height":69.14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-25.png","element":"img","alt":"Proof. 
We have L∥(H + BI)^{−1}g∥ ≤ L∥g∥/λmin(H+BI) ≤ L∥g∥/(B−L2) < 2B","inline":true,"padRight":true},{"text":"and therefore ","element":"span"},{"style":{"height":14.8},"width":309,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/7-26.png","element":"img","alt":" λ∗ ≤ B due to","inline":true,"padRight":true},{"id":"id-39","href":"#id-44","text":"Corollary 5.2.","element":"a"}]]},{"heading":"6 Sufficient Conditions for Theorem 2-a and 2-b","paragraphs":[[{"text":"Without worrying about the design of ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"at this moment, let us first state a set of sufficient conditions under which the assumptions in ","element":"span"},{"href":"#id-38","text":"Theorem 2-a ","element":"a"},{"text":"can be satisfied.","element":"span"}],[{"id":"id-45","style":{"fontWeight":"bold"},"text":"Main Lemma 1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider an algorithm that outputs a real ","element":"span"},{"style":{"height":19.53},"width":671.16,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-0.png","element":"img","alt":" λ ∈ [0, 2B], a vector v ∈ Rd, and a","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"unit vector ","element":"span"},{"style":{"height":17.76},"width":182.53,"height":44.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-1.png","element":"img","alt":" vmin ∈ Rd","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Additionally, suppose numbers ","element":"span"},{"style":{"height":14.8},"width":145.12,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-2.png","element":"img","alt":" κ, ˜ε ≥ 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfy the following conditions:","element":"span"}],[{"id":"id-46","style":{"width":"75%"},"width":1418,"height":172,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Moreover, suppose that the outputs ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":183.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-4.png","element":"img","alt":"λ, v, vmin)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfy one of the following two cases:","element":"span"}],[{"style":{"width":"84%"},"width":1585,"height":506,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-5.png","element":"img"}],[{"text":"Let us compare these sufficient conditions to the characterization in ","element":"span"},{"href":"#id-43","text":"Lemma 5.1.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"In Case 1, up to a very small error ˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-6.png","element":"img","alt":"ε","inline":true},{"text":", we have essentially found a vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"that satisfies ","element":"span"},{"style":{"height":21.7},"width":659.83,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-7.png","element":"img","alt":"v ≈ −(H + λI)^{−1}g and ∥v∥ ≈ 2λ/L .","inline":true,"padRight":true},{"text":"Therefore, this ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v 
","element":"span"},{"text":"should be close to ","element":"span"},{"style":{"height":12.8},"width":42.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-8.png","element":"img","alt":" h∗","inline":true,"padRight":true},{"text":"for obvious reasons. ","element":"span"},{"text":"(This is the simple case.)","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"In Case 2, we have only found a vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"that satisfies ","element":"span"},{"style":{"height":21.7},"width":690.74,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-9.png","element":"img","alt":" v ≈ −(H + λI)^{−1}g and ∥v∥ ≲ 2λ/L .","inline":true,"padRight":true},{"text":"In this case, we also compute an approximate lowest eigenvector ","element":"span"},{"style":{"height":17.6},"width":497.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-10.png","element":"img","alt":" vmin of λmin(H) up to an","inline":true,"padRight":true},{"text":"additive 1","element":"span"},{"style":{"height":17.6},"width":90.49,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-11.png","element":"img","alt":"/(10κ)","inline":true,"padRight":true},{"text":"accuracy (see Case 2-c). 
We will make sure that, as long as the conditions in 2-a hold, then either ","element":"span"},{"style":{"height":22.3},"width":180.16,"height":55.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-12.png","element":"img","alt":" v or λvmin2L","inline":true,"padRight":true},{"text":"will be an approximate minimizer for ","element":"span"},{"style":{"height":17.6},"width":123.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-13.png","element":"img","alt":" mt(h).","inline":true}],[{"style":{"width":"23%"},"width":437,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-45","style":{"fontStyle":"italic"},"text":"Main Lemma 1. ","element":"a"},{"text":"We first consider Case 1. According to ","element":"span"},{"href":"#id-44","text":"Corollary 5.2, ","element":"a"},{"text":"if ˜","element":"span"},{"style":{"height":12.8},"width":314.32,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-15.png","element":"img","alt":"ε = 0 then v is a","inline":true,"padRight":true},{"text":"minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"). The following claim extends this argument to the setting when ˜","element":"span"},{"style":{"height":12.4},"width":112.39,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-16.png","element":"img","alt":"ε > 0:","inline":true}],[{"id":"id-74","style":{"height":16.4},"width":455.87,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-17.png","element":"img","alt":"Claim 6.1. 
If λ and v","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfy Case 1 and ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-18.png","element":"img","alt":"ε","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfies ","element":"span"},{"href":"#id-46","style":{"fontStyle":"italic"},"text":"(6.1), ","element":"a"},{"style":{"fontStyle":"italic"},"text":"then ","element":"span"},{"style":{"height":21.76},"width":452.77,"height":54.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-19.png","element":"img","alt":" m(v) ≤ m(h∗) + 1250κ3L2","inline":true}],[{"text":"From the above lemma it follows that either ","element":"span"},{"style":{"height":21.76},"width":950.77,"height":54.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-20.png","element":"img","alt":" m(h∗) ≥ − 8κ3L2 otherwise m(h∗) ≥ 1.1m(v) which","inline":true,"padRight":true},{"text":"satisfies the conditions of the theorem.","element":"span"}],[{"text":"We now consider Case 2, and in this case we make the following two claims:","element":"span"}],[{"id":"id-77","style":{"width":"87%"},"width":1634,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-21.png","element":"img"}],[{"id":"id-78","style":{"fontWeight":"bold"},"text":"Claim 6.3. 
","element":"span"},{"style":{"height":21.76},"width":901.44,"height":54.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-22.png","element":"img","alt":"If λmin(H) ≥ − 1κ then m(h∗) ≥ 2m(v) − 16κ3L2 .","inline":true}],[{"href":"#id-45","text":"Lemma 1 ","element":"a"},{"text":"now follows from the two claims because we can output the vector ","element":"span"},{"style":{"height":12.8},"width":41.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-23.png","element":"img","alt":" h′ ","inline":true,"padRight":true},{"text":"which has the lowest value of ","element":"span"},{"style":{"height":17.6},"width":96.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-24.png","element":"img","alt":" m(h′","inline":true},{"text":") amongst the two choices ","element":"span"},{"style":{"height":20.97},"width":316.7,"height":52.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-25.png","element":"img","alt":" h′ ∈ �v, λvmin2d �.","inline":true,"padRight":true},{"text":"This satisfies either ","element":"span"},{"style":{"height":17.6},"width":172.45,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-26.png","element":"img","alt":" m(h∗) ≥","inline":true,"padRight":true},{"text":"3000","element":"span"},{"style":{"height":21.76},"width":482.57,"height":54.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/8-27.png","element":"img","alt":"m(h′) or m(h∗) ≥ − 32κ3L2 .","inline":true}],[{"text":"The missing proofs of the three claims are deferred to ","element":"span"},{"href":"#id-47","text":"Appendix D.","element":"a"}],[{"text":"The next main lemma shows that, under the same sufficient conditions as ","element":"span"},{"href":"#id-45","text":"Main Lemma 1, ","element":"a"},{"text":"we also have that ","element":"span"},{"href":"#id-38","text":"Theorem 2-b ","element":"a"},{"text":"holds. 
(Its proof is contained in ","element":"span"},{"href":"#id-48","text":"Appendix E.","element":"a"},{"text":")","element":"span"}],[{"id":"id-55","style":{"fontWeight":"bold"},"text":"Main Lemma 2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"In the same setting as ","element":"span"},{"href":"#id-45","style":{"fontStyle":"italic"},"text":"Main Lemma 1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"suppose ","element":"span"},{"style":{"height":28.81},"width":541.56,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-0.png","element":"img","alt":" m(h∗) ≥ − ε3/2300√L. Then the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"output vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"style":{"fontStyle":"italic"},"text":"satisfies the following conditions:","element":"span"}],[{"style":{"width":"50%"},"width":947,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-1.png","element":"img"}]]},{"heading":"7 Main Algorithms for Theorem 2","paragraphs":[[{"text":"We are now ready to state our main algorithm ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"and sketch why it satisfies the sufficient conditions in ","element":"span"},{"href":"#id-45","text":"Main Lemma 1. ","element":"a"},{"text":"As described in ","element":"span"},{"href":"#id-32","text":"Algorithm 2, ","element":"a"},{"text":"our algorithm starts with a very large choice ","element":"span"},{"style":{"height":15.02},"width":168.14,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-2.png","element":"img","alt":" λ0 ← 2B","inline":true,"padRight":true},{"text":"and decreases it gradually. 
At each iteration ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":", it computes an approximate inverse ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"satisfying ","element":"span"},{"style":{"height":19.14},"width":471.92,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-3.png","element":"img","alt":" ∥v + (H + λiI)−1g∥ ≤ ˜ε","inline":true,"padRight":true},{"text":"with respect to the current ","element":"span"},{"style":{"height":15.02},"width":37.46,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-4.png","element":"img","alt":" λi","inline":true},{"text":". Then there are three cases, depending on whether ","element":"span"},{"style":{"height":17.6},"width":96.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-5.png","element":"img","alt":" L∥v∥","inline":true,"padRight":true},{"text":"is approximately equal to, larger than, or smaller than 2","element":"span"},{"style":{"height":15.02},"width":37.46,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-6.png","element":"img","alt":"λi","inline":true},{"text":". 
At a high level, if it is “equal”, then we have met Case 1 in ","element":"span"},{"href":"#id-45","text":"Main Lemma 1; ","element":"a"},{"text":"if it is “larger”, then we can binary search the correct value of ","element":"span"},{"style":{"height":12.8},"width":42.47,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-7.png","element":"img","alt":" λ∗ ","inline":true,"padRight":true},{"text":"in the interval [","element":"span"},{"style":{"height":15.6},"width":138.76,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-8.png","element":"img","alt":"λi, λi−1","inline":true},{"text":"]; and if it is “smaller”, then we need to compute an approximate eigenvector and carefully choose the next point ","element":"span"},{"style":{"height":16.22},"width":94.3,"height":40.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-9.png","element":"img","alt":" λi+1.","inline":true}],[{"text":"We state our main lemma below regarding the correctness and running time of ","element":"span"},{"href":"#id-32","text":"FastCubicMin","element":"a"},{"text":".","element":"span"}],[{"id":"id-49","style":{"fontWeight":"bold"},"text":"Main Lemma 3. 
","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"in ","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Algorithm 2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"outputs a real ","element":"span"},{"style":{"height":19.54},"width":638.24,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-10.png","element":"img","alt":" λ ∈ [0, 2B], a vector v ∈ Rd, and","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"a unit vector ","element":"span"},{"style":{"height":17.76},"width":183.37,"height":44.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-11.png","element":"img","alt":" vmin ∈ Rd ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying one of the two sufficient conditions in ","element":"span"},{"href":"#id-45","style":{"fontStyle":"italic"},"text":"Main Lemma 1. ","element":"a"},{"style":{"fontStyle":"italic"},"text":"We also have that the procedure can be implemented in a total running time of","element":"span"}],[{"style":{"width":"97%"},"width":1821,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Here ","element":"span"},{"text":"˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"O ","element":"span"},{"style":{"fontStyle":"italic"},"text":"hides logarithmic factors in ","element":"span"},{"style":{"height":15.6},"width":252.15,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-13.png","element":"img","alt":" L, L2, κ, d, B.","inline":true}],[{"text":"We prove the correctness half of ","element":"span"},{"href":"#id-49","text":"Main Lemma 3, ","element":"a"},{"text":"and defer its running time analysis to ","element":"span"},{"href":"#id-50","text":"Appendix 
G.","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"7.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Correctness Half of Main Lemma ","element":"span"},{"href":"#id-49","style":{"fontWeight":"bold"},"text":"3","element":"a"}],[{"text":"We will now establish the correctness of our algorithm. We first observe that the ","element":"span"},{"href":"#id-32","text":"BinarySearch ","element":"a"},{"text":"subroutine returns (","element":"span"},{"style":{"height":16.8},"width":109,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-14.png","element":"img","alt":"λ, v, ∅","inline":true},{"text":") that satisfies Case 1 of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1.","element":"a"}],[{"id":"id-54","style":{"fontWeight":"bold"},"text":"Fact 7.1. ","element":"span"},{"href":"#id-32","text":"BinarySearch ","element":"a"},{"style":{"fontStyle":"italic"},"text":"outputs a pair ","element":"span"},{"style":{"height":12.8},"width":339.66,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-15.png","element":"img","alt":" λ and v such that","inline":true}],[{"style":{"width":"73%"},"width":1380,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-16.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"The latter is guaranteed by line ","element":"span"},{"href":"#id-32","text":"3 ","element":"a"},{"text":"in ","element":"span"},{"href":"#id-32","text":"BinarySearch","element":"a"},{"text":", and the former is implied by the latter because","element":"span"}],[{"style":{"width":"87%"},"width":1634,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-17.png","element":"img"}],[{"text":"We also establish the following invariants regarding the values ","element":"span"},{"style":{"height":15.02},"width":37.46,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-18.png","element":"img","alt":" λi","inline":true},{"text":". (Proof in ","element":"span"},{"href":"#id-51","text":"Appendix F.","element":"a"},{"text":")","element":"span"}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"Lemma 7.2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The following statements hold for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"style":{"fontStyle":"italic"},"text":"until ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"terminates","element":"span"}],[{"style":{"width":"55%"},"width":1042,"height":191,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-19.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Moreover when ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"terminates at ","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Line 20 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"we have ","element":"span"},{"style":{"height":21.3},"width":349.86,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-20.png","element":"img","alt":" λi + λmin(H) ≤ 1κ.","inline":true}],[{"text":"We now prove the 
output (","element":"span"},{"href":"#id-32","style":{"height":17.6},"width":489.77,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/9-21.png","element":"img","alt":"λ, v, vmin) of FastCubicMin","inline":true,"padRight":true},{"text":"satisfies the sufficient conditions of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1.","element":"a"}],[{"id":"id-32","style":{"width":"100%"},"width":1874,"height":2518,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/10-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Correctness Proof of ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"Main Lemma 3. ","element":"a"},{"text":"We carefully verify these sufficient conditions:","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"href":"#id-52","text":"Lemma 7.2 ","element":"a"},{"text":"implies ","element":"span"},{"style":{"height":17.6},"width":226.98,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-0.png","element":"img","alt":" λi ∈ [0, 2B].","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"style":{"height":21.3},"width":374.08,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-1.png","element":"img","alt":" λi + λmin(H) ≥ 310κ ","inline":true,"padRight":true},{"text":"from ","element":"span"},{"href":"#id-52","text":"Lemma 7.2 ","element":"a"},{"text":"implies ","element":"span"},{"style":{"height":19.14},"width":386.28,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-2.png","element":"img","alt":" ∥(H + λiI)−1∥ ≤ 4κ","inline":true},{"text":". 
It is now immediate that ","element":"span"},{"text":"the choice of ˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-3.png","element":"img","alt":"ε","inline":true,"padRight":true},{"text":"on ","element":"span"},{"href":"#id-32","text":"Line 2 ","element":"a"},{"text":"satisfies the Condition ","element":"span"},{"href":"#id-46","text":"(6.1) ","element":"a"},{"text":"in the assumption of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"Since ˜","element":"span"},{"style":{"height":21.3},"width":643.5,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-4.png","element":"img","alt":"ε ≤ 110κL and λi + λmin(H) ≥ 310κ ","inline":true,"padRight":true},{"text":"it follows that (","element":"span"},{"style":{"height":19.14},"width":402.12,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-5.png","element":"img","alt":"H + (λi − L˜ε)I)−1 ≻","inline":true,"padRight":true},{"text":"0 which proves ","element":"span"},{"text":"Condition ","element":"span"},{"href":"#id-46","text":"(6.2) ","element":"a"},{"text":"in ","element":"span"},{"href":"#id-45","text":"Main Lemma 1.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"We now verify Case 1 and 2 in the assumption of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1. 
","element":"a"},{"text":"At the beginning of the algorithm, our choice ","element":"span"},{"style":{"height":15.02},"width":162.54,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-6.png","element":"img","alt":" λ0 = 2B","inline":true,"padRight":true},{"text":"ensures (using ","element":"span"},{"href":"#id-53","text":"Proposition 5.3) ","element":"a"},{"text":"that ","element":"span"},{"style":{"height":19.14},"width":473.29,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-7.png","element":"img","alt":" L∥(H + λ0I)−1g∥ < 2λ0.","inline":true,"padRight":true},{"text":"Let us now consider the various places where the algorithm outputs:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"– ","element":"span"},{"text":"If ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"terminates at ","element":"span"},{"href":"#id-32","text":"Line 7, ","element":"a"},{"text":"then we have ","element":"span"},{"style":{"height":19.14},"width":452.11,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-8.png","element":"img","alt":" ∥v + (H + λiI)−1g∥ ≤ ˜ε","inline":true,"padRight":true},{"text":"and additionally","element":"span"}],[{"style":{"width":"83%"},"width":1558,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-9.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"– ","element":"span"},{"text":"If ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"terminates at ","element":"span"},{"href":"#id-32","text":"Line 9, ","element":"a"},{"text":"then ","element":"span"},{"style":{"height":19.14},"width":728.09,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-10.png","element":"img","alt":" L∥(H + λiI)−1g∥ > L∥v∥ − L˜ε ≥ 2λi .","inline":true,"padRight":true},{"text":"Obviously, we must have 
","element":"span"},{"style":{"height":14.4},"width":64.32,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-11.png","element":"img","alt":" i ≥","inline":true,"padRight":true},{"text":"1 in this case because ","element":"span"},{"style":{"height":19.14},"width":462.42,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-12.png","element":"img","alt":" L∥(H + λ0I)−1g∥ < 2λ0","inline":true},{"text":". Therefore, ","element":"span"},{"href":"#id-32","text":"Line 10 ","element":"a"},{"text":"must have been reached at the previous iteration, so it implies ","element":"span"},{"style":{"height":19.14},"width":594.68,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-13.png","element":"img","alt":" L∥(H + λi−1I)−1g∥ < 2λi−1 .","inline":true,"padRight":true},{"text":"Together, these two imply that we can call ","element":"span"},{"href":"#id-32","style":{"height":17.6},"width":509.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-14.png","element":"img","alt":" BinarySearch with (λi−1, λi","inline":true},{"text":"). 
Owing to ","element":"span"},{"href":"#id-54","text":"Fact 7.1, ","element":"a"},{"text":"the subroutine outputs a pair (","element":"span"},{"style":{"height":15.6},"width":65.88,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-15.png","element":"img","alt":"λ, v","inline":true},{"text":") satisfying the Case 1 requirement of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1.","element":"a"}],[{"style":{"width":"96%"},"width":1804,"height":280,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-16.png","element":"img"}],[{"text":"In sum, we have verified that all the assumptions of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1 ","element":"a"},{"text":"hold.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Final Proof of Theorem ","element":"span"},{"href":"#id-38","style":{"fontWeight":"bold"},"text":"2. ","element":"a"},{"href":"#id-38","text":"Theorem 2 ","element":"a"},{"text":"is a direct corollary of our main lemmas. ","element":"span"},{"href":"#id-49","text":"Main Lemma 3 ","element":"a"},{"text":"ensures that the assumptions of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-55","text":"Main Lemma 2 ","element":"a"},{"text":"both hold. 
Now, using the special choice of ","element":"span"},{"href":"#id-12","style":{"height":13.2},"width":273.95,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/11-17.png","element":"img","alt":" κ in FastCubic","inline":true},{"text":", ","element":"span"},{"href":"#id-38","text":"Theorem 2-a ","element":"a"},{"text":"follows immediately from ","element":"span"},{"href":"#id-45","text":"Main Lemma 1; ","element":"a"},{"href":"#id-38","text":"Theorem 2-b ","element":"a"},{"text":"follows immediately from ","element":"span"},{"href":"#id-55","text":"Main Lemma 2; ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-38","text":"Theorem 2-c ","element":"a"},{"text":"follows immediately from ","element":"span"},{"href":"#id-49","text":"Main Lemma 3. ","element":"a"},{"text":"This finishes the proof of ","element":"span"},{"href":"#id-38","text":"Theorem 2.","element":"a"}]]},{"heading":"Acknowledgements","paragraphs":[[{"text":"We thank Ben Recht for very helpful suggestions and corrections to a previous version. Z. Allen-Zhu is supported by an NSF Grant, no. CCF-1412958, and a Microsoft Research Grant, no. 0518584. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSF or Microsoft.","element":"span"}]]},{"heading":"Appendix","paragraphs":[[{"id":"id-16","style":{"fontWeight":"bold"},"text":"A ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Computing Hessian-Vector Product in Linear Time","element":"span"}],[{"text":"In this section we sketch the intuition for why Hessian-vector products can be computed in linear time in many interesting (especially machine learning) problems. We start by showing that the gradient can be computed in linear time. 
The algorithm is often referred to as back-propagation, which dates back to Werbos’s PhD thesis ","element":"span"},{"href":"#id-56","referenceIndex":35,"text":"[35]","element":"a"},{"text":", and has been popularized by Rumelhart ","element":"span"},{"style":{"fontStyle":"italic"},"text":"et al. ","element":"span"},{"href":"#id-57","referenceIndex":31,"text":"[31] ","element":"a"},{"text":"for training neural networks.","element":"span"}],[{"id":"id-59","style":{"fontWeight":"bold"},"text":"Claim A.1 ","element":"span"},{"text":"(back-propagation, informally stated)","element":"span"},{"style":{"fontWeight":"bold"},"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose a real-valued function ","element":"span"},{"style":{"height":19.14},"width":292.52,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-0.png","element":"img","alt":" f : Rd → R can","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be evaluated by a differentiable circuit of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"style":{"fontStyle":"italic"},"text":". 
Then, the gradient ","element":"span"},{"style":{"height":16.4},"width":61.38,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-1.png","element":"img","alt":" ∇f","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"can be computed in time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"(using a circuit of size ","element":"span"},{"href":"#id-58","style":{"height":19.14},"width":252.71,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-2.png","element":"img","alt":" O(N + d)). 4","inline":true}],[{"text":"The claim follows by a simple induction using the chain rule, and is left to the reader. In the training of neural networks, the size of the circuit that computes the objective ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is often proportional to (or equal to) the number of parameters ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":". 
Thus the gradient ","element":"span"},{"style":{"height":16.4},"width":61.38,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-3.png","element":"img","alt":" ∇f","inline":true,"padRight":true},{"text":"can be computed in time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") using a circuit of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":".","element":"span"}],[{"text":"Next, we consider computing ","element":"span"},{"style":{"height":19.54},"width":952.85,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-4.png","element":"img","alt":" ∇2f(x) · v where v ∈ Rd. Let g(x) := ⟨∇f(x), v⟩","inline":true,"padRight":true},{"text":"be a function from ","element":"span"},{"style":{"height":15.54},"width":150.87,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-5.png","element":"img","alt":" Rd to R","inline":true},{"text":". 
Then, we see that it suffices to compute the gradient of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", since","element":"span"}],[{"style":{"width":"20%"},"width":385,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-6.png","element":"img"}],[{"text":"We observe that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") can be evaluated in linear time using a circuit of size ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") since we have shown ","element":"span"},{"style":{"height":17.6},"width":218.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-7.png","element":"img","alt":"∇f(x) can.","inline":true,"padRight":true},{"text":"Thus, using ","element":"span"},{"href":"#id-59","text":"Claim A.1 ","element":"a"},{"text":"again on the function ","element":"span"},{"style":{"height":18.74},"width":72.79,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-8.png","element":"img","alt":" g, 5","inline":true},{"text":"we conclude that ","element":"span"},{"style":{"height":17.6},"width":100.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-9.png","element":"img","alt":" ∇g(x","inline":true},{"text":") can also be computed in linear time.","element":"span"}],[{"id":"id-40","style":{"fontWeight":"bold"},"text":"B ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-60","style":{"fontWeight":"bold"},"text":"B.1 ","element":"a"},{"style":{"fontWeight":"bold"},"text":"and Corollary 
","element":"span"},{"href":"#id-41","style":{"fontWeight":"bold"},"text":"4.1","element":"a"}],[{"id":"id-60","style":{"height":15.94},"width":587.86,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-10.png","element":"img","alt":"Lemma B.1. For all h′ ∈ Rd","inline":true},{"style":{"fontStyle":"italic"},"text":", it satisfies","element":"span"}],[{"style":{"width":"100%"},"width":1876,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-60","style":{"fontStyle":"italic"},"text":"Lemma B.1. ","element":"a"},{"text":"Let us denote by ","element":"span"},{"style":{"height":19.14},"width":558.65,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-12.png","element":"img","alt":" g = ∇f(xt) and H = ∇2f(xt","inline":true},{"text":") in this proof. We begin by proving the first order condition. Note that we have","element":"span"}],[{"style":{"width":"29%"},"width":555,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-13.png","element":"img"}],[{"text":"Recall ","element":"span"},{"style":{"height":12.8},"width":42.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-14.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"is a minimizer of argmin ","element":"span"},{"style":{"height":17.6},"width":94.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-15.png","element":"img","alt":" mt(h","inline":true},{"text":"). 
The characterization result in ","element":"span"},{"href":"#id-43","text":"Lemma 5.1 ","element":"a"},{"text":"shows ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H ","element":"span"},{"text":"+","element":"span"}],[{"id":"id-61","style":{"width":"99%"},"width":1865,"height":295,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-16.png","element":"img"}],[{"text":"They imply","element":"span"}],[{"id":"id-68","style":{"width":"82%"},"width":1546,"height":209,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-17.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"x ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-61","text":"(B.1) ","element":"a"},{"text":"and ","element":"span"},{"text":"y ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-61","text":"(B.2).","element":"a"}],[{"text":"We compute the norm of the gradient at a point ","element":"span"},{"style":{"height":19.14},"width":445.37,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-18.png","element":"img","alt":" xt + h′ for any h′ ∈ Rd:","inline":true}],[{"id":"id-58","style":{"width":"93%"},"width":1758,"height":205,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/12-19.png","element":"img"}],[{"style":{"width":"81%"},"width":1530,"height":232,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-0.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"z ","element":"span"},{"text":"follows from the Lipschitz continuity of the Hessian ","element":"span"},{"href":"#id-62","text":"(2.1). 
","element":"a"},{"text":"This proves the first conclusion of the lemma.","element":"span"}],[{"text":"As for the second-order condition, we first note that for all ","element":"span"},{"style":{"height":15.94},"width":139.22,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-1.png","element":"img","alt":" h′ ∈ Rd","inline":true},{"text":", by the Lipschitz continuity of the Hessian ","element":"span"},{"href":"#id-62","text":"(2.1), ","element":"a"},{"text":"we have ","element":"span"},{"style":{"height":19.14},"width":663.86,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-2.png","element":"img","alt":" ∥∇2f(xt + h′) − ∇2f(xt)∥ ≤ L∥h′∥","inline":true},{"text":". This implies","element":"span"}],[{"id":"id-64","style":{"width":"73%"},"width":1376,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-3.png","element":"img"}],[{"text":"because if two matrices ","element":"span"},{"style":{"height":17.6},"width":573.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-4.png","element":"img","alt":" A and B satisfy ∥A−B∥ ≤ p","inline":true},{"text":", then they must satisfy","element":"span"},{"style":{"height":22.08},"width":460.33,"height":55.19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-5.png","element":"img","alt":"|λmin(A)−λmin(B)| ≤ p","inline":true,"padRight":true},{"text":"as well. 
We consider two cases: if ","element":"span"},{"style":{"height":19.14},"width":318.56,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-6.png","element":"img","alt":" λmin(∇2f(xt)) ≥","inline":true,"padRight":true},{"text":"0, then we have","element":"span"}],[{"id":"id-66","style":{"width":"65%"},"width":1230,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-7.png","element":"img"}],[{"text":"Otherwise, we consider the case where ","element":"span"},{"style":{"height":19.14},"width":743.69,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-8.png","element":"img","alt":" λmin(∇2f(xt)) = λmin(H) < 0. Let νd","inline":true,"padRight":true},{"text":"be the normalized eigenvector corresponding to ","element":"span"},{"style":{"height":17.6},"width":140.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-9.png","element":"img","alt":" λmin(H","inline":true},{"text":"), and define","element":"span"}],[{"style":{"width":"30%"},"width":574,"height":93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-10.png","element":"img"}],[{"text":"We calculate ","element":"span"},{"style":{"height":20.61},"width":94.54,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-11.png","element":"img","alt":" mt(˜h","inline":true},{"text":") as follows:","element":"span"}],[{"id":"id-63","style":{"width":"96%"},"width":1808,"height":214,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":18.52},"width":758.88,"height":46.29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-13.png","element":"img","alt":" x uses ν⊤d Hνd = λmin(H) < 0, and y","inline":true,"padRight":true},{"text":"uses the assumption that 
","element":"span"},{"style":{"height":17.6},"width":209.77,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-14.png","element":"img","alt":" λmin(H) <","inline":true,"padRight":true},{"text":"0. Since by ","element":"span"},{"text":"definition ","element":"span"},{"style":{"height":20.61},"width":283.36,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-15.png","element":"img","alt":" mt(h∗) ≤ mt(˜h","inline":true},{"text":"), we can deduce from inequality ","element":"span"},{"href":"#id-63","text":"(B.7) ","element":"a"},{"text":"that","element":"span"}],[{"id":"id-65","style":{"width":"75%"},"width":1422,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-16.png","element":"img"}],[{"text":"Now we put together inequalities ","element":"span"},{"href":"#id-64","text":"(B.5) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-65","text":"(B.8), ","element":"a"},{"text":"and obtain","element":"span"}],[{"id":"id-67","style":{"width":"76%"},"width":1437,"height":119,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-17.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-66","text":"(B.6) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-67","text":"(B.9) ","element":"a"},{"text":"we finish the proof of ","element":"span"},{"href":"#id-60","text":"Lemma B.1.","element":"a"}],[{"href":"#id-41","style":{"height":28.81},"width":861.24,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-18.png","element":"img","alt":"Corollary 4.1. 
If mt(h∗) ≥ −ε^{3/2}/(800√L) and h′","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an approximate minimizer of ","element":"span"},{"style":{"height":17.6},"width":111.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-19.png","element":"img","alt":" mt(h)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying ","element":"span"},{"style":{"height":28.63},"width":876.46,"height":71.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-20.png","element":"img","alt":"∥h′∥ ≤ ∥h∗∥ + √ε/(4√L) and ∥∇mt(h′)∥ ≤ ε/2,","inline":true}],[{"style":{"fontStyle":"italic"},"text":"then we have that ","element":"span"},{"style":{"height":19.98},"width":1001.66,"height":49.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-21.png","element":"img","alt":" ∥∇f(xt + h′)∥ ≤ ε and λmin(∇2f(xt + h′)) ≥ −√(Lε).","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-41","style":{"fontStyle":"italic"},"text":"Corollary 4.1. ","element":"a"},{"text":"First of all, our assumption that ","element":"span"},{"style":{"height":28.81},"width":334.8,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-22.png","element":"img","alt":" mt(h∗) ≥ −ε^{3/2}/(800√L)","inline":true},{"text":", along with inequality","element":"span"}],[{"href":"#id-68","text":"(B.3), ","element":"a"},{"text":"tells us that ","element":"span"},{"style":{"height":28.63},"width":220.89,"height":71.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-23.png","element":"img","alt":" ∥h∗∥ ≤ √ε/(4√L)","inline":true},{"text":". 
This, together with our assumption on ","element":"span"},{"style":{"height":28.63},"width":486.17,"height":71.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-24.png","element":"img","alt":" ∥h′∥, implies ∥h′∥ ≤ √ε/(2√L).","inline":true,"padRight":true},{"text":"Since we also assume ","element":"span"},{"style":{"height":19.22},"width":282.89,"height":48.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-25.png","element":"img","alt":" ∥∇mt(h′)∥ ≤ ε/2","inline":true},{"text":", we have from ","element":"span"},{"href":"#id-60","text":"Lemma B.1 ","element":"a"},{"text":"that","element":"span"}],[{"style":{"width":"82%"},"width":1547,"height":156,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-26.png","element":"img"}],[{"style":{"height":43.53},"width":907.12,"height":108.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/13-27.png","element":"img","alt":"λmin(∇2f(xt + h′)) ≥ −�3L2 max{0, −mt(h∗)}2","inline":true}],[{"style":{"width":"1%"},"width":31,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-0.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"C ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-43","style":{"fontWeight":"bold"},"text":"5.1 ","element":"a"},{"style":{"fontWeight":"bold"},"text":"and Corollary ","element":"span"},{"href":"#id-44","style":{"fontWeight":"bold"},"text":"5.2","element":"a"}],[{"text":"We begin by proving a few lemmas that characterize the system of equations.","element":"span"}],[{"id":"id-69","style":{"fontWeight":"bold"},"text":"Lemma C.1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider the following system of equations/inequalities in variables ","element":"span"},{"style":{"height":15.6},"width":84.02,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-1.png","element":"img","alt":" λ, h:","inline":true}],[{"id":"id-70","style":{"width":"76%"},"width":1436,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"The following statements hold for any solution ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.6},"width":109.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-3.png","element":"img","alt":"λ′, h′)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"of the above system:","element":"span"}],[{"style":{"width":"97%"},"width":1825,"height":373,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-4.png","element":"img"}],[{"href":"#id-69","style":{"height":16.4},"width":849.1,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-5.png","element":"img","alt":"Proof of Lemma C.1. 
Note that H + λI ⪰","inline":true,"padRight":true},{"text":"0 ensures that for any solution ","element":"span"},{"style":{"height":15.6},"width":350.84,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-6.png","element":"img","alt":" λ′, we have λ′ ≥","inline":true},{"style":{"height":17.6},"width":203.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-7.png","element":"img","alt":"−λmin(H).","inline":true,"padRight":true},{"text":"Furthermore, for any ","element":"span"},{"style":{"height":17.6},"width":291.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-8.png","element":"img","alt":" λ′ > −λmin(H","inline":true},{"text":"), the corresponding ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"is uniquely defined by ","element":"span"},{"style":{"height":19.14},"width":625.32,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-9.png","element":"img","alt":"h = (H + λI)−1g since H + λ′I","inline":true,"padRight":true},{"text":"is invertible. If indeed ","element":"span"},{"style":{"height":17.6},"width":283.63,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-10.png","element":"img","alt":" λ′ = −λmin(H","inline":true},{"text":"), then we have that the equation (","element":"span"},{"style":{"height":17.6},"width":424.73,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-11.png","element":"img","alt":"H − λmin(H)I)h = −g","inline":true,"padRight":true},{"text":"has a solution. 
This implies that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"has no component in the null space of ","element":"span"},{"style":{"height":17.6},"width":269.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-12.png","element":"img","alt":" H − λmin(H)I","inline":true},{"text":", or equivalently that it has no component in the eigenspace corresponding to ","element":"span"},{"style":{"height":17.6},"width":140.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-13.png","element":"img","alt":" λmin(H","inline":true},{"text":"). We also have that every solution of (","element":"span"},{"style":{"height":17.6},"width":424.17,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-14.png","element":"img","alt":"H − λmin(H)I)h = −g","inline":true,"padRight":true},{"text":"is necessarily of the form","element":"span"}],[{"style":{"width":"26%"},"width":496,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-15.png","element":"img"}],[{"text":"for some ","element":"span"},{"style":{"height":16},"width":145.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-16.png","element":"img","alt":" γ and v","inline":true,"padRight":true},{"text":"in the lowest eigenspace of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H","element":"span"},{"text":".","element":"span"}],[{"text":"We will now prove the uniqueness of ","element":"span"},{"style":{"height":12.8},"width":41.47,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-17.png","element":"img","alt":" λ′ ","inline":true,"padRight":true},{"text":"by contradiction. 
Consider two distinct values of ","element":"span"},{"style":{"height":15.6},"width":112.57,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-18.png","element":"img","alt":" λ1, λ2","inline":true,"padRight":true},{"text":"that satisfy the system ","element":"span"},{"href":"#id-70","text":"(C.1). ","element":"a"},{"text":"If both ","element":"span"},{"style":{"height":17.6},"width":341.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-19.png","element":"img","alt":" λ1, λ2 > −λmin(H","inline":true},{"text":") we get that","element":"span"}],[{"style":{"width":"57%"},"width":1083,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-20.png","element":"img"}],[{"text":"Now note that ","element":"span"},{"style":{"height":19.14},"width":277.93,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-21.png","element":"img","alt":" ∥(H + λI)−1g∥","inline":true,"padRight":true},{"text":"is a strictly decreasing function over the domain ","element":"span"},{"style":{"height":17.6},"width":367.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-22.png","element":"img","alt":" λ ∈ (−λmin(H), ∞)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":21.7},"width":36.95,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-23.png","element":"img","alt":"2λL ","inline":true,"padRight":true},{"text":"is strictly increasing over the same domain. Therefore the above two equations cannot be ","element":"span"},{"text":"satisfied for two distinct ","element":"span"},{"style":{"height":17.6},"width":341.29,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-24.png","element":"img","alt":" λ1, λ2 > −λmin(H","inline":true},{"text":") which is a contradiction. 
Suppose now without loss of generality that ","element":"span"},{"style":{"height":17.6},"width":277.49,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-25.png","element":"img","alt":" λ1 = −λmin(H","inline":true},{"text":"). Then we have that the corresponding solution is of the form","element":"span"}],[{"style":{"width":"23%"},"width":437,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-26.png","element":"img"}],[{"text":"for some ","element":"span"},{"style":{"height":16},"width":147.62,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-27.png","element":"img","alt":" γ and v","inline":true,"padRight":true},{"text":"in the lowest eigenspace of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"has no component in the lowest eigenspace of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H","element":"span"},{"text":". It follows that ","element":"span"},{"style":{"height":19.14},"width":1173.06,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-28.png","element":"img","alt":" ∥(H − λmin(H)I)+g∥ ≥ ∥(H + λI)−1g∥ for any λ > −λmin(H","inline":true},{"text":"). By a similar argument as in the first case, we can now see that the following conditions,","element":"span"}],[{"style":{"width":"69%"},"width":1298,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-29.png","element":"img"}],[{"text":"cannot both be satisfied for ","element":"span"},{"style":{"height":17.6},"width":380.1,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-30.png","element":"img","alt":" λ2 > λ1 = −λmin(H","inline":true},{"text":"), giving us a contradiction. 
This finishes the proof of ","element":"span"},{"href":"#id-69","text":"Lemma C.1.","element":"a"}],[{"id":"id-71","style":{"height":17.6},"width":484.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-31.png","element":"img","alt":"Lemma C.2. Let (λ, h)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a solution of the system ","element":"span"},{"href":"#id-70","style":{"fontStyle":"italic"},"text":"(C.1). ","element":"a"},{"style":{"fontStyle":"italic"},"text":"Then we have that","element":"span"}],[{"style":{"width":"35%"},"width":657,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/14-32.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-71","style":{"fontStyle":"italic"},"text":"Lemma C.2. ","element":"a"},{"text":"By the definition of the system ","element":"span"},{"href":"#id-70","text":"(C.1), ","element":"a"},{"text":"any solution ","element":"span"},{"style":{"height":15.6},"width":69.88,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-0.png","element":"img","alt":" λ, h","inline":true,"padRight":true},{"text":"to the system should be such that there exists some ","element":"span"},{"style":{"height":16},"width":218.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-1.png","element":"img","alt":" γ such that","inline":true}],[{"style":{"width":"97%"},"width":1818,"height":446,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-2.png","element":"img"}],[{"text":"Equality ","element":"span"},{"text":"x ","element":"span"},{"text":"follows because (","element":"span"},{"style":{"height":17.6},"width":555.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-3.png","element":"img","alt":"H + λI)h = −g. 
Equality y","inline":true,"padRight":true},{"text":"follows because ","element":"span"},{"style":{"height":18.74},"width":434.58,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-4.png","element":"img","alt":" h = (H + λI)+g + γv0","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":21.7},"width":185.3,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-5.png","element":"img","alt":" ∥h∥ = 2λ/L.","inline":true}],[{"href":"#id-43","style":{"height":12.8},"width":336.21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-6.png","element":"img","alt":"Lemma 5.1. h∗ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"if and only if there exists ","element":"span"},{"style":{"height":14.8},"width":315.5,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-7.png","element":"img","alt":" λ∗ ≥ 0 such that","inline":true}],[{"style":{"width":"58%"},"width":1098,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"The objective value in this case is given by","element":"span"}],[{"style":{"width":"43%"},"width":824,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"Lemma 5.1. 
","element":"a"},{"text":"We first compute that","element":"span"}],[{"style":{"width":"86%"},"width":1611,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-10.png","element":"img"}],[{"text":"For the forward direction, suppose ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-11.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"is a minimizer of ","element":"span"},{"style":{"height":21.7},"width":426.95,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-12.png","element":"img","alt":" m(h). Let λ∗ = (L/2)∥h∗∥","inline":true},{"text":". Then, the necessary ","element":"span"},{"text":"conditions ","element":"span"},{"style":{"height":19.14},"width":550.06,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-13.png","element":"img","alt":" ∇m(h∗) = 0 and ∇2m(h∗) ⪰","inline":true,"padRight":true},{"text":"0 can be written as","element":"span"}],[{"id":"id-72","style":{"width":"99%"},"width":1869,"height":197,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-14.png","element":"img"}],[{"text":"Note that if ","element":"span"},{"href":"#id-72","style":{"height":17.6},"width":1325.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-15.png","element":"img","alt":" h∗ = 0, then the second inequality in (C.2) directly implies H + λ∗I ⪰","inline":true,"padRight":true},{"text":"0. Thus, we only need to focus on ","element":"span"},{"style":{"height":17.6},"width":937.45,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-16.png","element":"img","alt":" h∗ ≠ 0. 
We want to show that w⊤(H + λ∗I)w ≥","inline":true,"padRight":true},{"text":"0 for every ","element":"span"},{"style":{"height":18.34},"width":271.54,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-17.png","element":"img","alt":" w ∈ Rd. Now,","inline":true,"padRight":true},{"text":"if ","element":"span"},{"href":"#id-72","style":{"height":17.6},"width":1570.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-18.png","element":"img","alt":" w⊤h∗ = 0 then this trivially follows from (C.2), so it suffices to focus on those w","inline":true,"padRight":true},{"text":"that satisfy ","element":"span"},{"style":{"height":16.8},"width":196.93,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-19.png","element":"img","alt":"w⊤h∗ ≠ 0.","inline":true}],[{"text":"Since ","element":"span"},{"style":{"height":12.8},"width":174.39,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-20.png","element":"img","alt":" w and h∗ ","inline":true,"padRight":true},{"text":"are not orthogonal, there exists ","element":"span"},{"style":{"height":22.86},"width":885.42,"height":57.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-21.png","element":"img","alt":" γ ∈ R\\{0} such that ∥h∗ + γw∥ = ∥h∗∥. (This","inline":true,"padRight":true},{"text":"can be done by squaring both sides and solving the linear system in ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-22.png","element":"img","alt":" λ","inline":true},{"text":".) 
Squaring both sides, we have","element":"span"}],[{"id":"id-73","style":{"width":"99%"},"width":1869,"height":620,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/15-23.png","element":"img"}],[{"style":{"width":"76%"},"width":1438,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-0.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"x ","element":"span"},{"text":"and ","element":"span"},{"text":"y ","element":"span"},{"text":"follow from ","element":"span"},{"href":"#id-72","text":"(C.2) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-73","text":"(C.3), ","element":"a"},{"text":"respectively. Since ","element":"span"},{"style":{"height":12.8},"width":42.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-1.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"is a minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"), we immediately have","element":"span"}],[{"style":{"width":"46%"},"width":880,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-2.png","element":"img"}],[{"text":"and we conclude that (","element":"span"},{"style":{"height":17.6},"width":265.11,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-3.png","element":"img","alt":"H + λ∗I) ⪰ 0.","inline":true}],[{"text":"For the backward direction, we will make use of ","element":"span"},{"href":"#id-69","text":"Lemma C.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-71","text":"Lemma C.2. 
","element":"a"},{"text":"First we note that the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":") is continuous and bounded from below, and there exists at least one minimizer ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-4.png","element":"img","alt":"h∗","inline":true},{"text":". Suppose now there exists a ","element":"span"},{"style":{"height":12.8},"width":42.46,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-5.png","element":"img","alt":" λ∗ ","inline":true,"padRight":true},{"text":"and a corresponding ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-6.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"such that (","element":"span"},{"style":{"height":15.6},"width":105.96,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-7.png","element":"img","alt":"λ∗, h∗","inline":true},{"text":") is a solution to the system ","element":"span"},{"href":"#id-70","text":"C.1. ","element":"a"},{"text":"The backward direction requires us to prove that ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-8.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"must be a minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"). 
By Lemma ","element":"span"},{"href":"#id-69","text":"C.1 ","element":"a"},{"text":"we get the following two cases.","element":"span"}],[{"text":"We prove the backward direction by showing that the conditions in Equation ","element":"span"},{"href":"#id-72","text":"C.2 ","element":"a"},{"text":"determine the minimizer up to its norm. To this end we will use ","element":"span"},{"href":"#id-69","text":"Lemma C.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-71","text":"Lemma C.2.","element":"a"}],[{"text":"First we note that the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":") is continuous, bounded from below, and tends to +","element":"span"},{"style":{"height":8},"width":44,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-9.png","element":"img","alt":"∞","inline":true,"padRight":true},{"text":"when ","element":"span"},{"style":{"height":17.6},"width":180.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-10.png","element":"img","alt":" ∥h∥ → ∞","inline":true},{"text":", so there exists at least one minimizer ","element":"span"},{"style":{"height":12.8},"width":56.09,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-11.png","element":"img","alt":" h∗.","inline":true}],[{"text":"Suppose now there exists a ","element":"span"},{"style":{"height":12.8},"width":42.47,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-12.png","element":"img","alt":" λ∗ ","inline":true,"padRight":true},{"text":"and a corresponding ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-13.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"such that 
(","element":"span"},{"style":{"height":15.6},"width":105.96,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-14.png","element":"img","alt":"λ∗, h∗","inline":true},{"text":") is a solution to the system ","element":"span"},{"href":"#id-70","text":"(C.1). ","element":"a"},{"text":"The backward direction requires us to prove that ","element":"span"},{"style":{"height":12.8},"width":42.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-15.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"must be a minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"). By ","element":"span"},{"href":"#id-69","text":"Lemma C.1 ","element":"a"},{"text":"we get the following two cases.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"If ","element":"span"},{"style":{"height":17.6},"width":536.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-16.png","element":"img","alt":" λ∗ > −λmin(H) then (λ∗, h∗","inline":true},{"text":") is the only solution to the system ","element":"span"},{"href":"#id-70","text":"(C.1). 
","element":"a"},{"text":"By the proof of the forward direction we see that any minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":") must satisfy the system ","element":"span"},{"href":"#id-70","text":"(C.1) ","element":"a"},{"text":"and therefore ","element":"span"},{"style":{"height":12.8},"width":42.15,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-17.png","element":"img","alt":"h∗ ","inline":true,"padRight":true},{"text":"must be the minimizer.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"• ","element":"span"},{"text":"If the above is not the case, then ","element":"span"},{"style":{"height":17.6},"width":449.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-18.png","element":"img","alt":" λ∗ = −λmin(H). Let h′ ","inline":true,"padRight":true},{"text":"be any minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"). ","element":"span"},{"href":"#id-69","text":"Lemma C.1 ","element":"a"},{"text":"and the proof of the forward direction ensure that (","element":"span"},{"style":{"height":15.6},"width":104.96,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-19.png","element":"img","alt":"λ∗, h′","inline":true},{"text":") also satisfies the system ","element":"span"},{"href":"#id-70","text":"(C.1). 
","element":"a"},{"text":"By ","element":"span"},{"href":"#id-71","text":"Lemma C.2 ","element":"a"},{"text":"we get ","element":"span"},{"style":{"height":17.6},"width":271.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-20.png","element":"img","alt":" m(h∗) = m(h′","inline":true},{"text":") and therefore ","element":"span"},{"style":{"height":12.8},"width":42.16,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-21.png","element":"img","alt":" h∗ ","inline":true,"padRight":true},{"text":"is a minimizer too.","element":"span"}],[{"href":"#id-44","style":{"height":16},"width":594.94,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-22.png","element":"img","alt":"Corollary 5.2. This value λ∗ ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is unique, and for every ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-23.png","element":"img","alt":" λ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying ","element":"span"},{"style":{"height":15.6},"width":396.62,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-24.png","element":"img","alt":" H + λI ≻ 0, we have","inline":true},{"style":{"height":36.19},"width":1452.53,"height":90.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-25.png","element":"img","alt":"∥(H + λI)−1g∥ > 2λL ⇐⇒ λ∗ > λ and ∥(H + λI)−1g∥ < 2λL ⇐⇒ λ∗ < λ .","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-44","style":{"fontStyle":"italic"},"text":"Corollary 5.2. 
","element":"a"},{"text":"The uniqueness of ","element":"span"},{"style":{"height":12.8},"width":42.47,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-26.png","element":"img","alt":" λ∗ ","inline":true,"padRight":true},{"text":"follows from ","element":"span"},{"href":"#id-69","text":"Lemma C.1. ","element":"a"},{"text":"To prove the second part we first make some observations about the function","element":"span"}],[{"style":{"width":"27%"},"width":521,"height":87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-27.png","element":"img"}],[{"text":"defined on the domain ","element":"span"},{"style":{"height":17.6},"width":358.97,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-28.png","element":"img","alt":" y ∈ (−λmin(H), ∞","inline":true},{"text":"). Note that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":") is continuous and strictly increasing over the domain and ","element":"span"},{"style":{"height":17.6},"width":405.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-29.png","element":"img","alt":" p(y) → ∞ as y → ∞.","inline":true}],[{"text":"The corollary requires us to show that","element":"span"}],[{"style":{"width":"53%"},"width":1007,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-30.png","element":"img"}],[{"text":"We begin by showing the first equivalence. 
To see the backward direction note that if ","element":"span"},{"style":{"height":13.2},"width":184.91,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-31.png","element":"img","alt":" λ∗ > λ >","inline":true},{"style":{"height":17.6},"width":174.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-32.png","element":"img","alt":"−λmin(H","inline":true},{"text":"), by the characterization of ","element":"span"},{"style":{"height":12.8},"width":42.47,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-33.png","element":"img","alt":" λ∗ ","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-69","text":"Lemma C.1 ","element":"a"},{"text":"we have that ","element":"span"},{"style":{"height":21.8},"width":510.64,"height":54.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-34.png","element":"img","alt":" ∥(H + λ∗I)−1g∥ = 2λ∗L i.e.","inline":true},{"style":{"height":17.6},"width":890.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-35.png","element":"img","alt":"p(λ∗) = 0 which implies that p(λ) < 0 as p(y","inline":true},{"text":") is a strictly increasing function. For the forward direction note that since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":") is continuous and strictly increasing we see that the range of the function contains [","element":"span"},{"style":{"height":17.6},"width":442.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-36.png","element":"img","alt":"p(λ), ∞). 
Since p(λ) <","inline":true,"padRight":true},{"text":"0 there must exist a ","element":"span"},{"style":{"height":17.6},"width":655.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-37.png","element":"img","alt":" λ∗ > λ such that p(λ∗) = 0 which","inline":true,"padRight":true},{"text":"by the characterization in ","element":"span"},{"href":"#id-69","text":"Lemma C.1 ","element":"a"},{"text":"finishes the proof.","element":"span"}],[{"text":"Now we will prove that ","element":"span"},{"style":{"height":17.6},"width":397.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-38.png","element":"img","alt":" p(λ) > 0 ⇐⇒ λ∗ < λ","inline":true},{"text":". To see the forward direction note that if ","element":"span"},{"style":{"height":14.8},"width":129.65,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-39.png","element":"img","alt":" λ∗ ≥ λ","inline":true,"padRight":true},{"text":"then ","element":"span"},{"style":{"height":17.6},"width":427.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/16-40.png","element":"img","alt":" p(λ∗) = 0 and p(λ) >","inline":true,"padRight":true},{"text":"0 which contradicts the fact that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":") is strictly increasing. For the backward direction we consider two cases. Firstly if ","element":"span"},{"style":{"height":17.6},"width":277.49,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-0.png","element":"img","alt":" λ∗ > −λmin(H","inline":true},{"text":") the conclusion follows similarly by the monotonicity of ","element":"span"},{"style":{"height":17.6},"width":371.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-1.png","element":"img","alt":" p(y). 
If λ∗ = −λmin","inline":true,"padRight":true},{"text":"then by ","element":"span"},{"href":"#id-69","text":"Lemma C.1, ","element":"a"},{"text":"we have that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"has no component in the lowest eigenspace of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H ","element":"span"},{"text":"and therefore if we extend ","element":"span"},{"style":{"height":17.6},"width":321.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-2.png","element":"img","alt":" p(y) to −λmin(H","inline":true},{"text":") by defining","element":"span"}],[{"style":{"width":"52%"},"width":975,"height":92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-3.png","element":"img"}],[{"text":"we get that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"y","element":"span"},{"text":") is increasing in the domain ","element":"span"},{"style":{"height":17.6},"width":351.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-4.png","element":"img","alt":" y ∈ [−λmin(H), ∞","inline":true},{"text":"). Now from the characterization of the solution in ","element":"span"},{"href":"#id-69","text":"Lemma C.1 ","element":"a"},{"text":"we can see that ","element":"span"},{"style":{"height":17.6},"width":299.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-5.png","element":"img","alt":" p(−λmin(H)) ≥","inline":true,"padRight":true},{"text":"0 and therefore by monotonicity ","element":"span"},{"style":{"height":17.6},"width":127.51,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-6.png","element":"img","alt":"p(λ) >","inline":true,"padRight":true},{"text":"0. 
This finishes the proof.","element":"span"}],[{"id":"id-47","style":{"fontWeight":"bold"},"text":"D ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Main Lemma ","element":"span"},{"href":"#id-45","style":{"fontWeight":"bold"},"text":"1","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"D.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Claim ","element":"span"},{"href":"#id-74","style":{"fontWeight":"bold"},"text":"6.1","element":"a"}],[{"href":"#id-74","style":{"height":16.4},"width":455.87,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-7.png","element":"img","alt":"Claim 6.1. If λ and v","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfy Case 1 and ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-8.png","element":"img","alt":"ε","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfies ","element":"span"},{"href":"#id-46","style":{"fontStyle":"italic"},"text":"(6.1), ","element":"a"},{"style":{"fontStyle":"italic"},"text":"then ","element":"span"},{"style":{"height":21.76},"width":452.77,"height":54.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-9.png","element":"img","alt":" m(v) ≤ m(h∗) + 1250κ3L2","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-74","style":{"fontStyle":"italic"},"text":"Claim 6.1. 
","element":"a"},{"text":"Note that by the conditions of the theorem we have that (","element":"span"},{"style":{"height":19.14},"width":405.58,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-10.png","element":"img","alt":"H+(λ−L˜ε)I)−1 ≻ eq","inline":true,"padRight":true},{"text":"and","element":"span"}],[{"style":{"height":20.01},"width":1518.12,"height":50.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-11.png","element":"img","alt":"L∥(H + (λ − L˜ε)I)−1g∥ ≥ 2λ − 2L˜ε and L∥(H + (λ + L˜ε)I)−1g∥ ≤ 2λ − 2L˜ε ,","inline":true,"padRight":true},{"text":"according to ","element":"span"},{"href":"#id-44","text":"Corollary 5.2 ","element":"a"},{"text":"we must have","element":"span"}],[{"id":"id-76","style":{"width":"60%"},"width":1131,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-12.png","element":"img"}],[{"text":"This also implies (using our assumption on ˜","element":"span"},{"style":{"height":17.6},"width":37.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-13.png","element":"img","alt":"ε)","inline":true}],[{"style":{"width":"32%"},"width":610,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-14.png","element":"img"}],[{"text":"Next, consider the value ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":")","element":"span"}],[{"id":"id-75","style":{"width":"88%"},"width":1665,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-15.png","element":"img"}],[{"text":"We bound the two parts on the right hand side of ","element":"span"},{"href":"#id-75","text":"(D.2) ","element":"a"},{"text":"separately. 
The first part","element":"span"}],[{"style":{"height":29.01},"width":374.8,"height":72.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-16.png","element":"img","alt":"g⊤v + v⊤(H + λI)v","inline":true}],[{"id":"id-83","style":{"width":"101%"},"width":1898,"height":557,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-17.png","element":"img"}],[{"text":"Note that (","element":"span"},{"style":{"height":19.14},"width":255.85,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-18.png","element":"img","alt":"H+λ∗I)−1 ≻","inline":true,"padRight":true},{"text":"0 by Equations ","element":"span"},{"href":"#id-76","text":"(D.1) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-46","text":"(6.2). ","element":"a"},{"text":"The second part of ","element":"span"},{"href":"#id-75","text":"(D.2) ","element":"a"},{"text":"can be bounded as follows","element":"span"}],[{"style":{"width":"65%"},"width":1225,"height":222,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/17-19.png","element":"img"}],[{"text":"Above, inequality ","element":"span"},{"style":{"height":14.8},"width":283.63,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-0.png","element":"img","alt":" ① uses λ∗ ≤ B","inline":true,"padRight":true},{"text":"(owing to ","element":"span"},{"href":"#id-53","text":"Proposition 5.3) ","element":"a"},{"text":"and our assumption on ˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-1.png","element":"img","alt":"ε","inline":true,"padRight":true},{"text":"from ","element":"span"},{"href":"#id-46","text":"(6.1). 
","element":"a"},{"text":"Putting these together we get that","element":"span"}],[{"style":{"width":"63%"},"width":1197,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-2.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"D.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proofs for Claims ","element":"span"},{"href":"#id-77","style":{"fontWeight":"bold"},"text":"6.2 ","element":"a"},{"style":{"fontWeight":"bold"},"text":"and ","element":"span"},{"href":"#id-78","style":{"fontWeight":"bold"},"text":"6.3","element":"a"}],[{"text":"For notational simplicity, let us rotate the space into the eigenbasis of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H","element":"span"},{"text":"; let the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"-th dimension correspond to the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"-th largest eigenvalue ","element":"span"},{"style":{"height":15.25},"width":889.61,"height":38.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-3.png","element":"img","alt":" λi of H. We have λ1 ≥ λ2 . . . ≥ λd = λmin. 
Let","inline":true},{"style":{"height":12},"width":32.82,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-4.png","element":"img","alt":"gi","inline":true,"padRight":true},{"text":"denote the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"-th coordinate of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"in this basis.","element":"span"}],[{"href":"#id-43","text":"Lemma 5.1 ","element":"a"},{"text":"implies","element":"span"}],[{"id":"id-86","style":{"width":"78%"},"width":1471,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-5.png","element":"img"}],[{"text":"where we denote by","element":"span"}],[{"id":"id-80","style":{"width":"99%"},"width":1869,"height":580,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-6.png","element":"img"}],[{"text":"We begin with a few auxiliary claims.","element":"span"}],[{"id":"id-79","style":{"fontWeight":"bold"},"text":"Claim D.1. ","element":"span"},{"style":{"height":31.6},"width":853.75,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-7.png","element":"img","alt":"If λmin(H) ≤ −1/κ then S2 ≥ 1000 · m(λvmin/(2L))","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-79","style":{"fontStyle":"italic"},"text":"Claim D.1. 
","element":"a"},{"text":"We compute that","element":"span"}],[{"id":"id-82","style":{"width":"92%"},"width":1735,"height":256,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-8.png","element":"img"}],[{"text":"Above, ","element":"span"},{"text":"① ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-80","text":"(D.5), ","element":"a"},{"text":"and ","element":"span"},{"text":"② ","element":"span"},{"text":"follows because ","element":"span"},{"style":{"height":21.3},"width":279.8,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-9.png","element":"img","alt":" λmin(H) ≤ −1/κ ","inline":true,"padRight":true},{"text":"holds by assumption and ","element":"span"},{"style":{"height":21.3},"width":371.9,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-10.png","element":"img","alt":"λ∗ ≤ −λmin(H) + 1/κ ","inline":true,"padRight":true},{"text":"holds by the assumption of Case 2 of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1.","element":"a"}],[{"text":"Let us now consider the value of the vector ","element":"span"},{"style":{"height":22.3},"width":84.62,"height":55.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-11.png","element":"img","alt":"λvmin/(2L) ","inline":true,"padRight":true},{"text":". 
We have that","element":"span"}],[{"style":{"width":"78%"},"width":1474,"height":224,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-12.png","element":"img"}],[{"text":"Above, ","element":"span"},{"text":"① ","element":"span"},{"text":"holds because our assumption ","element":"span"},{"style":{"height":21.3},"width":278.78,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-13.png","element":"img","alt":" λmin(H) ≤ −1/κ ","inline":true,"padRight":true},{"text":"and the assumption ","element":"span"},{"style":{"height":21.3},"width":532.5,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-14.png","element":"img","alt":" vmin⊤Hvmin ≤ λmin(H) + 1/(10κ)","inline":true,"padRight":true},{"text":"together imply ","element":"span"},{"style":{"height":22.3},"width":403.14,"height":55.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-15.png","element":"img","alt":" vmin⊤Hvmin ≤ λmin/2. ②","inline":true,"padRight":true},{"text":"follows from ","element":"span"},{"style":{"height":21.3},"width":747.16,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-16.png","element":"img","alt":" λmin(H) ≤ −1/κ and λ ≤ −λmin(H) + 1/κ.","inline":true}],[{"text":"Now, recall that the sign of ","element":"span"},{"style":{"height":10.62},"width":77.8,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-17.png","element":"img","alt":" vmin","inline":true,"padRight":true},{"text":"is chosen so that ","element":"span"},{"style":{"height":12},"width":128.54,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/18-18.png","element":"img","alt":" g⊤vmin","inline":true,"padRight":true},{"text":"is non-positive, and therefore by 
our","element":"span"}],[{"id":"id-81","style":{"width":"99%"},"width":1868,"height":180,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-0.png","element":"img"}],[{"text":"Putting inequalities ","element":"span"},{"href":"#id-81","text":"(D.8) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-82","text":"(D.7) ","element":"a"},{"text":"together finishes the proof of ","element":"span"},{"href":"#id-79","text":"Claim D.1.","element":"a"}],[{"text":"We also state the following lemma, whose proof follows from inequality ","element":"span"},{"href":"#id-83","text":"(D.3), ","element":"a"},{"text":"as part of the proof of ","element":"span"},{"href":"#id-74","text":"Claim 6.1 ","element":"a"},{"text":"above.","element":"span"}],[{"id":"id-85","style":{"height":16.4},"width":777.06,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-1.png","element":"img","alt":"Lemma D.2. If we have λ, v such that","inline":true}],[{"style":{"width":"54%"},"width":1028,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-3.png","element":"img","alt":"ε","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfying condition ","element":"span"},{"href":"#id-46","style":{"fontStyle":"italic"},"text":"(6.1) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"then we have that","element":"span"}],[{"style":{"width":"53%"},"width":1011,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-4.png","element":"img"}],[{"id":"id-84","style":{"fontWeight":"bold"},"text":"Claim D.3. 
","element":"span"},{"style":{"height":21.76},"width":403.93,"height":54.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-5.png","element":"img","alt":"S1 ≥ 4m(v) − 1250κ3L2","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-84","style":{"fontStyle":"italic"},"text":"Claim D.3. ","element":"a"},{"text":"We have that","element":"span"}],[{"style":{"width":"83%"},"width":1566,"height":579,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-6.png","element":"img"}],[{"text":"Above, ","element":"span"},{"text":"① ","element":"span"},{"text":"is due to ","element":"span"},{"href":"#id-85","text":"Lemma D.2; ","element":"a"},{"text":"② ","element":"span"},{"text":"uses our condition on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"which gives ","element":"span"},{"style":{"height":17.6},"width":534.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-7.png","element":"img","alt":" L∥v∥ ∈ [2λ−3L˜ε, 2λ+3L˜ε];","inline":true,"padRight":true},{"text":"③ ","element":"span"},{"text":"uses our condition ","element":"span"},{"href":"#id-46","text":"(6.1) ","element":"a"},{"text":"on ˜","element":"span"},{"style":{"height":8.4},"width":32.36,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-8.png","element":"img","alt":"ε.","inline":true}],[{"text":"We now bound ","element":"span"},{"style":{"height":15.02},"width":43.77,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-9.png","element":"img","alt":" S1","inline":true},{"text":". 
To this end, we first note that if ","element":"span"},{"style":{"height":21.3},"width":633.24,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-10.png","element":"img","alt":" λi + λ∗ ≥ 1/κ and λ − λ∗ ≤ 1/κ then","inline":true}],[{"style":{"width":"38%"},"width":714,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-11.png","element":"img"}],[{"text":"Therefore, the sum ","element":"span"},{"style":{"height":15.02},"width":206.26,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-12.png","element":"img","alt":" S1 satisfies","inline":true}],[{"style":{"width":"96%"},"width":1800,"height":140,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-13.png","element":"img"}],[{"text":"(Note that we have ","element":"span"},{"style":{"height":14},"width":183.28,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-14.png","element":"img","alt":" H + λI ≻","inline":true,"padRight":true},{"text":"0.) This finishes the proof of ","element":"span"},{"href":"#id-84","text":"Claim D.3.","element":"a"}],[{"style":{"width":"85%"},"width":1599,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-15.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-77","style":{"fontStyle":"italic"},"text":"Claim 6.2. 
","element":"a"},{"text":"We derive that","element":"span"}],[{"style":{"width":"59%"},"width":1116,"height":334,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/19-16.png","element":"img"}],[{"style":{"width":"44%"},"width":839,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-0.png","element":"img"}],[{"text":"Above, ","element":"span"},{"text":"① ","element":"span"},{"text":"uses equation ","element":"span"},{"href":"#id-86","text":"(D.4); ","element":"a"},{"text":"inequality ","element":"span"},{"text":"② ","element":"span"},{"text":"follows because ","element":"span"},{"style":{"height":21.3},"width":274.14,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-1.png","element":"img","alt":" λmin(H) ≤ −1/κ ","inline":true,"padRight":true},{"text":"holds by assumption and ","element":"span"},{"style":{"height":21.3},"width":363.47,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-2.png","element":"img","alt":" λ∗ ≤ −λmin(H) + 1/κ ","inline":true,"padRight":true},{"text":"holds by the assumption of Case 2 of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1; ","element":"a"},{"text":"inequality ","element":"span"},{"text":"③ ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-84","text":"Claim D.3 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-79","text":"Claim D.1; ","element":"a"},{"text":"and inequality ","element":"span"},{"text":"④ ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-81","text":"(D.8). ","element":"a"},{"text":"This finishes the proof of ","element":"span"},{"href":"#id-77","text":"Claim 6.2.","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Claim ","element":"span"},{"href":"#id-78","style":{"fontWeight":"bold"},"text":"6.3. 
","element":"a"},{"style":{"height":21.76},"width":860.73,"height":54.39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-3.png","element":"img","alt":"If λmin(H) ≥ − 1κ then m(h∗) ≥ 2m(v) − 16κ3L2","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-78","style":{"fontStyle":"italic"},"text":"Claim 6.3. ","element":"a"},{"text":"This time we lower bound ","element":"span"},{"style":{"height":15.02},"width":43.77,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-4.png","element":"img","alt":" S2","inline":true,"padRight":true},{"text":"slightly differently:","element":"span"}],[{"id":"id-87","style":{"width":"63%"},"width":1188,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-5.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"① ","element":"span"},{"text":"comes from the second-to-last inequality in ","element":"span"},{"href":"#id-82","text":"(D.7) ","element":"a"},{"text":"and ","element":"span"},{"text":"② ","element":"span"},{"text":"comes from ","element":"span"},{"style":{"height":14.8},"width":212.21,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-6.png","element":"img","alt":" λ∗ ≤ λ ≤","inline":true},{"style":{"height":21.3},"width":356.6,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-7.png","element":"img","alt":"−λmin(H) + 1/κ ≤ 2/κ ","inline":true,"padRight":true},{"text":"using our assumption in Case 2 of ","element":"span"},{"href":"#id-45","text":"Main Lemma 1.","element":"a"}],[{"text":"Putting these together we get that","element":"span"}],[{"style":{"width":"78%"},"width":1479,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-8.png","element":"img"}],[{"text":"Above, ","element":"span"},{"text":"① ","element":"span"},{"text":"comes from 
","element":"span"},{"href":"#id-86","text":"(D.4), ","element":"a"},{"text":"② ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-84","text":"Claim D.3, ","element":"a"},{"text":"lower bound ","element":"span"},{"href":"#id-87","text":"(D.10) ","element":"a"},{"text":"and ","element":"span"},{"style":{"height":25.99},"width":255.88,"height":64.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-9.png","element":"img","alt":"2(λ∗)³/(3L²) ≤ 16/(3κ³L²)","inline":true}],[{"id":"id-48","style":{"fontWeight":"bold"},"text":"E ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Main Lemma ","element":"span"},{"href":"#id-55","style":{"fontWeight":"bold"},"text":"2","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Main Lemma ","element":"span"},{"href":"#id-55","style":{"fontWeight":"bold"},"text":"2. ","element":"a"},{"style":{"fontStyle":"italic"},"text":"In the same setting as ","element":"span"},{"href":"#id-45","style":{"fontStyle":"italic"},"text":"Main Lemma 1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"suppose ","element":"span"},{"style":{"height":28.81},"width":541.56,"height":72.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-10.png","element":"img","alt":" m(h∗) ≥ − ε3/2300√L. Then the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"output vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"style":{"fontStyle":"italic"},"text":"satisfies the following conditions:","element":"span"}],[{"style":{"width":"50%"},"width":947,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-55","style":{"fontStyle":"italic"},"text":"Main Lemma 2. 
","element":"a"},{"text":"Let’s first note that from the value given in ","element":"span"},{"href":"#id-43","text":"Lemma 5.1,","element":"a"}],[{"id":"id-88","style":{"width":"66%"},"width":1246,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-12.png","element":"img"}],[{"text":"If Case 1 occurs, we have","element":"span"}],[{"style":{"width":"72%"},"width":1357,"height":92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-13.png","element":"img"}],[{"text":"Above, inequalities ","element":"span"},{"text":"① ","element":"span"},{"text":"and ","element":"span"},{"text":"② ","element":"span"},{"text":"both use the assumptions of Case 1; inequality ","element":"span"},{"text":"③ ","element":"span"},{"text":"uses the fact that ","element":"span"},{"style":{"height":17.6},"width":386.95,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-14.png","element":"img","alt":"λ∗ ∈ [λ − L˜ε, λ + L˜ε","inline":true},{"text":"] which again follows from the assumptions of Case 1 (see ","element":"span"},{"href":"#id-76","text":"(D.1))","element":"a"},{"text":"; inequality ","element":"span"},{"text":"④ ","element":"span"},{"text":"uses ","element":"span"},{"style":{"height":21.8},"width":202.45,"height":54.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-15.png","element":"img","alt":" ∥h∗∥ = 2λ∗/L ","inline":true,"padRight":true},{"text":"from ","element":"span"},{"href":"#id-43","text":"Lemma 5.1 ","element":"a"},{"text":"as well as our assumption ","element":"span"},{"href":"#id-46","text":"(6.1) ","element":"a"},{"text":"on ˜","element":"span"},{"style":{"height":8.4},"width":32.36,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-16.png","element":"img","alt":"ε.","inline":true}],[{"text":"As for the quantity 
","element":"span"},{"style":{"height":17.6},"width":175.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-17.png","element":"img","alt":" ∥∇m(v)∥","inline":true},{"text":", we bound it as follows","element":"span"}],[{"style":{"width":"99%"},"width":1872,"height":648,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/20-18.png","element":"img"}],[{"href":"#id-76","text":"(D.1))","element":"a"},{"text":"; inequality ","element":"span"},{"style":{"height":14.8},"width":297.96,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-0.png","element":"img","alt":" ⑤ uses λ∗ ≤ 2B","inline":true},{"text":"; and inequality ","element":"span"},{"text":"⑥ ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-88","text":"(E.1) ","element":"a"},{"text":"together with our assumption ","element":"span"},{"href":"#id-46","text":"(6.1) ","element":"a"},{"text":"on ˜","element":"span"},{"style":{"height":8.4},"width":32.36,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-1.png","element":"img","alt":"ε.","inline":true}],[{"text":"If Case 2 occurs, we have","element":"span"}],[{"id":"id-89","style":{"width":"85%"},"width":1593,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-2.png","element":"img"}],[{"text":"Above, inequalities ","element":"span"},{"text":"① ","element":"span"},{"text":"and ","element":"span"},{"text":"② ","element":"span"},{"text":"both use the assumptions of Case 2; inequality ","element":"span"},{"style":{"height":17.6},"width":447.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-3.png","element":"img","alt":" ③ uses λ ≤ 
−λmin(H)+","inline":true,"padRight":true},{"text":"1","element":"span"},{"style":{"height":17.6},"width":46.83,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-4.png","element":"img","alt":"/κ","inline":true,"padRight":true},{"text":"from our assumption of Case 2 as well as ","element":"span"},{"style":{"height":17.6},"width":311.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-5.png","element":"img","alt":" −λmin(H) ≤ λ∗ ","inline":true,"padRight":true},{"text":"which comes from ","element":"span"},{"href":"#id-43","text":"Lemma 5.1; ","element":"a"},{"text":"inequality ","element":"span"},{"style":{"height":21.8},"width":344.06,"height":54.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-6.png","element":"img","alt":" { uses ∥h∗∥ = 2λ∗L ","inline":true,"padRight":true},{"text":"from ","element":"span"},{"href":"#id-43","text":"Lemma 5.1 ","element":"a"},{"text":"as well as our assumption ","element":"span"},{"href":"#id-46","text":"(6.1) ","element":"a"},{"text":"on ˜","element":"span"},{"style":{"height":8.4},"width":32.36,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-7.png","element":"img","alt":"ε.","inline":true}],[{"text":"The quantity ","element":"span"},{"style":{"height":17.6},"width":175.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-8.png","element":"img","alt":" ∥∇m(v)∥","inline":true,"padRight":true},{"text":"can be bounded in an analogous manner as Case 1:","element":"span"}],[{"style":{"width":"82%"},"width":1541,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-9.png","element":"img"}],[{"text":"Above, inequality ","element":"span"},{"text":"x ","element":"span"},{"text":"uses our assumption ","element":"span"},{"href":"#id-46","text":"(6.1) ","element":"a"},{"text":"on 
˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-10.png","element":"img","alt":"ε","inline":true},{"text":"; inequality ","element":"span"},{"style":{"height":21.3},"width":345.69,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-11.png","element":"img","alt":" y uses λ ≤ λ∗ + 1κ ","inline":true,"padRight":true},{"text":"which appeared ","element":"span"},{"text":"in ","element":"span"},{"href":"#id-89","text":"(E.2); ","element":"a"},{"text":"inequality ","element":"span"},{"text":"z ","element":"span"},{"text":"uses ","element":"span"},{"href":"#id-88","text":"(E.1).","element":"a"}],[{"id":"id-51","style":{"fontWeight":"bold"},"text":"F ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-52","style":{"fontWeight":"bold"},"text":"7.2","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Lemma ","element":"span"},{"href":"#id-52","style":{"fontWeight":"bold"},"text":"7.2. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"The following statements hold for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"style":{"fontStyle":"italic"},"text":"until ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"terminates","element":"span"}],[{"style":{"width":"55%"},"width":1042,"height":193,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Moreover when ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"terminates at ","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Line 20 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"we have ","element":"span"},{"style":{"height":21.3},"width":349.86,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-13.png","element":"img","alt":" λi + λmin(H) ≤ 1κ.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"Lemma 7.2. ","element":"a"},{"text":"The lemma follows via induction.","element":"span"}],[{"text":"To see (a) and (b) at the base case ","element":"span"},{"style":{"height":15.6},"width":835.19,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-14.png","element":"img","alt":" i = 0, recall that the definitions of B and L2","inline":true,"padRight":true},{"text":"together ensure ","element":"span"},{"style":{"height":21.3},"width":1214.86,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-15.png","element":"img","alt":"λ0 + λmax(H) ≤ 3B and λ0 + λmin(H) ≥ 310κ. 
Also λ0 ∈ [0, 2B].","inline":true}],[{"text":"Suppose now for some ","element":"span"},{"style":{"height":14.4},"width":66.4,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-16.png","element":"img","alt":" i ≥","inline":true,"padRight":true},{"text":"0 properties (a) and (b) hold. It is easy to check that ","element":"span"},{"style":{"height":15.02},"width":188.02,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-17.png","element":"img","alt":" λi ≤ λi−1","inline":true,"padRight":true},{"text":"and thus we have ","element":"span"},{"style":{"height":17.6},"width":645.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-18.png","element":"img","alt":" λi + λmax(H) ≤ 2B and λi ≤ 2B","inline":true},{"text":". This implies property (a) at iteration ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"+ 1 also holds. We now proceed to show property (c) at iteration ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and property (b) at iteration ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"+ 1.","element":"span"}],[{"text":"Recall that the algorithm ensures","element":"span"}],[{"style":{"width":"65%"},"width":1229,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-19.png","element":"img"}],[{"text":"and by the definition of ˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"w ","element":"span"},{"text":"we have","element":"span"}],[{"style":{"width":"80%"},"width":1516,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-20.png","element":"img"}],[{"text":"Now, since ","element":"span"},{"style":{"height":21.3},"width":485,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-21.png","element":"img","alt":"3/(10κ) ≤ λi + λmin(H) ≤ 
3B","inline":true,"padRight":true},{"text":"from the inductive assumption, it follows from the choice of ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"height":12.8},"width":114.99,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-22.png","element":"img","alt":"ε that","inline":true}],[{"style":{"width":"99%"},"width":1868,"height":275,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/21-23.png","element":"img"}],[{"text":"Inverting this chain of inequalities, we have","element":"span"}],[{"style":{"width":"69%"},"width":1309,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-0.png","element":"img"}],[{"text":"From this we derive the following implications:","element":"span"}],[{"id":"id-90","style":{"width":"67%"},"width":1265,"height":194,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-1.png","element":"img"}],[{"text":"If Condition ","element":"span"},{"href":"#id-90","text":"(F.4) ","element":"a"},{"text":"happens, our algorithm ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"outputs on ","element":"span"},{"href":"#id-32","text":"Line 20; ","element":"a"},{"text":"in such a case ","element":"span"},{"href":"#id-90","text":"(F.4) ","element":"a"},{"text":"implies our desired inequality ","element":"span"},{"style":{"height":21.3},"width":342.18,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-2.png","element":"img","alt":" λi + λmin(H) ≤ 1κ","inline":true},{"text":". 
If Condition ","element":"span"},{"href":"#id-90","text":"(F.5) ","element":"a"},{"text":"happens, our choice ˜","element":"span"},{"style":{"height":16.22},"width":141.87,"height":40.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-3.png","element":"img","alt":"λi+1 ←","inline":true}],[{"style":{"width":"78%"},"width":1470,"height":327,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-4.png","element":"img"}],[{"text":"Therefore, we conclude that property (c) at iteration ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"holds and property (b) at iteration ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"+ 1 holds because ","element":"span"},{"style":{"height":20.03},"width":220.86,"height":50.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-5.png","element":"img","alt":" λi+1 ≥ ˜λi+1","inline":true},{"text":". This finishes the proof of ","element":"span"},{"href":"#id-52","text":"Lemma 7.2.","element":"a"}],[{"id":"id-50","style":{"fontWeight":"bold"},"text":"G ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Main Lemma ","element":"span"},{"href":"#id-49","style":{"fontWeight":"bold"},"text":"3: ","element":"a"},{"style":{"fontWeight":"bold"},"text":"Running Time Half","element":"span"}],[{"text":"Having proven the correctness of the algorithm, we now aim to bound the overall running time of ","element":"span"},{"href":"#id-32","text":"FastCubicMin","element":"a"},{"text":", completing the proof of ","element":"span"},{"href":"#id-49","text":"Main Lemma 3. 
","element":"a"},{"text":"We prove in ","element":"span"},{"href":"#id-91","text":"Appendix H ","element":"a"},{"text":"the following lemma:","element":"span"}],[{"id":"id-99","href":"#id-32","style":{"height":26},"width":1668.23,"height":64.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-6.png","element":"img","alt":"Lemma G.1. If λ2+λmin(H) ≥ c1 ∈ (0, 1) then BinarySearch ends in O�log( (λ1−λ2)Bc1·L·˜ε )�","inline":true},{"style":{"fontStyle":"italic"},"text":"iterations.","element":"span"}],[{"text":"Since in our ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"algorithm, it satisfies ","element":"span"},{"style":{"height":21.3},"width":591.1,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-7.png","element":"img","alt":" λi ≤ 2B and λi+λmin(H) ≥ 310κ ","inline":true,"padRight":true},{"text":"(see ","element":"span"},{"href":"#id-52","text":"Lemma 7.2)","element":"a"},{"text":", ","element":"span"},{"text":"taken together with our choice of ˜","element":"span"},{"style":{"height":12.8},"width":197.26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-8.png","element":"img","alt":"ε we have:","inline":true}],[{"id":"id-95","style":{"fontWeight":"bold"},"text":"Claim G.2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Each invocation of ","element":"span"},{"href":"#id-32","style":{"height":20.8},"width":642.94,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-9.png","element":"img","alt":" BinarySearch ends in O�log(1/˜ε)�","inline":true},{"style":{"fontStyle":"italic"},"text":"iterations.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Claim G.3. 
","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"ends in at most ","element":"span"},{"style":{"height":17.6},"width":219.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-10.png","element":"img","alt":" O(log(Bκ))","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"outer loops.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"According to ","element":"span"},{"href":"#id-52","text":"Lemma 7.2 ","element":"a"},{"text":"we have ","element":"span"},{"style":{"height":21.3},"width":648.62,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-11.png","element":"img","alt":"34(λi−1 + λmin(H)) ≥ λi + λmin(H","inline":true},{"text":") so the quantity ","element":"span"},{"style":{"height":15.02},"width":83.65,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-12.png","element":"img","alt":" λi +","inline":true},{"style":{"height":17.6},"width":140.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-13.png","element":"img","alt":"λmin(H","inline":true},{"text":") decreases by a constant factor per iteration (except possibly ","element":"span"},{"style":{"height":16},"width":508.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-14.png","element":"img","alt":" λi = 0 the last outer loop","inline":true,"padRight":true},{"text":"in which case we shall terminate in one more iteration). 
On one hand, we began with ","element":"span"},{"style":{"height":15.02},"width":89.24,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-15.png","element":"img","alt":" λ0 +","inline":true},{"style":{"height":17.6},"width":274.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-16.png","element":"img","alt":"λmin(H) ≤ 3B","inline":true},{"text":". On the other hand, we always have ","element":"span"},{"style":{"height":21.3},"width":371.23,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-17.png","element":"img","alt":" λi + λmin(H) ≥ 3/(10κ) ","inline":true,"padRight":true},{"text":"according to ","element":"span"},{"href":"#id-52","text":"Lemma 7.2. ","element":"a"},{"text":"Therefore, the total number of outer loops is at most ","element":"span"},{"style":{"height":17.6},"width":231.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-18.png","element":"img","alt":" O(log(Bκ)).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"G.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Matrix Inverse","element":"span"}],[{"text":"Since the key component of the running time is the computation of (","element":"span"},{"style":{"height":19.14},"width":218.32,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-19.png","element":"img","alt":"H+λiI)−1b","inline":true,"padRight":true},{"text":"for different vectors ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"we will first bound the condition number of the matrix (","element":"span"},{"style":{"height":19.13},"width":211.02,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-20.png","element":"img","alt":"H + λiI)−1 ","inline":true,"padRight":true},{"text":"via the following 
claim","element":"span"}],[{"id":"id-92","style":{"fontWeight":"bold"},"text":"Claim G.4. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Throughout the execution of ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-32","text":"BinarySearch ","element":"a"},{"style":{"fontStyle":"italic"},"text":"whenever we compute ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.14},"width":231.96,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-21.png","element":"img","alt":"H + λiI)−1b","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"for some vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"style":{"fontStyle":"italic"},"text":"it satisfies ","element":"span"},{"style":{"height":25.61},"width":379.32,"height":64.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-22.png","element":"img","alt":"(λi + L2)/(λi + λmin(H)) ≤ 10κL2.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-92","style":{"fontStyle":"italic"},"text":"Claim G.4. ","element":"a"},{"text":"We first focus on ","element":"span"},{"href":"#id-32","text":"Line 5 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-32","text":"Line 11 ","element":"a"},{"text":"of ","element":"span"},{"href":"#id-32","text":"FastCubicMin","element":"a"},{"text":". There are two cases. 
If ","element":"span"},{"style":{"height":15.02},"width":167.43,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-23.png","element":"img","alt":"λi ≥ 2L2","inline":true},{"text":", then according to ","element":"span"},{"style":{"height":14.62},"width":328.36,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-24.png","element":"img","alt":" −L2I ⪯ H ⪯ L2I","inline":true,"padRight":true},{"text":"we can bound ","element":"span"},{"style":{"height":25.61},"width":237.62,"height":64.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/22-25.png","element":"img","alt":"(λi + L2)/(λi + λmin(H)) ≤","inline":true,"padRight":true},{"text":"3 because the left-hand ","element":"span"},{"text":"side is largest when ","element":"span"},{"style":{"height":15.02},"width":419.29,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-0.png","element":"img","alt":" λi = 2L2. If λi < 2L2","inline":true},{"text":", then by ","element":"span"},{"href":"#id-52","text":"Lemma 7.2 ","element":"a"},{"text":"we know ","element":"span"},{"style":{"height":21.3},"width":387.41,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-1.png","element":"img","alt":" λi + λmin(H) ≥ 3/(10κ).","inline":true,"padRight":true},{"text":"This implies ","element":"span"},{"style":{"height":25.61},"width":378.32,"height":64.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-2.png","element":"img","alt":"(λi + L2)/(λi + λmin(H)) ≤ 10κL2.","inline":true}],[{"text":"We now focus on ","element":"span"},{"href":"#id-32","text":"Line 3 ","element":"a"},{"text":"of ","element":"span"},{"href":"#id-32","text":"BinarySearch","element":"a"},{"text":". 
We claim that all values ","element":"span"},{"style":{"height":15.24},"width":82.1,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-3.png","element":"img","alt":" λmid","inline":true,"padRight":true},{"text":"iterated over by ","element":"span"},{"href":"#id-32","text":"BinarySearch ","element":"a"},{"text":"also satisfy ","element":"span"},{"style":{"height":21.3},"width":404.57,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-4.png","element":"img","alt":" λmid + λmin(H) ≥ 3/(10κ) ","inline":true,"padRight":true},{"text":"(because the values ","element":"span"},{"style":{"height":21.3},"width":847.77,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-5.png","element":"img","alt":" λmid ≥ λi and λi satisfies λi + λmin(H) ≥ 3/(10κ)","inline":true,"padRight":true},{"text":"according to ","element":"span"},{"href":"#id-52","text":"Lemma 7.2)","element":"a"},{"text":". ","element":"span"},{"text":"Therefore, the same case analysis (with respect to ","element":"span"},{"style":{"height":15.24},"width":320.64,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-6.png","element":"img","alt":" λmid ≥ 2L2 and","inline":true},{"style":{"height":15.24},"width":210.69,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-7.png","element":"img","alt":"λmid < 2L2","inline":true},{"text":") also gives ","element":"span"},{"style":{"height":25.61},"width":378.32,"height":64.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-8.png","element":"img","alt":"(λi + L2)/(λi + λmin(H)) ≤ 10κL2.","inline":true}],[{"id":"id-97","style":{"fontWeight":"bold"},"text":"Claim G.5. 
","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Line 5 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"of ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Line 3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"of ","element":"span"},{"href":"#id-32","text":"BinarySearch ","element":"a"},{"style":{"fontStyle":"italic"},"text":"runs in time ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.6},"width":355.98,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-9.png","element":"img","alt":"O(Tinverse(κL2, ˜ε)).","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Whenever we compute (","element":"span"},{"style":{"height":19.14},"width":235.37,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-10.png","element":"img","alt":"H + λiI)−1b","inline":true,"padRight":true},{"text":"for some vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"it satisfies ","element":"span"},{"style":{"height":17.6},"width":197.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-11.png","element":"img","alt":" ∥b∥ ≤ 1/˜ε","inline":true},{"text":"; therefore to find ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"satisfying ","element":"span"},{"style":{"height":19.14},"width":453.35,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-12.png","element":"img","alt":" ∥v + (H + λiI)−1b∥ ≤ ˜ε","inline":true,"padRight":true},{"text":"it suffices to find 
","element":"span"},{"style":{"height":19.14},"width":530.3,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-13.png","element":"img","alt":" ∥v + (H + λiI)−1b∥ ≤ ˜ε2∥b∥","inline":true},{"text":". This costs a total running time ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.6},"width":313.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-14.png","element":"img","alt":"O(Tinverse(κL2, ˜ε","inline":true},{"text":")) according to ","element":"span"},{"href":"#id-93","text":"Theorem 2.4.","element":"a"}],[{"text":"Therefore by ","element":"span"},{"href":"#id-93","text":"Theorem 2.4, ","element":"a"},{"text":"every time we need to multiply a vector ","element":"span"},{"style":{"height":19.14},"width":478.92,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-15.png","element":"img","alt":" v to (H + λI)−1 to error","inline":true},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-16.png","element":"img","alt":"δ","inline":true},{"text":", the time required to approximately solve such a system is ","element":"span"},{"style":{"height":17.6},"width":325.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-17.png","element":"img","alt":" Tinverse(O(κL2), δ","inline":true},{"text":"). 
We will state our running time with respect to ","element":"span"},{"style":{"height":14.73},"width":124.96,"height":36.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-18.png","element":"img","alt":" Tinverse","inline":true,"padRight":true},{"text":"as it is the dominant operation in the algorithm.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"G.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Power Method","element":"span"}],[{"text":"We now bound the running time of the power method in ","element":"span"},{"href":"#id-32","text":"Line 11 ","element":"a"},{"text":"of ","element":"span"},{"href":"#id-32","text":"FastCubicMin","element":"a"},{"text":". It is folklore (cf. ","element":"span"},{"href":"#id-37","referenceIndex":3,"text":"[3, ","element":"a"},{"text":"Appendix A]) that getting any constant multiplicative approximation to the leading eigenvector of any PSD matrix ","element":"span"},{"style":{"height":15.94},"width":213.22,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-19.png","element":"img","alt":" M ∈ Rd×d ","inline":true,"padRight":true},{"text":"requires only ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(log ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") iterations, each computing ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"for some vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b","element":"span"},{"text":". 
In our case, we have ","element":"span"},{"style":{"height":19.13},"width":339.08,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-20.png","element":"img","alt":" M = (H + λiI)−1 ","inline":true,"padRight":true},{"text":"so we cannot compute ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"exactly. Fortunately, folklore results on the inaccurate power method suggest that, as long as each ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"is computed to a very good accuracy such as ˜","element":"span"},{"style":{"height":16.34},"width":164.23,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-21.png","element":"img","alt":"ε−Ω(log d)","inline":true},{"text":", we can still get a constant multiplicative approximate leading eigenvector that satisfies ","element":"span"},{"href":"#id-32","text":"Line 11 ","element":"a"},{"text":"of ","element":"span"},{"href":"#id-32","text":"FastCubicMin","element":"a"},{"text":". Ignoring all the details (which are quite standard and can be found for instance in ","element":"span"},{"href":"#id-37","referenceIndex":3,"text":"[3, ","element":"a"},{"text":"Appendix A]), we claim that","element":"span"}],[{"id":"id-96","style":{"fontWeight":"bold"},"text":"Claim G.6. 
","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Line 11 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"of ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"style":{"fontStyle":"italic"},"text":"runs in time ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":21.87},"width":967.02,"height":54.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-22.png","element":"img","alt":"O�Tinverse�κL2, ε−Θ(log(d))��= ˜O (Tinverse (κL2, ε)).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"G.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Lowest Eigenvector","element":"span"}],[{"text":"We will now focus on the running time for the computation of the lowest eigenvector of the Hessian which is required in ","element":"span"},{"href":"#id-32","text":"Line 18. ","element":"a"},{"text":"We recall ","element":"span"},{"href":"#id-94","text":"Theorem 2.5 ","element":"a"},{"text":"from ","element":"span"},{"text":"Section 2 ","element":"span"},{"text":"which uses Shift and Invert to compute the largest eigenvalue of a matrix.","element":"span"}],[{"text":"Since we are concerned with the lowest eigenvector of ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H ","element":"span"},{"text":"and by assumption ","element":"span"},{"style":{"height":15.2},"width":337.12,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-23.png","element":"img","alt":" −L2I ⪯ H ⪯ L2I,","inline":true,"padRight":true},{"text":"we can equivalently compute the largest eigenvector of ","element":"span"},{"style":{"height":23.78},"width":294.48,"height":59.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-24.png","element":"img","alt":" M ≜ I − H+L2I2L2","inline":true,"padRight":true},{"text":"which satisfies 0 
","element":"span"},{"style":{"height":14.4},"width":182.99,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-25.png","element":"img","alt":" ⪯ M ⪯ I.","inline":true,"padRight":true},{"text":"Note that computing ","element":"span"},{"style":{"fontWeight":"bold"},"text":"M","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"is of the same time complexity as computing ","element":"span"},{"style":{"fontWeight":"bold"},"text":"H","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":". By setting ","element":"span"},{"style":{"height":23.07},"width":249.61,"height":57.67,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-26.png","element":"img","alt":" ε = δ× = 0.01κL2","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-94","text":"Theorem 2.5 ","element":"a"},{"text":"and running AppxPCA, we obtain a unit vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"w ","element":"span"},{"text":"such that","element":"span"}],[{"style":{"width":"95%"},"width":1796,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-27.png","element":"img"}],[{"text":"Above, ","element":"span"},{"style":{"height":17.6},"width":377.63,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-28.png","element":"img","alt":" x uses λmax(M) ≤","inline":true,"padRight":true},{"text":"1. Rearranging the terms we obtain ","element":"span"},{"style":{"height":17.6},"width":584.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-29.png","element":"img","alt":" w⊤Hw ≤ λmin(H) + 0.05κ as","inline":true,"padRight":true},{"text":"desired. In sum,","element":"span"}],[{"id":"id-98","style":{"fontWeight":"bold"},"text":"Claim G.7. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"The approximate lowest eigenvector computation on ","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"Line 18 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"runs in time ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.6},"width":370.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/23-30.png","element":"img","alt":"O (Tinverse (κL2, ˜ε)).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"G.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Putting It All Together","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Running-Time Proof of ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"Main Lemma 3. ","element":"a"},{"text":"Putting together our bounds in ","element":"span"},{"href":"#id-95","text":"Claim G.2 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-95","text":"Claim G.2 ","element":"a"},{"text":"which bound the number of iterations, as well as our bounds in ","element":"span"},{"href":"#id-96","text":"Claim G.6, ","element":"a"},{"href":"#id-97","text":"Claim G.5, ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-98","text":"Claim G.7 ","element":"a"},{"text":"for power method, matrix inverse, and lowest eigenvectors, we conclude that our total running time of ","element":"span"},{"href":"#id-32","text":"FastCubicMin ","element":"a"},{"text":"is at most ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":20.41},"width":534.18,"height":51.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-0.png","element":"img","alt":"O (Tinverse(κL2, ˜ε)), where ˜O","inline":true,"padRight":true},{"text":"contains factors polylogarithmic in 
","element":"span"},{"style":{"height":15.6},"width":251.15,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-1.png","element":"img","alt":" κ, L, L2, B, d.","inline":true}],[{"text":"By putting together our choice of ˜","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-2.png","element":"img","alt":"ε","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-32","text":"Line 2 ","element":"a"},{"text":"as well as the running time of either accelerated gradient descent or accelerated SVRG from ","element":"span"},{"href":"#id-93","text":"Theorem 2.4 ","element":"a"},{"text":"into formula ","element":"span"},{"style":{"height":17.6},"width":169.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-3.png","element":"img","alt":" O(κL2, ˜ε","inline":true},{"text":"), we finish the proof of the running time part for ","element":"span"},{"href":"#id-49","text":"Main Lemma 3.","element":"a"}],[{"id":"id-91","style":{"fontWeight":"bold"},"text":"H ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-99","style":{"fontWeight":"bold"},"text":"G.1","element":"a"}],[{"href":"#id-99","style":{"height":26},"width":1668.23,"height":65,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-4.png","element":"img","alt":"Lemma G.1. If λ2+λmin(H) ≥ c1 ∈ (0, 1) then BinarySearch ends in O�log( (λ1−λ2)Bc1·L·˜ε )�","inline":true},{"style":{"fontStyle":"italic"},"text":"iterations.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of ","element":"span"},{"href":"#id-99","style":{"fontStyle":"italic"},"text":"Lemma G.1. 
","element":"a"},{"text":"We first note that in all iterations of ","element":"span"},{"href":"#id-32","text":"BinarySearch ","element":"a"},{"text":"it always satisfies","element":"span"}],[{"style":{"width":"80%"},"width":1502,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-5.png","element":"img"}],[{"text":"This is true at the beginning. In each of the follow-up iterations, if we have set ","element":"span"},{"style":{"height":15.24},"width":340.68,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-6.png","element":"img","alt":" λ1 ← λmid then it","inline":true,"padRight":true},{"text":"must satisfy ","element":"span"},{"style":{"height":17.6},"width":359.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-7.png","element":"img","alt":" L∥v∥ + L˜ε ≤ 2λmid","inline":true,"padRight":true},{"text":"but this implies ","element":"span"},{"style":{"height":19.14},"width":530.95,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-8.png","element":"img","alt":" L∥(H + λmidI)−1g∥ ≤ 2λmid","inline":true,"padRight":true},{"text":"according to triangle inequality and ","element":"span"},{"style":{"height":19.14},"width":525.5,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-9.png","element":"img","alt":" ∥v + (H + λmidI)−1g∥ ≤ L˜ε","inline":true},{"text":"; similarly, if we have set ","element":"span"},{"style":{"height":15.24},"width":194.42,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-10.png","element":"img","alt":" λ2 ← λmid","inline":true,"padRight":true},{"text":"then it must satisfy ","element":"span"},{"style":{"height":19.14},"width":546.62,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-11.png","element":"img","alt":"L∥(H + λmidI)−1g∥ ≥ 
2λmid.","inline":true}],[{"style":{"width":"99%"},"width":1870,"height":175,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-12.png","element":"img"}],[{"text":"and therefore","element":"span"}],[{"style":{"width":"91%"},"width":1720,"height":217,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-13.png","element":"img"}],[{"text":"Now, we notice that ","element":"span"},{"style":{"height":23.07},"width":872.98,"height":57.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-14.png","element":"img","alt":" ∥(H + λ2I)−1∥ ≤ 1c1 and λ1 ≤ 2B because λ2","inline":true,"padRight":true},{"text":"only increases and ","element":"span"},{"style":{"height":16.4},"width":141.62,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-15.png","element":"img","alt":" λ1 only","inline":true,"padRight":true},{"text":"decreases through the execution of the algorithm. Therefore by the choice of ˆ","element":"span"},{"style":{"height":21.6},"width":298.31,"height":54.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-16.png","element":"img","alt":"ε = ˜εc140B, we get","inline":true}],[{"style":{"width":"33%"},"width":632,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-17.png","element":"img"}],[{"text":"A completely analogous argument also shows that","element":"span"}],[{"style":{"width":"97%"},"width":1830,"height":331,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-18.png","element":"img"}],[{"text":"which means ","element":"span"},{"href":"#id-32","text":"BinarySearch ","element":"a"},{"text":"will stop in this iteration. 
In sum, there will be no more than ","element":"span"},{"style":{"height":26},"width":334.32,"height":64.99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/24-19.png","element":"img","alt":" O(log( (λ1−λ2)Bc1·L·˜ε ))","inline":true},{"text":"iterations.","element":"span"}]]},{"heading":"Acknowledgements","paragraphs":[[{"text":"We thank Ben Recht for helpful suggestions and corrections to a previous version.","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-14","text":"[1] Naman Agarwal, Brian Bullins, and Elad Hazan. Second order stochastic optimization for ","element":"span"},{"text":"machine learning in linear time. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1602.03943","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-23","text":"[2] Zeyuan Allen-Zhu and Elad Hazan. Variance Reduction for Faster Non-Convex Optimization. ","element":"span"},{"text":"In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ICML","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-37","text":"[3] Zeyuan Allen-Zhu and Yuanzhi Li. Even Faster SVD Decomposition Yet Without Agonizing ","element":"span"},{"text":"Pain. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"NIPS","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-33","text":"[4] Zeyuan Allen-Zhu and Yang Yuan. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-","element":"span"},{"text":"Convex Objectives. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ICML","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-3","text":"[5] Afonso S Bandeira, Nicolas Boumal, and Vladislav Voroninski. On the low-rank approach ","element":"span"},{"text":"for semidefinite programs arising in synchronization and community detection. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1602.04426","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-4","text":"[6] S. Bhojanapalli, B. Neyshabur, and N. Srebro. Global Optimality of Local Search for Low ","element":"span"},{"text":"Rank Matrix Recovery. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":", May 2016.","element":"span"}],[{"id":"id-26","text":"[7] Yair Carmon, John C. Duchi, Oliver Hinder, and Aaron Sidford. Accelerated methods for ","element":"span"},{"text":"non-convex optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1611.00756","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-27","text":"[8] Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation ","element":"span"},{"text":"methods for unconstrained optimization. Part I: motivation, convergence and numerical results. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming","element":"span"},{"text":", 127(2):245–295, 2011.","element":"span"}],[{"id":"id-28","text":"[9] Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation ","element":"span"},{"text":"methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming","element":"span"},{"text":", 130(2):295–319, 2011.","element":"span"}],[{"id":"id-0","text":"[10] Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. ","element":"span"},{"text":"The loss surfaces of multilayer networks. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"AISTATS","element":"span"},{"text":", 2015.","element":"span"}],[{"id":"id-1","text":"[11] Yann N Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and ","element":"span"},{"text":"Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in neural information processing systems","element":"span"},{"text":", pages 2933–2941, 2014.","element":"span"}],[{"id":"id-21","text":"[12] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning ","element":"span"},{"text":"and stochastic optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Journal of Machine Learning Research","element":"span"},{"text":", 12:2121–2159, 2011.","element":"span"}],[{"id":"id-30","text":"[13] Dan Garber and Elad Hazan. Fast and simple PCA via convex optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":", September 2015.","element":"span"}],[{"id":"id-31","text":"[14] Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, ","element":"span"},{"text":"and Aaron Sidford. Robust shift-and-invert preconditioning: Faster and more sample efficient algorithms for eigenvector computation. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ICML","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-10","text":"[15] Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. ","element":"span"},{"text":"Escaping from saddle points—online stochastic gradient for tensor decomposition. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv:1503.02101","element":"span"},{"text":", 2015.","element":"span"}],[{"text":"[16] Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. 
","element":"span"},{"text":"Escaping from saddle points—online stochastic gradient for tensor decomposition. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 28th Annual Conference on Learning Theory","element":"span"},{"text":", COLT 2015, 2015.","element":"span"}],[{"id":"id-24","text":"[17] Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. ","element":"span"},{"text":"Escaping from saddle points - online stochastic gradient for tensor decomposition. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015","element":"span"},{"text":", pages 797–842, 2015.","element":"span"}],[{"id":"id-5","text":"[18] Rong Ge, Jason Lee, and Tengyu Ma. Matrix Completion has No Spurious Local Minimum. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":", May 2016.","element":"span"}],[{"id":"id-6","text":"[19] Rong Ge and Tengyu Ma. On the optimization landscape of tensor decompositions, 2016.","element":"span"}],[{"text":"[20] Saeed Ghadimi and Guanghui Lan. Accelerated gradient methods for nonconvex nonlinear ","element":"span"},{"text":"and stochastic programming. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming","element":"span"},{"text":", pages 1–26, feb 2015.","element":"span"}],[{"id":"id-2","text":"[21] I. J. Goodfellow, O. Vinyals, and A. M. Saxe. Qualitatively characterizing neural network ","element":"span"},{"text":"optimization problems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ArXiv e-prints","element":"span"},{"text":", December 2014.","element":"span"}],[{"id":"id-25","text":"[22] Elad Hazan and Tomer Koren. A linear-time algorithm for trust region problems. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming","element":"span"},{"text":", pages 1–19, 2015.","element":"span"}],[{"id":"id-7","text":"[23] Christopher J. Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J. ACM","element":"span"},{"text":", 60(6):45, 2013.","element":"span"}],[{"id":"id-11","text":"[24] Jason D. Lee, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. Gradient descent only ","element":"span"},{"text":"converges to minimizers. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016","element":"span"},{"text":", pages 1246–1257, 2016.","element":"span"}],[{"id":"id-8","text":"[25] Katta G Murty and Santosh N Kabadi. Some NP-complete problems in quadratic and nonlinear ","element":"span"},{"text":"programming. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming","element":"span"},{"text":", 39(2):117–129, 1987.","element":"span"}],[{"id":"id-20","text":"[26] Yurii Nesterov. A method of solving a convex programming problem with convergence rate ","element":"span"},{"style":{"height":19.14},"width":1416.79,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1611.01146/images/26-0.png","element":"img","alt":"O(1/k^2). In Doklady AN SSSR (translated as Soviet Mathematics Doklady)","inline":true},{"text":", volume 269, pages 543–547, 1983.","element":"span"}],[{"id":"id-18","text":"[27] Yurii Nesterov. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introductory Lectures on Convex Programming Volume: A Basic course","element":"span"},{"text":", volume I. Kluwer Academic Publishers, 2004.","element":"span"}],[{"id":"id-9","text":"[28] Yurii Nesterov and Boris T Polyak. Cubic regularization of Newton method and its global ","element":"span"},{"text":"performance. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming","element":"span"},{"text":", 108(1):177–205, 2006.","element":"span"}],[{"id":"id-15","text":"[29] Barak A Pearlmutter. Fast exact multiplication by the Hessian. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Neural computation","element":"span"},{"text":", 6(1):147–160, 1994.","element":"span"}],[{"id":"id-19","text":"[30] Herbert Robbins and Sutton Monro. ","element":"span"},{"text":"A stochastic approximation method. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The annals of mathematical statistics","element":"span"},{"text":", pages 400–407, 1951.","element":"span"}],[{"id":"id-57","text":"[31] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by ","element":"span"},{"text":"back-propagating errors. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Cognitive modeling","element":"span"},{"text":", 5(3):1, 1988.","element":"span"}],[{"id":"id-22","text":"[32] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic ","element":"span"},{"text":"average gradient. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1309.2388","element":"span"},{"text":", pages 1–45, 2013. ","element":"span"},{"text":"Preliminary version appeared in NIPS 2012.","element":"span"}],[{"id":"id-34","text":"[33] Shai Shalev-Shwartz. SDCA without Duality, Regularization, and Individual Convexity. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"ICML","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-29","text":"[34] Jonathan Richard Shewchuk. An introduction to the conjugate gradient method without the ","element":"span"},{"text":"agonizing pain, 1994.","element":"span"}],[{"id":"id-56","text":"[35] Paul Werbos. ","element":"span"},{"text":"Beyond regression: New tools for prediction and analysis in the behavioral sciences. 
1974.","element":"span"}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]