28:["$","$L31",null,{"isWhiteLabelled":false,"children":["$","$Lc",null,{"pt":{"compact":0,"expanded":3},"children":[["$","$L32",null,{"noStar":true,"publisher":true,"task":true,"params":true,"size":"xl","product":{"id":"eyJwYXBlcklEIjoiMjAwMi4xMjQ5MyIsInB1Ymxpc2hlciI6ImFyeGl2In0=","publisher":"arxiv","updated":"2021-04-12T07:03:56.000Z","paperID":"2002.12493","published":"2020-02-28T00:32:47.000Z","authors":"[\"Michael Muehlebach\",\"Michael I. Jordan\"]","title":"Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives","scoreTrending":null,"summary":"We analyze the convergence rate of various momentum-based optimization\nalgorithms from a dynamical systems point of view. Our analysis exploits\nfundamental topological properties, such as the continuous dependence of\niterates on their initial conditions, to provide a simple characterization of\nconvergence rates. In many cases, closed-form expressions are obtained that\nrelate algorithm parameters to the convergence rate. The analysis encompasses\ndiscrete time and continuous time, as well as time-invariant and time-variant\nformulations, and is not limited to a convex or Euclidean setting. In addition,\nthe article rigorously establishes why symplectic discretization schemes are\nimportant for momentum-based optimization algorithms, and provides a\ncharacterization of algorithms that exhibit accelerated convergence.","lastCheckedForCode":"2022-09-04T23:03:16.964Z","links":[{"id":"eyJ1cmwiOiJodHRwczovL3BhcGVyc3dpdGhjb2RlLmNvbS9wYXBlci9vcHRpbWl6YXRpb24td2l0aC1tb21lbnR1bS1keW5hbWljYWwtY29udHJvbCJ9","type":"pwc","url":"https://paperswithcode.com/paper/optimization-with-momentum-dynamical-control","data":null}],"reposConnection":{"edges":[]},"models":[],"tags":[],"summaries":[],"emailsConnection":{"edges":[]},"__typename":"paper","authorArray":["Michael Muehlebach","Michael I. 
Jordan"]}}],["$","$L25",null,{"container":true,"columns":100,"spacing":{"compact":0,"expanded":2,"large":3},"children":[["$","$L25",null,{"size":{"compact":100,"expanded":100,"large":68},"children":[["$","$8",null,{"children":["$","$L33",null,{"publisher":"arxiv","paperID":"2002.12493","product":{"paper":"$28:props:children:props:children:0:props:product","models":"$28:props:children:props:children:0:props:product:models"},"isWhiteLabelled":false}]}],["$","$8",null,{"children":["$","$L34",null,{"article":"$L35","model":"$undefined"}]}]]}],["$","$L25",null,{"size":"grow","children":["$","$L36",null,{}]}]]}],["$","$8",null,{"children":null}],[["$","audio",null,{"id":"tts"}],["$","$L37",null,{"paperID":"2002.12493","publisher":"arxiv","paperJSON":{"title":"Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives","paperID":"2002.12493","avgLineHeight":13.55,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"We analyze the convergence rate of various momentum-based optimization algorithms from a dynamical systems point of view. Our analysis exploits fundamental topological properties, such as the continuous dependence of iterates on their initial conditions, to provide a simple characterization of convergence rates. In many cases, closed-form expressions are obtained that relate algorithm parameters to the convergence rate. The analysis encompasses discrete time and continuous time, as well as time-invariant and time-variant formulations, and is not limited to a convex or Euclidean setting. 
In addition, the article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms, and provides a characterization of algorithms that exhibit accelerated convergence.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Keywords: ","element":"span"},{"text":"Gradient-based optimization, convergence rate analysis, Nesterov acceleration, symplectic integration, nonconvex optimization","element":"span"}]]},{"heading":"1. Introduction","paragraphs":[[{"text":"Optimization problems lie at the heart of many machine-learning formulations. As a result, a better understanding of optimization algorithms, combined with implementations that target distinctive properties of machine-learning problems, has contributed significantly to the recent progress in the field.","element":"span"}],[{"text":"One of the most popular methods for large-scale optimization is the (stochastic) gradient method, due to its simplicity, wide applicability, and efficiency. However, in the deterministic setting, where gradients are evaluated exactly, it has been shown that better convergence rates can often be achieved by leveraging two successive gradients ","element":"span"},{"href":"#id-0","referenceIndex":29,"text":"(Nesterov, ","element":"a"},{"href":"#id-0","referenceIndex":29,"text":"1983)","element":"a"},{"text":". Based on analogies to mechanical systems, these methods are referred to as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"momentum-based optimization algorithms","element":"span"},{"text":", and can be viewed as particular discretizations of continuous-time harmonic oscillators ","element":"span"},{"href":"#id-1","referenceIndex":32,"text":"(Polyak, ","element":"a"},{"href":"#id-1","referenceIndex":32,"text":"1964)","element":"a"},{"text":". Yet, even for the class of strongly convex functions, most proofs that establish the superior convergence are algebraic and provide little qualitative understanding. 
As a consequence, there is little guide to the generality or robustness of the acceleration phenomenon across instances of optimization problems.","element":"span"}],[{"text":"This article characterizes the convergence rate of momentum-based optimization algorithms by taking fundamental topological properties into account. In many cases our analysis leads to closed-","element":"span"}],[{"style":{"width":"68%"},"width":1176,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/0-0.png","element":"img"}],[{"text":"form expressions that relate the convergence rate to the different algorithm parameters, such as the step size and the damping. The analysis provides insight into the design of algorithms that, for example, require little tuning when applied to large-scale and ill-conditioned optimization problems. For simplicity of notation, we focus on the Euclidean setting, but we note that the scope of our analysis is not limited to Euclidean problems.","element":"span"}],[{"text":"We will derive convergence rates of the form","element":"span"}],[{"id":"id-2","style":{"width":"67%"},"width":1159,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-0.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":189.67,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-1.png","element":"img","alt":" |x(t) − x∗|","inline":true,"padRight":true},{"text":"is a distance measure between the iterate ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"and the isolated local minimum ","element":"span"},{"style":{"height":15.13},"width":54.87,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-2.png","element":"img","alt":" 
x∗,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"refers to time or the iteration number, ","element":"span"},{"style":{"height":15.02},"width":45.19,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-3.png","element":"img","alt":" Cc","inline":true,"padRight":true},{"text":"is a constant which does not depend on ","element":"span"},{"style":{"height":17.6},"width":302.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-4.png","element":"img","alt":" x(0), and ρc(t) is","inline":true,"padRight":true},{"text":"monotonically decreasing, satisfies ","element":"span"},{"style":{"height":17.6},"width":174.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-5.png","element":"img","alt":" ρc(0) = 1","inline":true},{"text":", and characterizes the convergence rate. An important aspect of the analysis is to characterize how the convergence rate ","element":"span"},{"style":{"height":12},"width":36.56,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-6.png","element":"img","alt":" ρc","inline":true,"padRight":true},{"text":"is affected by the shape of the objective function about an isolated local minimum. The local shape is summarized by the (condition) number ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-7.png","element":"img","alt":" κ","inline":true},{"text":", which is defined as the ratio between the maximum and minimum curvature about a local minimum. We call an isolated minimum degenerate if the minimum curvature vanishes in at least one direction. 
The main results and insights, which will be rigorously derived in the remainder of the article, are summarized as follows:","element":"span"}],[{"text":"• The convergence rate, characterized by ","element":"span"},{"style":{"height":12},"width":36.56,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-8.png","element":"img","alt":" ρc","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-2","text":"(1)","element":"a"},{"text":", is uniquely determined by the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"local ","element":"span"},{"text":"shape of the objective function about an isolated minimum. The global shape of the objective function determines the constant ","element":"span"},{"style":{"height":15.02},"width":45.19,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-9.png","element":"img","alt":" Cc","inline":true,"padRight":true},{"text":"and the set of initial conditions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(0) ","element":"span"},{"text":"for which ","element":"span"},{"href":"#id-2","text":"(1) ","element":"a"},{"text":"holds. In other words, the global shape (for example described by convexity) determines the stability and region of attraction of a local minimum, whereas the local shape, i.e., the curvature at an isolated minimum, determines the convergence rate.","element":"span"}],[{"text":"• An algorithm is called accelerated if the convergence rate scales favorably for ill-conditioned optimization problems, meaning that the convergence rate scales favorably for large ","element":"span"},{"style":{"height":12.4},"width":122.14,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-10.png","element":"img","alt":" κ. 
Ac-","inline":true,"padRight":true},{"text":"celeration is therefore a statement about the robustness of the convergence rate with respect to changes in the curvature, which is well defined for continuous-time as well as for discrete-time formulations.","element":"span"}],[{"text":"• Accelerated convergence is generic to momentum-based optimization algorithms, provided that the damping scales with ","element":"span"},{"style":{"height":17.77},"width":310.62,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-11.png","element":"img","alt":" 1/√κ for large κ","inline":true},{"text":". Neither the evaluation of the gradient at a shifted position, nor a specifically engineered damping parameter, as for example proposed in ","element":"span"},{"href":"#id-3","referenceIndex":31,"text":"Nesterov ","element":"a"},{"href":"#id-3","referenceIndex":31,"text":"(2004, ","element":"a"},{"text":"Sec. 2.2), are necessary.","element":"span"}],[{"text":"• From a physics perspective, a momentum-based optimization algorithm can be thought of as a mass-spring-damper system, where the spring potential is given by the objective function. The algorithm design specifies the damping. The fact that the system has inertia (due to the second-order dynamics) implies that the convergence rate is robust to small changes in the spring, i.e., robust to small changes in the curvature of the objective function. The inertia gives the system the tendency to keep its velocity, which, provided that the damping is chosen appropriately, implies that small changes in the spring will not slow down convergence. 
This captures the mechanism behind accelerated convergence.","element":"span"}],[{"text":"• The convergence rate typically becomes arbitrarily small for large ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-12.png","element":"img","alt":" κ","inline":true,"padRight":true},{"text":"(if the dynamics are time-varying this statement applies for ","element":"span"},{"style":{"height":11.6},"width":135.41,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/1-13.png","element":"img","alt":" t → ∞","inline":true},{"text":"). The underlying dynamics are therefore close to ","element":"span"},{"text":"conservative, which means that great care is required for the discretization. A discretization that introduces an artificial energy drift, such as the explicit forward Euler discretization, might even lead to instability, which clearly makes a favorable scaling of the convergence rate with ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/2-0.png","element":"img","alt":" κ","inline":true,"padRight":true},{"text":"impossible. This motivates the use of symplectic discretization schemes, which preserve the Hamiltonian structure of the underlying conservative part of the dynamics.","element":"span"}],[{"text":"• By introducing time-varying dynamics, which are obtained by adjusting the damping parameters with the number of iterations, the linear convergence rate is improved by an additional sublinearly converging term. If the local minimum is close to degenerate, the convergence is dominated by this term, as the rate of the linear convergence becomes arbitrarily small. 
In that way, the analysis relates the non-degenerate and degenerate cases.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Related work: ","element":"span"},{"text":"The phenomenon of acceleration, which fundamentally motivates the use of momentum, has puzzled many researchers for almost 40 years. We do not attempt to give a full overview of the literature, but highlight some of the most recent work. Important contributions were made by ","element":"span"},{"href":"#id-4","referenceIndex":7,"text":"Bubeck et al. ","element":"a"},{"href":"#id-4","referenceIndex":7,"text":"(2015)","element":"a"},{"text":", who proposed an accelerated algorithm that has a geometric interpretation, by ","element":"span"},{"href":"#id-5","referenceIndex":2,"text":"Allen-Zhu and Orecchia ","element":"a"},{"href":"#id-5","referenceIndex":2,"text":"(2014)","element":"a"},{"text":", who show that coupling of gradient and mirror descent can lead to acceleration, and ","element":"span"},{"href":"#id-6","referenceIndex":24,"text":"Lessard et al. ","element":"a"},{"href":"#id-6","referenceIndex":24,"text":"(2016)","element":"a"},{"text":", who propose a general control-theoretic analysis framework. The framework has subsequently been extended and refined, for example by ","element":"span"},{"href":"#id-7","referenceIndex":26,"text":"Michalowsky ","element":"a"},{"href":"#id-7","referenceIndex":26,"text":"et al. ","element":"a"},{"href":"#id-7","referenceIndex":26,"text":"(2019)","element":"a"},{"text":", who analyzed and quantified robustness and convergence trade-offs. Other work includes ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"Diakonikolas and Orecchia ","element":"a"},{"href":"#id-8","referenceIndex":10,"text":"(2018)","element":"a"},{"text":", who unify the analysis of first-order methods by imposing certain decay conditions, and ","element":"span"},{"href":"#id-9","referenceIndex":37,"text":"Scieur et al. 
","element":"a"},{"href":"#id-9","referenceIndex":37,"text":"(2017)","element":"a"},{"text":", who interpret the accelerated gradient method as a multi-step discretization of gradient flow.","element":"span"}],[{"text":"The work of ","element":"span"},{"href":"#id-10","referenceIndex":38,"text":"Su et al. ","element":"a"},{"href":"#id-10","referenceIndex":38,"text":"(2016) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-11","referenceIndex":22,"text":"Krichene et al. ","element":"a"},{"href":"#id-11","referenceIndex":22,"text":"(2015) ","element":"a"},{"text":"showed that the trajectories of the accelerated gradient method approach the solutions of a certain second-order ordinary differential equation. The resulting differential equation was analyzed in further detail in ","element":"span"},{"href":"#id-12","referenceIndex":4,"text":"Attouch et al. ","element":"a"},{"href":"#id-12","referenceIndex":4,"text":"(2018) ","element":"a"},{"text":"and placed within a variational framework by ","element":"span"},{"href":"#id-13","referenceIndex":40,"text":"Wibisono et al. ","element":"a"},{"href":"#id-13","referenceIndex":40,"text":"(2016)","element":"a"},{"text":". This motivated further research on structure-preserving integration schemes for discretizing continuous-time optimization algorithms ","element":"span"},{"href":"#id-14","referenceIndex":6,"text":"(Betancourt et al., ","element":"a"},{"href":"#id-14","referenceIndex":6,"text":"2018)","element":"a"},{"text":". While the continuous-time formulation of ","element":"span"},{"href":"#id-10","referenceIndex":38,"text":"Su et al. 
","element":"a"},{"href":"#id-10","referenceIndex":38,"text":"(2016) ","element":"a"},{"text":"is based on an accelerated optimization algorithm for smooth and convex objective functions (i.e., the case where the minimum could be degenerate), an alternative for strongly convex objective functions (non-degenerate case) was proposed in ","element":"span"},{"href":"#id-15","referenceIndex":12,"text":"Dürr and Ebenbauer ","element":"a"},{"href":"#id-15","referenceIndex":12,"text":"(2012)","element":"a"},{"text":". In ","element":"span"},{"href":"#id-16","referenceIndex":14,"text":"França et al. ","element":"a"},{"href":"#id-16","referenceIndex":14,"text":"(2019) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-17","referenceIndex":28,"text":"Muehlebach and Jordan ","element":"a"},{"href":"#id-17","referenceIndex":28,"text":"(2019)","element":"a"},{"text":", important geometric properties of the underlying dynamics are highlighted and corresponding structure-preserving discretization schemes are analyzed. In addition, ","element":"span"},{"href":"#id-16","referenceIndex":14,"text":"França et al. ","element":"a"},{"href":"#id-16","referenceIndex":14,"text":"(2019) ","element":"a"},{"text":"propose relativistic dynamics, as these naturally bound the rate of change of the position by the speed of light, which is supposed to prevent overshoot. The work of ","element":"span"},{"href":"#id-18","referenceIndex":25,"text":"Maddison ","element":"a"},{"href":"#id-18","referenceIndex":25,"text":"et al. 
","element":"a"},{"href":"#id-18","referenceIndex":25,"text":"(2018) ","element":"a"},{"text":"suggests that convergence can be improved by a suitable choice of the kinetic energy (for example choosing the kinetic energy to be the convex conjugate of the objective function).","element":"span"}],[{"text":"$38","element":"span"}],[{"text":"In the context of nonconvex optimization, the aim is typically to find a local minimum of a twice continuously differentiable function that satisfies certain non-degeneracy conditions. It has been shown that gradient descent converges to a local minimum from almost every initial condition, which is due to the fact that local maxima and saddle points are unstable equilibria ","element":"span"},{"href":"#id-19","referenceIndex":23,"text":"(Lee et al., ","element":"a"},{"href":"#id-19","referenceIndex":23,"text":"2016)","element":"a"},{"text":". The same reasoning applies to gradient descent with momentum. However, even though gradient descent and gradient descent with momentum are guaranteed (in an almost everywhere sense) to ultimately reach a local minimum, it may take an arbitrarily long time to escape saddle points and local maxima. A major concern arises due to the fact that an objective function with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"decision variables might have ","element":"span"},{"style":{"height":12},"width":94.3,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/3-0.png","element":"img","alt":" n−1","inline":true,"padRight":true},{"text":"different isolated saddle points, which have to be traversed before reaching a local minimum. 
This implies that even for generic initialization strategies, the worst-case convergence rate depends on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"href":"#id-20","referenceIndex":11,"text":"(Du et al., ","element":"a"},{"href":"#id-20","referenceIndex":11,"text":"2017)","element":"a"},{"text":". This article is concerned with characterizing the convergence of momentum-based algorithms up to a constant factor. We will not analyze how this factor scales with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"; the example from ","element":"span"},{"href":"#id-20","referenceIndex":11,"text":"Du et al. ","element":"a"},{"href":"#id-20","referenceIndex":11,"text":"(2017) ","element":"a"},{"text":"suggests that the factor scales exponentially in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". However, the results from ","element":"span"},{"href":"#id-21","referenceIndex":16,"text":"Ge et al. ","element":"a"},{"href":"#id-21","referenceIndex":16,"text":"(2015) ","element":"a"},{"text":"indicate that adding small random perturbations to gradient descent can improve the convergence rate and reduce the dependence of the convergence rate on the dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"to polylogarithmic factors. A similar result applies to gradient descent with momentum ","element":"span"},{"href":"#id-22","referenceIndex":20,"text":"(Jin et al., ","element":"a"},{"href":"#id-22","referenceIndex":20,"text":"2017)","element":"a"},{"text":". A recent account of the state of the art is given in ","element":"span"},{"href":"#id-23","referenceIndex":21,"text":"Jin et al. 
","element":"a"},{"href":"#id-23","referenceIndex":21,"text":"(2019)","element":"a"},{"text":", for example.","element":"span"}],[{"text":"The poor scaling in the dimension ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"in the nonconvex case can be avoided by explicitly leveraging curvature information, as for example proposed in ","element":"span"},{"href":"#id-24","referenceIndex":30,"text":"Nesterov and Polyak ","element":"a"},{"href":"#id-24","referenceIndex":30,"text":"(2006) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-25","referenceIndex":9,"text":"Curtis ","element":"a"},{"href":"#id-25","referenceIndex":9,"text":"et al. ","element":"a"},{"href":"#id-25","referenceIndex":9,"text":"(2017)","element":"a"},{"text":". However, the computational cost per iteration of these methods is generally higher and increases with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". For simplicity, this article focuses on first-order algorithms with momentum, even though many ideas generalize in straightforward ways.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Outline: ","element":"span"},{"text":"This article views optimization algorithms as dynamical systems. Two cases are distinguished: Section ","element":"span"},{"text":"2 ","element":"span"},{"text":"assumes that the algorithm parameters are fixed, leading to time-invariant dynamics. The more general case, where the parameters are allowed to vary with the number of iterations, is discussed in Section ","element":"span"},{"text":"3.","element":"span"}],[{"text":"Section ","element":"span"},{"text":"2 ","element":"span"},{"text":"starts by introducing the notation and defining the scope of the analysis. 
A momentum-based optimization algorithm is understood as a second-order dynamical system,","element":"span"},{"text":"1 ","element":"span"},{"text":"where the local minima of the objective function correspond to asymptotically stable equilibria. Section ","element":"span"},{"href":"#id-26","text":"2.2 ","element":"a"},{"text":"and Section ","element":"span"},{"href":"#id-27","text":"2.3 ","element":"a"},{"text":"introduce two prototypical examples of momentum-based optimization algorithms. These are generalized versions of Nesterov’s accelerated gradient scheme ","element":"span"},{"href":"#id-3","referenceIndex":31,"text":"(Nesterov, ","element":"a"},{"href":"#id-3","referenceIndex":31,"text":"2004, ","element":"a"},{"text":"Ch. 2.2), and also include Heavy-Ball methods ","element":"span"},{"href":"#id-1","referenceIndex":32,"text":"(Polyak, ","element":"a"},{"href":"#id-1","referenceIndex":32,"text":"1964)","element":"a"},{"text":". By focusing on smooth dynamical systems based on smooth objective functions, our analysis excludes, for example, the treatment of constraints via (non-differentiable) indicator functions, as well as optimization algorithms with restart schemes. The subsequent result derived in Section ","element":"span"},{"href":"#id-28","text":"2.4 ","element":"a"},{"text":"highlights our assertion that fundamental topological properties can be exploited for characterizing the convergence rate of momentum-based optimization algorithms. The results are applied to the analysis of the two prototypical examples in Section ","element":"span"},{"href":"#id-29","text":"2.6 ","element":"a"},{"text":"and Section ","element":"span"},{"href":"#id-30","text":"2.7, ","element":"a"},{"text":"leading to a broad characterization of the phenomenon of acceleration. 
In addition, Section ","element":"span"},{"href":"#id-30","text":"2.7 ","element":"a"},{"text":"rigorously motivates the use of symplectic discretization schemes in the context of optimization: The symplectic discretization enables the computation of a modified energy function that can be used for stability analysis.","element":"span"}],[{"text":"The structure of Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"is analogous to Section ","element":"span"},{"text":"2. ","element":"span"},{"text":"The general result for characterizing the convergence rate in the time-varying case is presented in Section ","element":"span"},{"href":"#id-31","text":"3.1 ","element":"a"},{"text":"and illustrated with two subsequent examples in Section ","element":"span"},{"href":"#id-32","text":"3.2 ","element":"a"},{"text":"and Section ","element":"span"},{"href":"#id-33","text":"3.3. ","element":"a"},{"text":"The results highlight the fact that time-varying damping parameters can speed up the convergence rate by an additional sublinearly converging factor. For ill-conditioned problems, this additional gain in the convergence rate becomes significant. The analysis connects the non-degenerate case with the degenerate case and motivates the update rules for the parameters of well-known accelerated gradient schemes, such as ","element":"span"},{"href":"#id-3","referenceIndex":31,"text":"Nesterov ","element":"a"},{"href":"#id-3","referenceIndex":31,"text":"(2004, ","element":"a"},{"text":"p. 90, Constant Step Scheme II).","element":"span"}],[{"text":"The article concludes with a summary and final remarks in Section ","element":"span"},{"text":"4.","element":"span"}]]},{"heading":"2. 
The Time-Invariant Case","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"2.1 Introduction","element":"span"}],[{"text":"Throughout the article we consider the problem of minimizing the function ","element":"span"},{"style":{"height":16.4},"width":371.55,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/4-0.png","element":"img","alt":" f : Rn → R, which","inline":true,"padRight":true},{"id":"id-34","text":"satisfies the following assumption:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Assumption 1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has a Lipschitz-continuous gradient. The critical points are non-degenerate and isolated.","element":"span"},{"text":"1","element":"span"}],[{"text":"The Lipschitz continuity of the gradient is important for ensuring that the resulting continuous-time trajectories exist and are unique. The fact that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"has isolated non-degenerate critical points excludes pathological cases where the function has multiple connected local minima. The less pathological case, where the local minima are isolated but have vanishing curvature in certain directions, can be obtained via a limit argument, as will be discussed in Section ","element":"span"},{"text":"3.","element":"span"}],[{"text":"Due to the Lipschitz continuity of the gradient, the Hessian exists almost everywhere and is essentially bounded. 
We will summarize the (essential) upper and lower bounds on the Hessian with the two constants ","element":"span"},{"style":{"height":17.5},"width":344.42,"height":43.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/4-1.png","element":"img","alt":" Cf ≥ 0 and ¯Cf ≥ 0:","inline":true}],[{"id":"id-35","style":{"width":"79%"},"width":1378,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/4-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":59.61,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/4-3.png","element":"img","alt":" | · |","inline":true,"padRight":true},{"text":"denotes the Euclidean norm. Thus for ","element":"span"},{"style":{"height":15.08},"width":133.88,"height":37.71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/4-4.png","element":"img","alt":" Cf = 0","inline":true,"padRight":true},{"text":"the function is convex, for ","element":"span"},{"style":{"height":15.08},"width":214.25,"height":37.71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/4-5.png","element":"img","alt":" Cf > 0 it is","inline":true,"padRight":true},{"text":"nonconvex. In order to simplify our exposition, we will consider functions that satisfy:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Assumption 2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"In addition to Assumption ","element":"span"},{"href":"#id-34","style":{"fontStyle":"italic"},"text":"1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has a second derivative that is Lipschitz continuous.","element":"span"}],[{"text":"None of the subsequent results will explicitly depend on a Lipschitz constant related to the second derivative. 
Hence, under very mild conditions, all our results characterizing the convergence rate ","element":"span"},{"style":{"height":17.6},"width":88.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-0.png","element":"img","alt":"ρc(t)","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-2","text":"(1) ","element":"a"},{"text":"apply when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"satisfies Assumption ","element":"span"},{"href":"#id-34","text":"1 ","element":"a"},{"text":"instead of Assumption ","element":"span"},{"href":"#id-35","text":"2.","element":"a"},{"text":"1 ","element":"span"},{"text":"This is further discussed in Appendix ","element":"span"},{"text":"F.","element":"span"}],[{"text":"Without loss of generality, we further assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"has a local minimum at ","element":"span"},{"style":{"height":12.8},"width":289.91,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-1.png","element":"img","alt":" x∗ = 0 and that","inline":true},{"style":{"height":17.6},"width":324.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-2.png","element":"img","alt":"f(x∗) = f(0) = 0","inline":true},{"text":". 
The convergence rate of a momentum-based optimization algorithm depends on the local geometry of f around the local minimum in question, which is captured by the constants","element":"span"}],[{"id":"id-58","style":{"width":"90%"},"width":1569,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-3.png","element":"img"}],[{"text":"If Assumption ","element":"span"},{"href":"#id-35","text":"2 ","element":"a"},{"text":"is replaced with Assumption ","element":"span"},{"href":"#id-34","text":"1, ","element":"a"},{"text":"the above constants are defined via the essential supremum and essential infimum of the Hessian in a neighborhood of ","element":"span"},{"style":{"height":12.73},"width":54.87,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-4.png","element":"img","alt":" x∗.","inline":true}],[{"text":"We model a momentum-based optimization algorithm as either a continuous-time or a discrete-time dynamical system of the form","element":"span"}],[{"id":"id-38","style":{"width":"79%"},"width":1380,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-5.png","element":"img"}],[{"text":"where the superscript ","element":"span"},{"text":"+ ","element":"span"},{"text":"denotes either differentiation with respect to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"(continuous-time setting), in which case ","element":"span"},{"style":{"height":17.02},"width":164.19,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-6.png","element":"img","alt":" I = R≥0","inline":true},{"text":", or a unit time-shift (i.e., ","element":"span"},{"style":{"height":18.73},"width":315.14,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-7.png","element":"img","alt":" q+(t) = q(t + 1)","inline":true},{"text":", discrete-time setting), in which case 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"I ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . ","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"text":". The nonnegative real numbers are denoted by ","element":"span"},{"style":{"height":17.02},"width":74.86,"height":42.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-8.png","element":"img","alt":" R≥0","inline":true},{"text":", whereas the positive real numbers are denoted by ","element":"span"},{"style":{"height":15.42},"width":74.86,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-9.png","element":"img","alt":" R>0","inline":true},{"text":". 
The dynamics","element":"span"}],[{"id":"id-36","style":{"width":"72%"},"width":1258,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-10.png","element":"img"}],[{"text":"depend implicitly on ","element":"span"},{"style":{"height":16.4},"width":61.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-11.png","element":"img","alt":" ∇f","inline":true,"padRight":true},{"text":"and are subject to the following assumption.","element":"span"},{"text":"²","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Assumption 3 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The dynamics ","element":"span"},{"style":{"height":17.82},"width":169.33,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-12.png","element":"img","alt":" gq and gp","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are continuously differentiable in both arguments and the derivatives are Lipschitz continuous.","element":"span"}],[{"text":"In the continuous-time case, Assumption ","element":"span"},{"href":"#id-36","text":"3 ","element":"a"},{"text":"implies that the resulting trajectories exist and are unique for all times ","element":"span"},{"style":{"height":12.8},"width":103,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-13.png","element":"img","alt":" t ∈ I","inline":true,"padRight":true},{"href":"#id-37","referenceIndex":3,"text":"(Arnol’d, ","element":"a"},{"href":"#id-37","referenceIndex":3,"text":"1992, ","element":"a"},{"text":"p. 93, Corollary 3). Differentiability also implies that the dynamics can be linearized about an equilibrium, which typically provides a means to study the local behavior of the resulting trajectories. 
In addition, Assumption ","element":"span"},{"href":"#id-36","text":"3 ","element":"a"},{"text":"implies that, over a finite time interval, trajectories depend continuously on their initial conditions ","element":"span"},{"href":"#id-37","referenceIndex":3,"text":"(Arnol’d, ","element":"a"},{"href":"#id-37","referenceIndex":3,"text":"1992, ","element":"a"},{"text":"p. 93, Corollary 4). These topological properties will be exploited in the following.","element":"span"}],[{"text":"In order to simplify notation, we define","element":"span"}],[{"id":"id-42","style":{"width":"66%"},"width":1145,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-14.png","element":"img"}],[{"text":"and introduce ","element":"span"},{"style":{"height":17.6},"width":609.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-15.png","element":"img","alt":" z(t) := (q(t), p(t)) for all t ∈ I","inline":true},{"text":". Moreover, the map ","element":"span"},{"style":{"height":17.6},"width":460.38,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-16.png","element":"img","alt":" (q0, p0) → (q(t), p(t)) is","inline":true,"padRight":true},{"text":"denoted by ","element":"span"},{"style":{"height":18.73},"width":401.92,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/5-17.png","element":"img","alt":" ϕt : R²ⁿ → R²ⁿ, t ∈ I","inline":true},{"text":", and is referred to as the flow of the dynamical system ","element":"span"},{"href":"#id-36","text":"(4) ","element":"a"},{"text":"- ","element":"span"},{"href":"#id-38","text":"(5)","element":"a"},{"text":". 
Next, we provide a formal definition of a momentum-based optimization algorithm.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"We call the dynamical system ","element":"span"},{"href":"#id-36","text":"(4) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"- ","element":"span"},{"href":"#id-38","text":"(5) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"satisfying Assumption ","element":"span"},{"href":"#id-36","style":{"fontStyle":"italic"},"text":"3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"a ","element":"span"},{"text":"momentum-based optimization algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"for the function ","element":"span"},{"style":{"height":16.4},"width":209.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-0.png","element":"img","alt":" f if x∗ = 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an asymptotically stable equilibrium in the sense of Lyapunov.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"This means that ","element":"span"},{"style":{"height":17.6},"width":463.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-1.png","element":"img","alt":" ϕt(0) = 0, for all t ∈ I, ϕt","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is continuous at ","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":", uniformly in ","element":"span"},{"style":{"height":17.6},"width":421.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-2.png","element":"img","alt":" t, and limt→∞ ϕt(z0) =","inline":true},{"style":{"height":15.6},"width":205.77,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-3.png","element":"img","alt":"0 for any 
z0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"in a neighborhood of the origin.","element":"span"}],[{"text":"Although the dynamics ","element":"span"},{"style":{"height":17.82},"width":169.84,"height":44.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-4.png","element":"img","alt":" gq and gp","inline":true,"padRight":true},{"text":"can capture gradient flow as a special case (e.g., ","element":"span"},{"style":{"height":17.02},"width":147.25,"height":42.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-5.png","element":"img","alt":" gp = 0,","inline":true},{"style":{"height":18.62},"width":248.3,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-6.png","element":"img","alt":"gq = −∇f(q)","inline":true,"padRight":true},{"text":"in continuous time), we are interested in analyzing momentum methods, which arise from a nontrivial choice of ","element":"span"},{"style":{"height":17.82},"width":173.32,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-7.png","element":"img","alt":" gq and gp.","inline":true}],[{"text":"Throughout the article, we illustrate our ideas with two examples, which are prototypical momentum-based optimization algorithms.","element":"span"}],[{"id":"id-26","style":{"fontWeight":"bold"},"text":"2.2 Example 1","element":"span"}],[{"text":"The first example is based on the following continuous-time dynamics","element":"span"}],[{"id":"id-39","style":{"width":"84%"},"width":1468,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-8.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":16.4},"width":61.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-9.png","element":"img","alt":" ∇f","inline":true,"padRight":true},{"text":"denotes the gradient of 
","element":"span"},{"style":{"height":16.4},"width":151.98,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-10.png","element":"img","alt":" f and fd","inline":true,"padRight":true},{"text":"the dissipative forces. These dynamics can be viewed as a mass-spring-damper system, where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"represents the spring potential and ","element":"span"},{"style":{"height":16.4},"width":37.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-11.png","element":"img","alt":" fd","inline":true,"padRight":true},{"text":"the damping. The dissipative forces are assumed to take the form","element":"span"}],[{"id":"id-40","style":{"width":"72%"},"width":1258,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-12.png","element":"img"}],[{"text":"where the parameters ","element":"span"},{"style":{"height":16.4},"width":246.03,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-13.png","element":"img","alt":" d > 0, β ≥ 0","inline":true,"padRight":true},{"text":"are constant. Ideally, the parameters ","element":"span"},{"style":{"height":16.4},"width":138.07,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-14.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"are designed to take into account additional information about the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"; for example, upper and lower bounds on the curvature ","element":"span"},{"style":{"height":19.13},"width":251.49,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-15.png","element":"img","alt":" d²f/dx². 
If β","inline":true,"padRight":true},{"text":"is chosen to be zero, the dissipative forces reduce to ","element":"span"},{"style":{"height":16},"width":111.42,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-16.png","element":"img","alt":" −2dp.","inline":true,"padRight":true},{"text":"In that case, the dynamics ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"describe a continuous-time heavy ball method ","element":"span"},{"href":"#id-1","referenceIndex":32,"text":"(Polyak, ","element":"a"},{"href":"#id-1","referenceIndex":32,"text":"1964)","element":"a"},{"text":". If ","element":"span"},{"style":{"height":16.4},"width":113.72,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-17.png","element":"img","alt":" β > 0","inline":true},{"text":", the dynamics are related to Nesterov’s accelerated gradient method ","element":"span"},{"href":"#id-17","referenceIndex":28,"text":"(Muehlebach and ","element":"a"},{"href":"#id-17","referenceIndex":28,"text":"Jordan, ","element":"a"},{"href":"#id-17","referenceIndex":28,"text":"2019)","element":"a"},{"text":".","element":"span"}],[{"text":"An intuitive interpretation of the dissipative forces ","element":"span"},{"href":"#id-40","text":"(9) ","element":"a"},{"text":"is as follows. For ","element":"span"},{"style":{"height":16.4},"width":107.18,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-18.png","element":"img","alt":"β = 0","inline":true},{"text":", ","element":"span"},{"href":"#id-40","text":"(9) ","element":"a"},{"text":"describes linear isotropic damping. 
For ","element":"span"},{"style":{"height":16.4},"width":107.17,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-19.png","element":"img","alt":" β > 0","inline":true},{"text":", ","element":"span"},{"href":"#id-40","text":"(9) ","element":"a"},{"text":"can be rewritten as","element":"span"}],[{"style":{"width":"64%"},"width":1112,"height":119,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-20.png","element":"img"}],[{"text":"which implies that ","element":"span"},{"href":"#id-40","text":"(9) ","element":"a"},{"text":"includes an additional damping term that averages the local curvature in the interval between ","element":"span"},{"style":{"height":16.4},"width":241.55,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-21.png","element":"img","alt":" q and q + βp","inline":true},{"text":". As a result, the damping is stronger where the local curvature is large and weaker where it is small. The larger the velocity ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"is, the longer the interval over which the curvature is averaged. The two forms of damping, linear isotropic and curvature-dependent, are balanced by the coefficients ","element":"span"},{"style":{"height":16.4},"width":145.52,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-22.png","element":"img","alt":" d and β.","inline":true}],[{"text":"The equilibria of ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"are given by the critical points of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". Moreover, if a given critical point is a non-degenerate local minimum, the corresponding equilibrium is asymptotically stable. 
This follows by evaluating the total energy,","element":"span"}],[{"id":"id-53","style":{"width":"62%"},"width":1079,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/6-23.png","element":"img"}],[{"text":"along the trajectories of ","element":"span"},{"href":"#id-39","text":"(8)","element":"a"},{"text":",","element":"span"}],[{"id":"id-54","style":{"width":"88%"},"width":1526,"height":120,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-0.png","element":"img"}],[{"text":"which shows that the energy does not increase in a neighborhood of the equilibrium. Combined with the fact that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"is locally positive definite about a non-degenerate local minimum, this implies stability in the sense of Lyapunov. Asymptotic stability of the non-degenerate local minimum can then be concluded from LaSalle’s theorem (see, for example, ","element":"span"},{"href":"#id-41","referenceIndex":36,"text":"Sastry, ","element":"a"},{"href":"#id-41","referenceIndex":36,"text":"1999, ","element":"a"},{"text":"Ch. 5.4), which is based on examining the invariant sets satisfying ","element":"span"},{"text":"d","element":"span"},{"style":{"fontStyle":"italic"},"text":"H/","element":"span"},{"text":"d","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= 0","element":"span"},{"text":". Thus, the dynamical system ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"is a momentum-based optimization algorithm for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"according to Definition ","element":"span"},{"href":"#id-42","text":"1.","element":"a"}],[{"text":"However, analyzing how the energy evolves along the trajectories of ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"also reveals global properties of the dynamics. 
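","element":"span"}],[{"text":"The following minimal sketch illustrates this global behavior numerically; it is not code from the article. Because (8) and the energy appear only as images here, it assumes that (8) reads dq/dt = p, dp/dt = −∇f(q) + fd with fd = −2dp (the case β = 0), and that the total energy is H = p²/2 + f(q). The scalar nonconvex test function f(q) = q⁴/4 − q²/2 (local minima at q = ±1, local maximum at q = 0), the damping d = 0.5, the initial condition (q, p) = (0.5, 0), and the classical fourth-order Runge–Kutta integrator are hypothetical choices made only for this illustration. The script records H after every step, so one can check that the energy never increases and that the trajectory settles at the local minimum q = 1.","element":"span"}],[{"text":"
```python
import numpy as np


def f(q):
    # Hypothetical nonconvex test function: minima at q = -1 and q = +1, maximum at q = 0
    return 0.25 * q**4 - 0.5 * q**2


def grad_f(q):
    return q**3 - q


def vector_field(z, d):
    # Assumed form of (8) with beta = 0: dq/dt = p, dp/dt = -grad f(q) - 2*d*p
    q, p = z
    return np.array([p, -grad_f(q) - 2.0 * d * p])


def energy(z):
    # Assumed total energy H = p**2/2 + f(q) (kinetic plus potential)
    q, p = z
    return 0.5 * p**2 + f(q)


def simulate(z0, d=0.5, dt=2e-3, t_end=20.0):
    # Classical fourth-order Runge-Kutta with a fixed step dt; returns the final
    # state and the energy recorded after every step
    z = np.array(z0, dtype=float)
    energies = [energy(z)]
    for _ in range(int(round(t_end / dt))):
        k1 = vector_field(z, d)
        k2 = vector_field(z + 0.5 * dt * k1, d)
        k3 = vector_field(z + 0.5 * dt * k2, d)
        k4 = vector_field(z + dt * k3, d)
        z = z + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        energies.append(energy(z))
    return z, np.array(energies)


z_final, H = simulate([0.5, 0.0])
print('final state (q, p):', z_final)
print('largest per-step energy change:', np.diff(H).max())
```
","element":"span"}],[{"text":"Since H(0.5, 0) ≈ −0.109 lies below f(0) = 0 and H cannot increase, the trajectory can never reach the local maximum at q = 0; this is why it converges to q = 1 rather than to q = −1.","element":"span"}],[{"text":"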
For ","element":"span"},{"style":{"height":16.4},"width":253.53,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-1.png","element":"img","alt":" β = 0, d > 0","inline":true,"padRight":true},{"text":"it follows that the set of initial conditions that do not converge to a local minimum is a set of measure zero, given by the critical points of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"that are not local minima. The analysis holds without assuming convexity of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". However, if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"happens to be convex and has a unique global minimum, then the corresponding equilibrium is globally asymptotically stable for any choice of parameters ","element":"span"},{"style":{"height":16.4},"width":240.09,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-2.png","element":"img","alt":" β ≥ 0, d > 0.","inline":true}],[{"text":"The same reasoning applies in case ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"satisfies Assumption ","element":"span"},{"href":"#id-34","text":"1 ","element":"a"},{"text":"instead of ","element":"span"},{"href":"#id-35","text":"2, ","element":"a"},{"text":"or when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"has degenerate local minima.","element":"span"}],[{"id":"id-27","style":{"fontWeight":"bold"},"text":"2.3 Example 2","element":"span"}],[{"text":"The second example is obtained by discretizing ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"in the following way:","element":"span"}],[{"id":"id-43","style":{"width":"96%"},"width":1663,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-3.png","element":"img"}],[{"text":"where 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"T > ","element":"span"},{"text":"0 ","element":"span"},{"text":"is the step size. The discretization consists of a forward Euler update of the momentum coordinates, and uses the newly computed momentum for the position update. As will be further discussed in the remainder of the article, the fact that the newly computed momentum coordinate is used for the position update makes the scheme symplectic for ","element":"span"},{"style":{"height":16.4},"width":128.06,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-4.png","element":"img","alt":" fd = 0","inline":true},{"text":". This means that the transformation ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"from ","element":"span"},{"style":{"height":17.6},"width":437.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-5.png","element":"img","alt":" (qk, pk) → (qk+1, pk+1)","inline":true,"padRight":true},{"text":"preserves the symplectic form (for ","element":"span"},{"style":{"height":16.4},"width":278,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-6.png","element":"img","alt":" fd = 0), which","inline":true,"padRight":true},{"text":"has important consequences. One of these consequences concerns the spectrum of the linearization of ","element":"span"},{"href":"#id-43","text":"(13)","element":"a"},{"text":". In case ","element":"span"},{"style":{"height":16.4},"width":125.17,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-7.png","element":"img","alt":" fd = 0","inline":true},{"text":", the corresponding eigenvalues are guaranteed to lie on the unit circle (for ","element":"span"},{"style":{"height":19.98},"width":214.05,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-8.png","element":"img","alt":"T ≤ 2/√L","inline":true},{"text":"). 
This is in sharp contrast to the standard explicit Euler discretization, which is not symplectic, and where the eigenvalues lie outside the unit circle even for arbitrarily small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T > ","element":"span"},{"text":"0","element":"span"},{"text":". Indeed, we will exploit the fact that the map ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"is symplectic (for ","element":"span"},{"style":{"height":16.4},"width":119.96,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-9.png","element":"img","alt":" fd = 0","inline":true},{"text":") to construct a modified energy function, which will be used for a stability analysis that extends beyond the linearization. Additional background information on symplectic integration can be found in ","element":"span"},{"href":"#id-44","referenceIndex":35,"text":"Sanz-Serna ","element":"a"},{"href":"#id-44","referenceIndex":35,"text":"(1992) ","element":"a"},{"text":"or ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"Hairer et al. ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"(2002)","element":"a"},{"text":", for example.","element":"span"}],[{"text":"For ","element":"span"},{"style":{"height":16.4},"width":113.83,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-10.png","element":"img","alt":" β = 0","inline":true},{"text":", the resulting algorithm is referred to as gradient descent with momentum ","element":"span"},{"href":"#id-1","referenceIndex":32,"text":"(Polyak, ","element":"a"},{"href":"#id-1","referenceIndex":32,"text":"1964)","element":"a"},{"text":". 
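","element":"span"}],[{"text":"The eigenvalue statement above can be checked on the simplest case with a short sketch (illustrative only, not code from the article). It assumes a scalar quadratic f(q) = h q²/2 and fd = 0, so that the symplectic scheme described above reads p_{k+1} = p_k − T h q_k, q_{k+1} = q_k + T p_{k+1}, whereas explicit Euler uses p_k instead of p_{k+1} in the position update. The helper names one_step_matrix and eigenvalue_moduli are hypothetical. For the symplectic map the determinant is 1 and the trace is 2 − T²h, so both eigenvalues have modulus 1 exactly when T ≤ 2/√h; for explicit Euler both eigenvalues have modulus √(1 + T²h) > 1 for every T > 0.","element":"span"}],[{"text":"
```python
import numpy as np


def one_step_matrix(h, T, symplectic=True):
    # Linear map (q_k, p_k) -> (q_{k+1}, p_{k+1}) for f(q) = h*q**2/2 and fd = 0
    if symplectic:
        # Forward Euler in p, then q is updated with the NEW momentum:
        #   p_{k+1} = p_k - T*h*q_k,  q_{k+1} = q_k + T*p_{k+1}
        return np.array([[1.0 - T**2 * h, T],
                         [-T * h, 1.0]])
    # Explicit Euler: q is updated with the OLD momentum:
    #   p_{k+1} = p_k - T*h*q_k,  q_{k+1} = q_k + T*p_k
    return np.array([[1.0, T],
                     [-T * h, 1.0]])


def eigenvalue_moduli(h, T, symplectic=True):
    # Absolute values of the eigenvalues of the one-step map
    return np.abs(np.linalg.eigvals(one_step_matrix(h, T, symplectic)))


h = 1.0  # curvature; the symplectic bound is then T <= 2/sqrt(h) = 2
for T in (0.01, 0.5, 1.9):
    print(T, eigenvalue_moduli(h, T), eigenvalue_moduli(h, T, symplectic=False))
```
","element":"span"}],[{"text":"Since the condition T ≤ 2/√h must hold for every curvature h ≤ L, it reduces to the bound T ≤ 2/√L quoted above; for a step slightly beyond the bound, say T = 2.1 with h = 1, the symplectic map has a real eigenvalue of modulus greater than 1.","element":"span"}],[{"text":"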
For ","element":"span"},{"style":{"height":16.4},"width":111.89,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-11.png","element":"img","alt":" β > 0","inline":true},{"text":", Nesterov’s accelerated gradient scheme ","element":"span"},{"href":"#id-3","referenceIndex":31,"text":"(Nesterov, ","element":"a"},{"href":"#id-3","referenceIndex":31,"text":"2004, ","element":"a"},{"text":"Constant step scheme III, p. 81) is obtained by choosing the parameters as follows:","element":"span"}],[{"id":"id-46","style":{"width":"74%"},"width":1280,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12.8},"width":144.86,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/7-13.png","element":"img","alt":" κ and L","inline":true,"padRight":true},{"text":"characterize the local shape of a local minimum ","element":"span"},{"href":"#id-17","referenceIndex":28,"text":"(Muehlebach and Jordan, ","element":"a"},{"href":"#id-17","referenceIndex":28,"text":"2019)","element":"a"},{"text":". Moreover, as pointed out in ","element":"span"},{"href":"#id-17","referenceIndex":28,"text":"Muehlebach and Jordan ","element":"a"},{"href":"#id-17","referenceIndex":28,"text":"(2019)","element":"a"},{"text":", the “Constant step scheme II” algorithm of ","element":"span"},{"href":"#id-3","referenceIndex":31,"text":"Nesterov ","element":"a"},{"href":"#id-3","referenceIndex":31,"text":"(2004) ","element":"a"},{"text":"is obtained by a particular choice of time-varying coefficients ","element":"span"},{"style":{"height":16.4},"width":249.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-0.png","element":"img","alt":" β and d. 
The","inline":true,"padRight":true},{"text":"generalization to time-varying coefficients will be discussed in Section ","element":"span"},{"href":"#id-31","text":"3.1.","element":"a"}],[{"text":"The equilibria of ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"are again the stationary points of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". In order to determine the stability of an isolated local minimum, we linearize the dynamics,","element":"span"}],[{"style":{"width":"89%"},"width":1551,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.13},"width":338.31,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-2.png","element":"img","alt":" He := d²f/dx²|x=0","inline":true,"padRight":true},{"text":"(without loss of generality we consider ","element":"span"},{"style":{"height":12.73},"width":124.88,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-3.png","element":"img","alt":" x∗ = 0","inline":true},{"text":"). An eigenvalue analysis then reveals that the corresponding non-degenerate local minimum is asymptotically stable if","element":"span"}],[{"id":"id-47","style":{"width":"65%"},"width":1135,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-4.png","element":"img"}],[{"text":"holds for all eigenvalues ","element":"span"},{"style":{"height":18.73},"width":466.9,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-5.png","element":"img","alt":" h of He, i.e. 
µ ≤ h ≤ L.¹ ","inline":true,"padRight":true},{"text":"Asymptotic stability of the linearized dynamics implies that the same equilibrium is asymptotically stable under the nonlinear dynamics ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"href":"#id-41","referenceIndex":36,"text":"(Sastry, ","element":"a"},{"href":"#id-41","referenceIndex":36,"text":"1999, ","element":"a"},{"text":"p. 215).","element":"span"}],[{"text":"For example, if the parameters ","element":"span"},{"style":{"height":16.4},"width":133.64,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-6.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"are chosen according to ","element":"span"},{"href":"#id-46","text":"(14)","element":"a"},{"text":", the given equilibrium is asymptotically stable if","element":"span"}],[{"style":{"width":"62%"},"width":1079,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-7.png","element":"img"}],[{"text":"We thus conclude that ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"is a momentum-based optimization algorithm for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"according to Definition ","element":"span"},{"href":"#id-42","text":"1 ","element":"a"},{"text":"provided that the constants ","element":"span"},{"style":{"height":16.4},"width":198.25,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-8.png","element":"img","alt":" d, β, and T","inline":true,"padRight":true},{"text":"are chosen such that ","element":"span"},{"href":"#id-47","text":"(16) ","element":"a"},{"text":"is satisfied.","element":"span"}],[{"text":"Unlike Example 1, our analysis for Example 2 is valid only in a neighborhood of the equilibrium, which corresponds to a local minimum. 
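","element":"span"}],[{"text":"The eigenvalue analysis of the linearization can also be carried out numerically. The following minimal sketch is illustrative only and relies on an assumption: since (9), (13), and the linearized dynamics appear only as images here, it takes (13) to read p_{k+1} = p_k + T(−∇f(q_k) + fd(q_k, p_k)), q_{k+1} = q_k + T p_{k+1}, with fd(q, p) = −2dp − (∇f(q + βp) − ∇f(q)), which is consistent with fd reducing to −2dp for β = 0. Because He is symmetric, the linearized map decouples along its eigenvectors into 2 × 2 blocks, one per eigenvalue h. The hypothetical helper worst_spectral_radius samples h over [µ, L] and returns the largest eigenvalue modulus, which must be below 1 for the linearization to be asymptotically stable. The parameter values are arbitrary and do not correspond to (14).","element":"span"}],[{"text":"
```python
import numpy as np


def linearized_step_matrix(h, T, d, beta=0.0):
    # One-step matrix acting on (q_k, p_k) along an eigenvector of He with eigenvalue h.
    # ASSUMPTION: fd(q, p) = -2*d*p - (grad f(q + beta*p) - grad f(q)), whose
    # linearization about the origin is -2*d*p - beta*h*p along that eigenvector.
    a = 1.0 - 2.0 * d * T - T * beta * h  # coefficient of p_k in p_{k+1}
    return np.array([[1.0 - T**2 * h, T * a],  # q_{k+1} = q_k + T*p_{k+1}
                     [-T * h, a]])             # p_{k+1} = a*p_k - T*h*q_k


def worst_spectral_radius(mu, L, T, d, beta=0.0, num=200):
    # Largest eigenvalue modulus over curvatures h sampled uniformly in [mu, L];
    # a value below 1 indicates that the linearization is asymptotically stable
    hs = np.linspace(mu, L, num)
    return max(np.abs(np.linalg.eigvals(linearized_step_matrix(h, T, d, beta))).max()
               for h in hs)


# Heavy-ball case (beta = 0) with arbitrary illustrative parameters
print(worst_spectral_radius(0.01, 1.0, T=1.0, d=0.1))  # below 1: stable
print(worst_spectral_radius(0.01, 1.0, T=2.0, d=0.1))  # above 1: unstable
```
","element":"span"}],[{"text":"Under the same assumption, for β = 0 each block has determinant 1 − 2dT and trace 2 − 2dT − T²h, so the standard 2 × 2 test (|det| < 1 and |trace| < 1 + det) yields the conditions 0 < dT < 1 and T²h < 4(1 − dT); this closed form is derived here for illustration and is not claimed to coincide with the conditions stated in the article.","element":"span"}],[{"text":"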
However, the specific structure of the discretization and the ideas from Example 1 can be exploited for obtaining a nonlinear analysis that is valid beyond a neighborhood of the equilibrium. This will be illustrated in Section ","element":"span"},{"href":"#id-30","text":"2.7.","element":"a"}],[{"id":"id-28","style":{"fontWeight":"bold"},"text":"2.4 Characterizing the convergence rate","element":"span"}],[{"text":"Guaranteeing mere convergence is often not enough, as we are primarily interested in how quickly the trajectories of the nonlinear dynamics ","element":"span"},{"href":"#id-36","text":"(4) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-38","text":"(5) ","element":"a"},{"text":"converge to a local minimum. In the following, we argue that in most cases, a linear analysis characterizes the convergence rate up to constants. The linear analysis typically reduces to the computation of eigenvalues.","element":"span"}],[{"text":"Our main proposition is based on the following assumption and definition.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Assumption 4 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the linearized dynamics","element":"span"}],[{"id":"id-48","style":{"width":"50%"},"width":878,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"be such that there exists an estimate","element":"span"}],[{"style":{"width":"73%"},"width":1278,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-10.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":15.12},"width":317.57,"height":37.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/8-11.png","element":"img","alt":" Cl ≥ 1 and α > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are 
constant.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 2 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The region of attraction of the equilibrium at the origin (of the nonlinear dynamics ","element":"span"},{"href":"#id-36","text":"(4)","element":"a"},{"style":{"fontStyle":"italic"},"text":") is defined as the set","element":"span"}],[{"style":{"width":"37%"},"width":647,"height":64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-0.png","element":"img"}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"Proposition 3 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let Assumption ","element":"span"},{"href":"#id-48","style":{"fontStyle":"italic"},"text":"4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"be satisfied. Then, for any compact set ","element":"span"},{"style":{"height":13.6},"width":127.93,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-1.png","element":"img","alt":" A ⊂ R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"there exists a finite constant ","element":"span"},{"style":{"height":19.21},"width":114.49,"height":48.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-2.png","element":"img","alt":"ˆC ≥ 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for all ","element":"span"},{"style":{"height":15.42},"width":125.55,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-3.png","element":"img","alt":" z0 ∈ A","inline":true}],[{"style":{"width":"69%"},"width":1194,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-4.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is empty the statement is trivial. 
We therefore assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is non-empty. Lemma ","element":"span"},{"href":"#id-49","text":"14 ","element":"a"},{"text":"(see Appendix ","element":"span"},{"text":"A) ","element":"span"},{"text":"implies that there exists an open ball ","element":"span"},{"style":{"height":15.24},"width":337.42,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-5.png","element":"img","alt":" Bδ of radius δ > 0","inline":true},{"text":", centered at the origin, such that any trajectory starting in ","element":"span"},{"style":{"height":14.84},"width":49.1,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-6.png","element":"img","alt":" Bδ","inline":true,"padRight":true},{"text":"converges with rate ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-7.png","element":"img","alt":" α","inline":true},{"text":". We make the following claim. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Claim: ","element":"span"},{"text":"There exists a finite time ","element":"span"},{"style":{"height":14.8},"width":303.63,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-8.png","element":"img","alt":" Tm > 0, Tm ∈ I","inline":true,"padRight":true},{"text":"such that for all ","element":"span"},{"style":{"height":17.6},"width":522.31,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-9.png","element":"img","alt":" z0 ∈ A, ϕt(z0) ∈ Bδ, for all","inline":true},{"style":{"height":14.8},"width":250.75,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-10.png","element":"img","alt":"t ∈ I, t ≥ Tm.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Proof of the claim: ","element":"span"},{"text":"The origin is a stable equilibrium. 
Hence there exists a constant ","element":"span"},{"style":{"height":13.2},"width":265.67,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-11.png","element":"img","alt":" ϵ > 0 such that","inline":true,"padRight":true},{"text":"all trajectories starting in ","element":"span"},{"style":{"height":14.62},"width":49.1,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-12.png","element":"img","alt":" Bϵ","inline":true},{"text":", the open ball of radius ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-13.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"centered at the origin, remain in ","element":"span"},{"style":{"height":15.24},"width":172.18,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-14.png","element":"img","alt":" Bδ for all","inline":true,"padRight":true},{"text":"times. In addition, each ","element":"span"},{"style":{"height":17.6},"width":887.85,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-15.png","element":"img","alt":" z0 ∈ A satisfies limt→∞ ϕt(z0) = 0, since A ⊂ R","inline":true},{"text":". Thus, for each ","element":"span"},{"style":{"height":15.42},"width":127.1,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-16.png","element":"img","alt":" z0 ∈ A","inline":true,"padRight":true},{"text":"there exists a time ","element":"span"},{"style":{"height":19.95},"width":599.08,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-17.png","element":"img","alt":" T(z0) such that |ϕT(z0)(z0)| < ϵ/2","inline":true},{"text":". 
The continuity assumptions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"imply that ","element":"span"},{"style":{"height":14.75},"width":111.5,"height":36.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-18.png","element":"img","alt":"ϕT(z0)","inline":true,"padRight":true},{"text":"is continuous, as can be verified using the Grönwall inequality. Therefore, for each ","element":"span"},{"style":{"height":15.42},"width":126.18,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-19.png","element":"img","alt":" z0 ∈ A","inline":true,"padRight":true},{"text":"there exists an open ball ","element":"span"},{"style":{"height":17.6},"width":108.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-20.png","element":"img","alt":" B(z0)","inline":true,"padRight":true},{"text":"centered at ","element":"span"},{"style":{"height":10.62},"width":37.29,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-21.png","element":"img","alt":" z0","inline":true,"padRight":true},{"text":"such that for all ","element":"span"},{"style":{"height":19.95},"width":528.49,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-22.png","element":"img","alt":" ¯z0 ∈ B(z0), |ϕT(z0)(¯z0)| < ϵ,","inline":true,"padRight":true},{"text":"which implies ","element":"span"},{"style":{"height":17.6},"width":529.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-23.png","element":"img","alt":" ϕt(¯z0) ∈ Bδ for all t ≥ T(z0)","inline":true},{"text":". 
The collection of all ","element":"span"},{"style":{"height":17.6},"width":254.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-24.png","element":"img","alt":" B(z0), z0 ∈ A","inline":true,"padRight":true},{"text":"is an open cover of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":". Since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is compact, there exists a finite sub-cover (according to the Heine-Borel theorem), which we denote ","element":"span"},{"style":{"height":17.6},"width":585.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-25.png","element":"img","alt":" B(ˆzi), i = 1, 2, . . . , N, where N","inline":true,"padRight":true},{"text":"is finite. As a result, choosing ","element":"span"},{"style":{"height":19.95},"width":1270.64,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-26.png","element":"img","alt":"Tm := maxi∈{1,2,...,N} T(ˆzi) implies ϕt(z0) ∈ Bδ for all z0 ∈ A, t ≥ Tm","inline":true},{"text":", which proves the claim.","element":"span"}],[{"text":"Moreover, the continuity of ","element":"span"},{"style":{"height":16.4},"width":489.4,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-27.png","element":"img","alt":" ϕt for all t ∈ I, 0 ≤ t ≤ Tm","inline":true},{"text":", implies further that ","element":"span"},{"style":{"height":17.6},"width":109.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-28.png","element":"img","alt":" ϕt(A)","inline":true,"padRight":true},{"text":"is bounded for any ","element":"span"},{"style":{"height":14.8},"width":324.63,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-29.png","element":"img","alt":" t ∈ I, 0 ≤ t ≤ Tm","inline":true},{"text":". 
Combined with Lemma ","element":"span"},{"href":"#id-49","text":"14 ","element":"a"},{"text":"(see Appendix ","element":"span"},{"text":"A)","element":"span"},{"text":", this yields the following bound","element":"span"}],[{"id":"id-50","style":{"width":"77%"},"width":1343,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-30.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":18.81},"width":657.75,"height":47.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-31.png","element":"img","alt":" z0 ∈ A, where CA ≥ δ and ˜C ≥ 1","inline":true,"padRight":true},{"text":"are positive constants. We fix ","element":"span"},{"style":{"height":15.42},"width":139.64,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-32.png","element":"img","alt":" z0 ∈ A","inline":true},{"text":", consider the trajectory ","element":"span"},{"style":{"height":17.6},"width":371.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-33.png","element":"img","alt":" z(t) := ϕt(z0), t ∈ I","inline":true},{"text":", and apply the mean value theorem,","element":"span"}],[{"style":{"width":"75%"},"width":1301,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-34.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":70.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-35.png","element":"img","alt":" ξ(t)","inline":true,"padRight":true},{"text":"lies between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"and the origin. 
Since the dynamics are assumed to have Lipschitz-continuous derivatives, we obtain the following bound:","element":"span"}],[{"style":{"width":"73%"},"width":1270,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-36.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.52},"width":56.19,"height":43.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-37.png","element":"img","alt":"¯CA","inline":true,"padRight":true},{"text":"denotes a Lipschitz constant of ","element":"span"},{"style":{"height":17.6},"width":220.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-38.png","element":"img","alt":" ∂g/∂z on A","inline":true},{"text":". According to ","element":"span"},{"href":"#id-50","text":"(19)","element":"a"},{"text":", the trajectory ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"is integrable (in continuous time) and absolutely summable (in discrete time). We obtain, by virtue of Lemma ","element":"span"},{"href":"#id-51","text":"13 ","element":"a"},{"text":"(see Appendix ","element":"span"},{"text":"A)","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"81%"},"width":1410,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/9-39.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.02},"width":45.19,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-0.png","element":"img","alt":" Cz","inline":true,"padRight":true},{"text":"is a constant. 
The constant ","element":"span"},{"style":{"height":15.02},"width":45.19,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-1.png","element":"img","alt":" Cz","inline":true,"padRight":true},{"text":"is related to an upper bound on the integral (in continuous time) or the sum (in discrete time) of ","element":"span"},{"style":{"height":17.6},"width":285.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-2.png","element":"img","alt":" |z(t)| over t ∈ I","inline":true},{"text":", which, according to ","element":"span"},{"href":"#id-50","text":"(19)","element":"a"},{"text":", is guaranteed to be finite.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remarks:","element":"span"}],[{"text":"• Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"characterizes the convergence rate and states that the number of iterations required to obtain an ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-3.png","element":"img","alt":" ϵ","inline":true},{"text":"-accuracy approaches ","element":"span"},{"style":{"height":17.6},"width":405.95,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-4.png","element":"img","alt":" log(1/ϵ)/α for small ϵ","inline":true},{"text":". 
This does not provide a tight bound on the number of iterations required to achieve a certain accuracy, as the constant ","element":"span"},{"style":{"height":17.21},"width":34,"height":43.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-5.png","element":"img","alt":"ˆC","inline":true,"padRight":true},{"text":"might depend on ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-6.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"or grow rapidly with the size of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":". Nevertheless, it enables a qualitative and quantitative discussion of the convergence rate ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-7.png","element":"img","alt":" α","inline":true},{"text":". In particular, Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"highlights that the convergence rate ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-8.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"is determined by the local properties of the dynamics, which depend on the local shape of the objective function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":".","element":"span"}],[{"text":"• The assumptions required for invoking Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"are often straightforward to verify, as Assumption ","element":"span"},{"href":"#id-48","text":"4 ","element":"a"},{"text":"hinges on an eigenvalue analysis of 
","element":"span"},{"style":{"height":17.6},"width":273.49,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-9.png","element":"img","alt":" ∂g/∂z at z = 0","inline":true},{"text":". Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"can also be generalized to the case where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"satisfies Assumption ","element":"span"},{"href":"#id-34","text":"1 ","element":"a"},{"text":"instead of Assumption ","element":"span"},{"href":"#id-35","text":"2, ","element":"a"},{"text":"as shown in Appendix ","element":"span"},{"text":"F.","element":"span"}],[{"text":"• The proof highlights the following alternative statement of Proposition ","element":"span"},{"href":"#id-52","text":"3: ","element":"a"},{"text":"There exists a finite time ","element":"span"},{"style":{"height":14.62},"width":148,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-10.png","element":"img","alt":" Tm > 0","inline":true,"padRight":true},{"text":"after which all trajectories starting in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"are guaranteed to be within a small neighborhood of the origin. 
Within this neighborhood, the convergence is exponential with rate ","element":"span"},{"style":{"height":8.4},"width":39.08,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-11.png","element":"img","alt":" α.","inline":true}],[{"text":"• Given the smoothness properties of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g","element":"span"},{"text":", the assumption that the linearized dynamics converge with rate ","element":"span"},{"style":{"height":12.4},"width":108.26,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-12.png","element":"img","alt":" α > 0","inline":true,"padRight":true},{"text":"(Assumption ","element":"span"},{"href":"#id-48","text":"4) ","element":"a"},{"text":"is necessary and sufficient for the convergence of the nonlinear dynamics with rate ","element":"span"},{"style":{"height":8.4},"width":28,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-13.png","element":"img","alt":" α","inline":true},{"text":", as can be shown using the arguments of Lemma ","element":"span"},{"href":"#id-49","text":"14 ","element":"a"},{"text":"in Appendix ","element":"span"},{"text":"A.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.5 Implications for optimization algorithms","element":"span"}],[{"text":"Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"enables a characterization of the convergence rate based on a linear analysis of the dynamics about an equilibrium. In the following, we will use Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"to discuss the convergence rate of the optimization algorithms given in Example 1 and Example 2. 
In particular, this provides conditions guaranteeing accelerated convergence.","element":"span"}],[{"id":"id-29","style":{"fontWeight":"bold"},"text":"2.6 Example 1","element":"span"}],[{"text":"The total energy of the dynamical system is given by ","element":"span"},{"href":"#id-53","text":"(11)","element":"a"},{"text":", and according to ","element":"span"},{"href":"#id-54","text":"(12)","element":"a"},{"text":", energy is dissipated along trajectories. As we will show in the following, the energy function can therefore be used to characterize the region of attraction of the equilibrium ","element":"span"},{"style":{"height":12.73},"width":130.07,"height":31.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-14.png","element":"img","alt":" z∗ = 0","inline":true},{"text":", where the topology of the level sets of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"will play an important role. This will be discussed next.","element":"span"}],[{"text":"By assumption, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"has a non-degenerate local minimum at ","element":"span"},{"style":{"height":12.73},"width":124.07,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-15.png","element":"img","alt":" x∗ = 0","inline":true},{"text":", which implies that ","element":"span"},{"style":{"height":19.13},"width":189.65,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-16.png","element":"img","alt":" f−1([0, c])","inline":true,"padRight":true},{"text":"contains a compact, connected component containing the origin for sufficiently small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c > ","element":"span"},{"text":"0","element":"span"},{"text":". 
The set ","element":"span"},{"style":{"height":19.13},"width":189.65,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-17.png","element":"img","alt":" f−1([0, c])","inline":true,"padRight":true},{"text":"consists of all points ","element":"span"},{"style":{"height":17.6},"width":573.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-18.png","element":"img","alt":" x ∈ Rn such that 0 ≤ f(x) ≤ c","inline":true},{"text":". Morse theory ","element":"span"},{"href":"#id-55","referenceIndex":27,"text":"(Milnor, ","element":"a"},{"href":"#id-55","referenceIndex":27,"text":"1963) ","element":"a"},{"text":"shows that the topology of the set ","element":"span"},{"style":{"height":19.13},"width":189.65,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-19.png","element":"img","alt":" f−1([0, c])","inline":true,"padRight":true},{"text":"is determined by the critical points of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". 
Thus, the set ","element":"span"},{"style":{"height":19.13},"width":189.65,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/10-20.png","element":"img","alt":" f−1([0, c])","inline":true,"padRight":true},{"text":"includes a compact, connected component that contains the origin (and no other","element":"span"}],[{"style":{"width":"89%"},"width":1541,"height":776,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-0.png","element":"img"}],[{"id":"id-56","text":"Figure 1: The figure illustrates the set ","element":"figcaption","subtype":"caption"},{"style":{"height":19.14},"width":398.21,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-1.png","element":"img","alt":" f−1([0, c]), with c > 0","inline":true},{"text":", when the function ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f ","element":"figcaption","subtype":"caption"},{"text":"is scalar. In this example, ","element":"figcaption","subtype":"caption"},{"style":{"height":19.13},"width":189.65,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-2.png","element":"img","alt":" f−1([0, c])","inline":true,"padRight":true},{"text":"contains two connected components ","element":"figcaption","subtype":"caption"},{"style":{"height":15.02},"width":182.46,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-3.png","element":"img","alt":" C1 and C2","inline":true},{"text":". 
The function ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f ","element":"figcaption","subtype":"caption"},{"text":"has a local minimum at the origin, a saddle point at ","element":"figcaption","subtype":"caption"},{"style":{"height":12.8},"width":25,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-4.png","element":"img","alt":" ˆx","inline":true},{"text":", and a local maximum at ","element":"figcaption","subtype":"caption"},{"style":{"height":15.02},"width":184.17,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-5.png","element":"img","alt":" xmax. The","inline":true,"padRight":true},{"text":"origin is contained in the set ","element":"figcaption","subtype":"caption"},{"style":{"height":15.02},"width":48.19,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-6.png","element":"img","alt":" C1","inline":true},{"text":". Morse theory states that ","element":"figcaption","subtype":"caption"},{"style":{"height":15.02},"width":48.19,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-7.png","element":"img","alt":" C1","inline":true,"padRight":true},{"text":"is guaranteed to be compact provided that ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":282.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-8.png","element":"img","alt":" 0 < c < f(ˆx)","inline":true},{"text":". 
Since ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"f ","element":"figcaption","subtype":"caption"},{"text":"is one-dimensional, ","element":"figcaption","subtype":"caption"},{"style":{"height":15.02},"width":204.64,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-9.png","element":"img","alt":" C1 remains","inline":true,"padRight":true},{"text":"compact for ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":298.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/11-10.png","element":"img","alt":" 0 < c < f(xmax)","inline":true},{"text":"; this is, however, no longer true in higher dimensions.","element":"figcaption","subtype":"caption"}],[{"text":"critical point), as long as ","element":"span"},{"style":{"height":21.41},"width":463.4,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-0.png","element":"img","alt":" c < ˆf = f(ˆx), where ˆx","inline":true,"padRight":true},{"text":"is any other critical point. 
The situation is illustrated with an example in Figure ","element":"span"},{"href":"#id-56","text":"1.","element":"a"}],[{"text":"From the definition of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"we infer that the critical points of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"are all of the form ","element":"span"},{"style":{"height":17.6},"width":244.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-1.png","element":"img","alt":" (q∗, 0), where","inline":true},{"style":{"height":15.93},"width":38.04,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-2.png","element":"img","alt":"q∗ ","inline":true,"padRight":true},{"text":"corresponds to a critical point of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". The above reasoning therefore also applies to the total energy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":", which shows that the set ","element":"span"},{"style":{"height":19.13},"width":203.41,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-3.png","element":"img","alt":" H−1([0, c])","inline":true,"padRight":true},{"text":"includes a compact, connected component that contains the origin (and no other critical point), as long as ","element":"span"},{"style":{"height":21.01},"width":116.84,"height":52.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-4.png","element":"img","alt":" c < ˆf","inline":true},{"text":". 
This motivates the following definition, which will be used throughout the remainder of the article.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 4 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The set ","element":"span"},{"style":{"height":18.04},"width":53.84,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-5.png","element":"img","alt":" Af","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is defined as the connected component of ","element":"span"},{"style":{"height":21.41},"width":215.44,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-6.png","element":"img","alt":" H−1([0, ˆf))","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"that contains the origin.","element":"span"}],[{"text":"Analyzing the rate of change of the energy along the trajectories of ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"(cf. ","element":"span"},{"href":"#id-54","text":"(12)","element":"a"},{"text":"),","element":"span"}],[{"id":"id-57","style":{"width":"83%"},"width":1444,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-7.png","element":"img"}],[{"text":"reveals that by a suitable choice of the parameters ","element":"span"},{"style":{"height":16.4},"width":146.15,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-8.png","element":"img","alt":" d and β","inline":true},{"text":", the energy necessarily decays. 
In particular, this is the case for ","element":"span"},{"style":{"height":17.6},"width":680.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-9.png","element":"img","alt":" d > 0, 0 ≤ β < 2d/Cf, where Cf ≥ 0","inline":true,"padRight":true},{"text":"denotes a lower bound on the Hessian of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":", as defined in ","element":"span"},{"href":"#id-35","text":"(2)","element":"a"},{"text":". We therefore conclude as follows.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proposition 5 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Provided that ","element":"span"},{"style":{"height":18.72},"width":520.63,"height":46.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-10.png","element":"img","alt":" d > 0 and 0 ≤ β ≤ 2d/Cf","inline":true},{"style":{"fontStyle":"italic"},"text":", the origin is an asymptotically stable equilibrium in the sense of Lyapunov. Its region of attraction contains the set ","element":"span"},{"style":{"height":18.04},"width":67.62,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-11.png","element":"img","alt":" Af.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We consider any initial condition ","element":"span"},{"style":{"height":18.44},"width":994.38,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-12.png","element":"img","alt":" z(0) ∈ Af and define H0 := H(z(0)). For d > 0 and","inline":true},{"style":{"height":17.6},"width":275.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-13.png","element":"img","alt":"0 ≤ β ≤ 2d/Cf","inline":true},{"text":", the energy necessarily decays along the trajectories of ","element":"span"},{"href":"#id-39","text":"(8)","element":"a"},{"text":", cf. 
","element":"span"},{"href":"#id-57","text":"(23)","element":"a"},{"text":". The trajectory ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") = (","element":"span"},{"style":{"fontStyle":"italic"},"text":"q","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")) ","element":"span"},{"text":"is therefore confined to the connected component of ","element":"span"},{"style":{"height":19.14},"width":239.73,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-14.png","element":"img","alt":" H−1([0, H0])","inline":true,"padRight":true},{"text":"that contains the origin, which, according to the above discussion, is necessarily compact. This implies that the origin is stable. In addition, the energy strictly decreases except when ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") = 0","element":"span"},{"text":". Hence, according to LaSalle’s theorem (see, for example, ","element":"span"},{"href":"#id-41","referenceIndex":36,"text":"Sastry, ","element":"a"},{"href":"#id-41","referenceIndex":36,"text":"1999, ","element":"a"},{"text":"Ch. 
5.4), ","element":"span"},{"style":{"fontStyle":"italic"},"text":"q","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"necessarily converges to a critical point of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":", whereas ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"converges to zero. The origin is the only critical point contained in ","element":"span"},{"style":{"height":18.04},"width":53.84,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-15.png","element":"img","alt":"Af","inline":true},{"text":", which implies asymptotic stability of the origin.","element":"span"}],[{"text":"An eigenvalue analysis of the linearized dynamics (about the equilibrium) reveals that the eigenvalues are given by","element":"span"}],[{"id":"id-60","style":{"width":"66%"},"width":1149,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-16.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"is any eigenvalue of ","element":"span"},{"style":{"height":19.13},"width":233.68,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-17.png","element":"img","alt":" d²f/dx²|x=0","inline":true},{"text":". 
Thus, Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"asserts that the convergence rate for all initial conditions in ","element":"span"},{"style":{"height":18.04},"width":53.84,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-18.png","element":"img","alt":" Af","inline":true,"padRight":true},{"text":"is directly determined by the real part of the eigenvalues, provided that ","element":"span"},{"style":{"height":18.44},"width":483.02,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-19.png","element":"img","alt":" d > 0 and 0 ≤ β < 2d/Cf.","inline":true}],[{"text":"The eigenvalues ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"satisfy the upper and lower bounds ","element":"span"},{"style":{"height":17.6},"width":287.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-20.png","element":"img","alt":" 1/κ ≤ h/L ≤ 1","inline":true},{"text":", cf. ","element":"span"},{"href":"#id-58","text":"(3)","element":"a"},{"text":". An appropriate normalization of the constants ","element":"span"},{"style":{"height":16.4},"width":136.62,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-21.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"reduces the analysis to the case ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"= 1","element":"span"},{"text":", which we consider in the following. 
Figure ","element":"span"},{"href":"#id-59","text":"2 ","element":"a"},{"text":"shows how the eigenvalues vary as a function of ","element":"span"},{"style":{"height":16.4},"width":178.9,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-22.png","element":"img","alt":" d, β and h","inline":true},{"text":", according to the formula ","element":"span"},{"href":"#id-60","text":"(24)","element":"a"},{"text":".","element":"span"}],[{"text":"We consider first the case ","element":"span"},{"style":{"height":16.4},"width":125.07,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-23.png","element":"img","alt":" β = 0","inline":true},{"text":": For small ","element":"span"},{"style":{"height":17.77},"width":449.16,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-24.png","element":"img","alt":" d—i.e., 0 < d ≤ 1/√κ","inline":true},{"text":"—the eigenvalues are complex conjugates, have real part ","element":"span"},{"style":{"height":12.8},"width":56.94,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-25.png","element":"img","alt":" −d","inline":true,"padRight":true},{"text":"(independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":"), and their imaginary part varies between ","element":"span"},{"style":{"height":20.8},"width":558.32,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-26.png","element":"img","alt":"√(1/κ − d²) and √(1 − d²) (as h","inline":true,"padRight":true},{"text":"changes). 
As ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"is increased above ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/12-27.png","element":"img","alt":" 1/√κ","inline":true,"padRight":true},{"text":"the eigenvalues can be","element":"span"}],[{"style":{"width":"97%"},"width":1683,"height":1376,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/13-0.png","element":"img"}],[{"id":"id-59","text":"Figure 2: This figure shows how the eigenvalues of the linearization change as a function of the ","element":"figcaption","subtype":"caption"},{"text":"damping parameters ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":136.86,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/13-1.png","element":"img","alt":" d and β","inline":true},{"text":", and as a function of the curvature at the equilibrium. The top left plot shows the behavior of the eigenvalues for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":107.3,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/13-2.png","element":"img","alt":" β = 0","inline":true},{"text":", as the curvature ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h ","element":"figcaption","subtype":"caption"},{"text":"is varied from ","element":"figcaption","subtype":"caption"},{"text":"0 ","element":"figcaption","subtype":"caption"},{"text":"to ","element":"figcaption","subtype":"caption"},{"text":"1 ","element":"figcaption","subtype":"caption"},{"text":"(the different colors represent the different values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"d","element":"figcaption","subtype":"caption"},{"text":"). 
The top right plot shows the behavior of the eigenvalues for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":141.4,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/13-3.png","element":"img","alt":" β = 0.2","inline":true,"padRight":true},{"text":"and the bottom right plot the behavior for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":163.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/13-4.png","element":"img","alt":" β = 0.4","inline":true},{"text":", where the curvature ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h ","element":"figcaption","subtype":"caption"},{"text":"is varied from ","element":"figcaption","subtype":"caption"},{"text":"0 ","element":"figcaption","subtype":"caption"},{"text":"to ","element":"figcaption","subtype":"caption"},{"text":"1","element":"figcaption","subtype":"caption"},{"text":". The figure indicates that for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":122.45,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/13-5.png","element":"img","alt":" β = 0","inline":true},{"text":", the eigenvalues are real for large values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"d ","element":"figcaption","subtype":"caption"},{"text":"and small values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h","element":"figcaption","subtype":"caption"},{"text":". 
As ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h ","element":"figcaption","subtype":"caption"},{"text":"is increased, the eigenvalues may become complex conjugates, in which case a further increase in ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h ","element":"figcaption","subtype":"caption"},{"text":"affects only the imaginary part. The additional parameter ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/13-6.png","element":"img","alt":" β","inline":true,"padRight":true},{"text":"has the effect of reducing the real part for larger values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h","element":"figcaption","subtype":"caption"},{"text":". This increases the convergence rate for large values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h","element":"figcaption","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"text":"either real or complex conjugates, depending on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":". However, their worst-case real part is given by ","element":"span"},{"style":{"height":20.8},"width":306.42,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-0.png","element":"img","alt":"−d + √(d² − 1/κ)","inline":true},{"text":", which increases rapidly (approaching zero) for larger ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":". 
The worst-case convergence rate (for a fixed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") is given by the maximum real part as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"is varied between ","element":"span"},{"style":{"height":17.6},"width":182.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-1.png","element":"img","alt":" 1/κ and 1","inline":true},{"text":". Thus, in that sense, the optimal worst-case convergence rate is ","element":"span"},{"style":{"height":17.77},"width":374.88,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-2.png","element":"img","alt":" 1/√κ for d = 1/√κ.","inline":true}],[{"text":"Increasing ","element":"span"},{"style":{"height":16.4},"width":26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-3.png","element":"img","alt":" β","inline":true,"padRight":true},{"text":"has the effect of increasing the convergence rate for larger values of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":", since the parameter ","element":"span"},{"style":{"height":16.4},"width":26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-4.png","element":"img","alt":" β","inline":true,"padRight":true},{"text":"introduces additional damping as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"becomes large. 
Provided that ","element":"span"},{"style":{"height":16.4},"width":107.16,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-5.png","element":"img","alt":" β ≤ 1","inline":true},{"text":", the qualitative behavior of the eigenvalues remains the same: For small ","element":"span"},{"style":{"height":17.77},"width":695.24,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-6.png","element":"img","alt":" d, i.e., 0 < d ≤ 1/√κ − β/(2κ), the","inline":true,"padRight":true},{"text":"eigenvalues are complex conjugates with worst-case convergence rate ","element":"span"},{"style":{"height":17.6},"width":473.23,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-7.png","element":"img","alt":" −d − β/(2κ). As d is in-","inline":true,"padRight":true},{"text":"creased above ","element":"span"},{"style":{"height":17.77},"width":285.53,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-8.png","element":"img","alt":" 1/√κ + β/(2κ)","inline":true},{"text":", the eigenvalues can be either real or complex conjugates, depending on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":". Their worst-case real part is again achieved for ","element":"span"},{"style":{"height":17.6},"width":157.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-9.png","element":"img","alt":" h = 1/κ","inline":true,"padRight":true},{"text":"and increases rapidly for larger ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":". 
Again, the optimal worst-case convergence rate is ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-10.png","element":"img","alt":" 1/√κ","inline":true,"padRight":true},{"text":"obtained for ","element":"span"},{"style":{"height":17.77},"width":414.02,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-11.png","element":"img","alt":" d = −β/(2κ) + 1/√κ.","inline":true}],[{"text":"In continuous time, any desired convergence rate can be realized by a simple reparametrization of time. A linear reparametrization, ","element":"span"},{"style":{"height":18.42},"width":423.07,"height":46.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-12.png","element":"img","alt":"ˆt = ctt, where ct > 0","inline":true,"padRight":true},{"text":"is constant, simply scales both the real and the imaginary parts of the eigenvalues by ","element":"span"},{"style":{"height":10.62},"width":27.88,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-13.png","element":"img","alt":" ct","inline":true},{"text":". A general, nonlinear, but diffeomorphic transformation will lead to time-varying dynamics, which will be discussed in Section ","element":"span"},{"text":"3. ","element":"span"},{"text":"Thus, it might seem that any discussion of continuous-time convergence rates in the context of optimization is pointless. However, the analysis above tells us something different: it reveals how the convergence rate is affected by the condition number, which characterizes the shape of a local minimum. 
The analysis should be interpreted in the following way: provided that the time scale is fixed such that a convergence rate of ","element":"span"},{"style":{"height":17.93},"width":183.51,"height":44.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-14.png","element":"img","alt":" 1 (or 1s−1 ","inline":true,"padRight":true},{"text":"if the physical units are kept) is achieved for ","element":"span"},{"style":{"height":12},"width":111.57,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-15.png","element":"img","alt":" κ = 1","inline":true},{"text":", the analysis reveals how the convergence rate of any optimization algorithm of the type ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"deteriorates as ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-16.png","element":"img","alt":" κ","inline":true,"padRight":true},{"text":"increases.","element":"span"}],[{"text":"For the following analysis we introduce the notation ","element":"span"},{"style":{"height":12.22},"width":149.29,"height":30.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-17.png","element":"img","alt":" u1 ≻ u2","inline":true,"padRight":true},{"text":"if the function ","element":"span"},{"style":{"height":10.62},"width":41.98,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-18.png","element":"img","alt":" u1","inline":true,"padRight":true},{"text":"dominates the function ","element":"span"},{"style":{"height":10.62},"width":41.98,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-19.png","element":"img","alt":" u2","inline":true,"padRight":true},{"text":"for large arguments; that is, 
","element":"span"},{"style":{"height":17.6},"width":995.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-20.png","element":"img","alt":" limκ→∞ u1(κ)/u2(κ) = ∞, where u1 and u2 are real-","inline":true,"padRight":true},{"text":"valued functions that are positive for large arguments. In the same way, the notation ","element":"span"},{"style":{"height":12.22},"width":164.22,"height":30.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-21.png","element":"img","alt":" u1 ≺ u2","inline":true,"padRight":true},{"text":"means that the function ","element":"span"},{"style":{"height":15.02},"width":292.58,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-22.png","element":"img","alt":" u2 dominates u1","inline":true,"padRight":true},{"text":"for large arguments. We further use ","element":"span"},{"style":{"height":16.4},"width":319.45,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-23.png","element":"img","alt":" u1 ∼ u2 to imply","inline":true,"padRight":true},{"text":"that neither ","element":"span"},{"style":{"height":11.02},"width":165.87,"height":27.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-24.png","element":"img","alt":" u1 nor u2","inline":true,"padRight":true},{"text":"is dominant for large arguments; that is,","element":"span"}],[{"style":{"width":"69%"},"width":1204,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-25.png","element":"img"}],[{"text":"We will analyze the performance of the algorithm given in Example 1 not only on a single function, but on a whole class of functions. 
In order to make the following statements precise, we will fix a compact set ","element":"span"},{"style":{"height":15.53},"width":173.82,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-26.png","element":"img","alt":" A ⊂ R2n ","inline":true,"padRight":true},{"text":"that contains the origin and introduce the following class of functions (parametrized by the constants ","element":"span"},{"style":{"height":18.01},"width":417.05,"height":45.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-27.png","element":"img","alt":" ¯κ ≥ 1, Cf > 0, ¯Cf > 0).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Definition 6 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"height":21.08},"width":141.62,"height":52.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-28.png","element":"img","alt":" F¯κ,Cf, ¯Cf","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the set of all functions such that each element ","element":"span"},{"style":{"height":21.08},"width":363.99,"height":52.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-29.png","element":"img","alt":" f ∈ F¯κ,Cf, ¯Cf satisfies","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the following conditions: 1) ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has an isolated local minimum at the origin with condition number ","element":"span"},{"style":{"height":16.4},"width":221.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-30.png","element":"img","alt":"κ ≤ ¯κ, 2) f","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfies Assumption ","element":"span"},{"href":"#id-35","style":{"fontStyle":"italic"},"text":"2 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and the bounds in 
","element":"span"},{"href":"#id-35","text":"(2)","element":"a"},{"style":{"fontStyle":"italic"},"text":", and 3) ","element":"span"},{"style":{"height":18.04},"width":546.15,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/14-31.png","element":"img","alt":" Af ⊃ A, where Af is defined","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"according to Definition ","element":"span"},{"href":"#id-57","style":{"fontStyle":"italic"},"text":"4.","element":"a"}],[{"text":"These conditions are motivated as follows. The first prescribes the local geometry about the local minimum at the origin, which influences the convergence rate in significant ways. The second ensures smoothness of the gradient, and the third guarantees that there are no critical points in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"other than the origin.","element":"span"},{"text":"1","element":"span"}],[{"text":"Acceleration is obtained whenever the convergence rate (again, relative to the convergence rate achieved for ","element":"span"},{"style":{"height":12},"width":105.34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-0.png","element":"img","alt":" κ = 1","inline":true},{"text":") scales with ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-1.png","element":"img","alt":" 1/√κ","inline":true,"padRight":true},{"text":"for large values of ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-2.png","element":"img","alt":" κ","inline":true},{"id":"id-61","text":". 
More precisely:","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Definition 7 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A momentum-based algorithm is accelerated if there exist constants ","element":"span"},{"style":{"height":15.02},"width":229.63,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-3.png","element":"img","alt":" κ0 ≥ 1 and","inline":true},{"style":{"height":14.22},"width":122.94,"height":35.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-4.png","element":"img","alt":"ca > 0","inline":true},{"style":{"fontStyle":"italic"},"text":" such that for any ","element":"span"},{"style":{"height":21.08},"width":450.78,"height":52.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-5.png","element":"img","alt":" κ ≥ κ0 and f ∈ Fκ,Cf, ¯Cf","inline":true},{"style":{"fontStyle":"italic"},"text":", the following bound holds (for some constant","element":"span"}],[{"style":{"width":"77%"},"width":1333,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-6.png","element":"img"}],[{"text":"We now derive conditions on the parameters ","element":"span"},{"style":{"height":16.4},"width":144.44,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-7.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"of the algorithm given in Example 1 that guarantee accelerated convergence. We start with the case ","element":"span"},{"style":{"height":17.77},"width":407.99,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-8.png","element":"img","alt":" β = 0. 
For d ≻ 1/√κ","inline":true,"padRight":true},{"text":"it follows that the real part of the worst-case convergence rate scales with ","element":"span"},{"style":{"height":20.8},"width":379.07,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-9.png","element":"img","alt":" −d + √(d² − 1/κ) ≈","inline":true},{"style":{"height":17.6},"width":181.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-10.png","element":"img","alt":"−1/(2dκ)","inline":true},{"text":", which makes acceleration impossible. For ","element":"span"},{"style":{"height":17.77},"width":186.23,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-11.png","element":"img","alt":" d ≺ 1/√κ","inline":true,"padRight":true},{"text":"the damping is too small: the real part of ","element":"span"},{"href":"#id-60","text":"(24) ","element":"a"},{"text":"scales worse than ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-12.png","element":"img","alt":" 1/√κ","inline":true},{"text":". Hence, acceleration is achieved only for ","element":"span"},{"style":{"height":17.77},"width":255.64,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-13.png","element":"img","alt":" d ∼ 1/√κ. 
A","inline":true,"padRight":true},{"text":"similar argument applies to the case ","element":"span"},{"style":{"height":16.4},"width":106.02,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-14.png","element":"img","alt":" 0 < β","inline":true,"padRight":true},{"text":"and yields ","element":"span"},{"style":{"height":17.77},"width":380.07,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-15.png","element":"img","alt":" d + β/(2κ) ∼ 1/√κ.","inline":true}],[{"id":"id-62","text":"The above analysis is summarized in the following proposition.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proposition 8 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"In the nonconvex case (","element":"span"},{"style":{"height":17.92},"width":122.22,"height":44.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-16.png","element":"img","alt":"Cf > 0","inline":true},{"style":{"fontStyle":"italic"},"text":"), the algorithm given in Example 1 is accelerated for the set of parameters ","element":"span"},{"style":{"height":18.89},"width":693.3,"height":47.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-17.png","element":"img","alt":" d ∼ 1/√κ, 0 < d, and 0 ≤ β < 2d/Cf.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"In the convex case (","element":"span"},{"style":{"height":17.92},"width":139.1,"height":44.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-18.png","element":"img","alt":"Cf = 0","inline":true},{"style":{"fontStyle":"italic"},"text":"), the algorithm given in Example 1 is accelerated for the set of parameters ","element":"span"},{"style":{"height":17.77},"width":633.41,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-19.png","element":"img","alt":" d + β/(2κ) ∼ 1/√κ, 0 < d, 0 ≤ β.","inline":true}],[{"text":"We would like to emphasize that the bound in 
Definition ","element":"span"},{"href":"#id-61","text":"7 ","element":"a"},{"text":"is checked for each function ","element":"span"},{"style":{"height":16.4},"width":73.52,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-20.png","element":"img","alt":" f ∈","inline":true},{"style":{"height":19.33},"width":137.94,"height":48.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-21.png","element":"img","alt":"Fκ,Cf, ¯Cf","inline":true,"padRight":true},{"text":"individually, whereby the constant ","element":"span"},{"style":{"height":15.02},"width":49.19,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-22.png","element":"img","alt":" Ca","inline":true,"padRight":true},{"text":"may depend on the specific function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". Given the (potentially large) compact set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":", Proposition ","element":"span"},{"href":"#id-62","text":"8 ","element":"a"},{"text":"therefore answers the following question: What are the algorithm parameters ensuring that for any ","element":"span"},{"style":{"height":19.33},"width":231.4,"height":48.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-23.png","element":"img","alt":" f ∈ Fκ,Cf, ¯Cf","inline":true,"padRight":true},{"text":"the time for reaching an ","element":"span"},{"style":{"height":12.4},"width":186.98,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-24.png","element":"img","alt":" ϵ accuracy","inline":true,"padRight":true},{"text":"scales with ","element":"span"},{"style":{"height":17.77},"width":190.77,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-25.png","element":"img","alt":"√κln(1/ϵ)","inline":true,"padRight":true},{"text":"up to constants, i.e., for small 
","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-26.png","element":"img","alt":" ϵ","inline":true},{"text":"? One important aspect of the analysis is that the convergence rate depends only on the local geometry of the objective function, as characterized by the constant ","element":"span"},{"style":{"height":8.4},"width":36.14,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-27.png","element":"img","alt":" κ.","inline":true}],[{"text":"The basic notion of local analysis, obtained from a derivation of eigenvalues, is well known (see, for example, ","element":"span"},{"href":"#id-63","referenceIndex":33,"text":"Polyak ","element":"a"},{"href":"#id-63","referenceIndex":33,"text":"(1987)","element":"a"},{"text":"). But Proposition ","element":"span"},{"href":"#id-62","text":"8 ","element":"a"},{"text":"goes beyond classical local analysis in that, by virtue of Proposition ","element":"span"},{"href":"#id-52","text":"3, ","element":"a"},{"text":"the local rate can be guaranteed for a large portion of the region of attraction of a given equilibrium. Key to this result is a nonlinear and global stability analysis. In the strongly convex setting, the results regarding the case ","element":"span"},{"style":{"height":16.4},"width":107.19,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-28.png","element":"img","alt":" β = 0","inline":true,"padRight":true},{"text":"have been derived in ","element":"span"},{"href":"#id-64","referenceIndex":15,"text":"Gadat et al. ","element":"a"},{"href":"#id-64","referenceIndex":15,"text":"(2018)","element":"a"},{"text":". 
Similar results regarding the case ","element":"span"},{"style":{"height":16.8},"width":107.19,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-29.png","element":"img","alt":" β ≠ 0","inline":true,"padRight":true},{"text":"can be found in ","element":"span"},{"href":"#id-12","referenceIndex":4,"text":"Attouch et al. ","element":"a"},{"href":"#id-12","referenceIndex":4,"text":"(2018)","element":"a"},{"text":".","element":"span"}],[{"text":"The bounds that we establish throughout the manuscript are restricted to initial conditions that are contained in a compact set within the region of attraction of a non-degenerate and isolated local minimum. This can be further motivated by considering the function shown in Figure ","element":"span"},{"href":"#id-56","text":"1. ","element":"a"},{"text":"Provided that the algorithm dissipates energy, we infer from the above discussion that the region of attraction of the origin (an open set) will comprise points in the state space that are arbitrarily close to the saddle ","element":"span"},{"style":{"height":17.6},"width":100.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-30.png","element":"img","alt":" (ˆx, 0)","inline":true,"padRight":true},{"text":"and the maximum ","element":"span"},{"style":{"height":17.6},"width":157.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-31.png","element":"img","alt":" (xmax, 0)","inline":true,"padRight":true},{"text":"(see Figure ","element":"span"},{"href":"#id-56","text":"1)","element":"a"},{"text":". Saddle points and maxima are also equilibria of the algorithm: the gradient vanishes there, so a gradient-based algorithm cannot distinguish them from minima. 
Hence, when initializing the algorithm close enough to ","element":"span"},{"style":{"height":17.6},"width":350.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/15-32.png","element":"img","alt":" (ˆx, 0) and (xmax, 0)","inline":true},{"text":", convergence of the algorithm can potentially take an arbitrarily long time (see ","element":"span"},{"href":"#id-20","referenceIndex":11,"text":"Du et al. ","element":"a"},{"href":"#id-20","referenceIndex":11,"text":"(2017) ","element":"a"},{"text":"for a formal analysis). In that sense, the restriction of initial conditions to a compact set within the region of attraction of a given equilibrium appears to be natural.","element":"span"}],[{"id":"id-30","style":{"fontWeight":"bold"},"text":"2.7 Example 2","element":"span"}],[{"text":"In discrete time, the stability analysis is not as straightforward, since the energy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"is in general not dissipated along trajectories. However, the specific structure of the discretization can be exploited for obtaining a nonlinear stability analysis that is valid beyond a neighborhood of a local minimum.","element":"span"}],[{"text":"As remarked in ","element":"span"},{"href":"#id-17","referenceIndex":28,"text":"Muehlebach and Jordan ","element":"a"},{"href":"#id-17","referenceIndex":28,"text":"(2019)","element":"a"},{"text":", the discretization ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"can be divided into two parts, an energy dissipation step and a symplectic Euler step ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"(Hairer et al., ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"2002, ","element":"a"},{"text":"p. 
3),","element":"span"}],[{"id":"id-103","style":{"width":"95%"},"width":1648,"height":165,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-0.png","element":"img"}],[{"text":"with intermediate state ","element":"span"},{"style":{"height":17.6},"width":234.71,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-1.png","element":"img","alt":" z̄k = (q̄k, p̄k)","inline":true},{"text":". To simplify notation, we introduce the maps ","element":"span"},{"style":{"height":17.64},"width":157.4,"height":44.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-2.png","element":"img","alt":" Φd,T and","inline":true},{"style":{"height":14.7},"width":55.51,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-3.png","element":"img","alt":"ΦT","inline":true,"padRight":true},{"text":"to denote the energy dissipation step and the symplectic Euler step, respectively. The symplectic Euler step is a well-known structure-preserving first-order integration scheme. Because this step is symplectic, there exists a modified energy function that it conserves up to exponentially small terms ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"(Hairer et al., ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"2002, ","element":"a"},{"text":"Chapter VI). The modified energy function can be computed by means of truncated Taylor-series expansions. To make the analysis rigorous, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is assumed to be analytic, which enables the estimation of higher-order derivatives using Cauchy’s integral formula. 
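The following is a minimal sketch of what it means for a symplectic step to conserve a modified energy. It uses the hypothetical quadratic case f(q) = hq²/2 with scalar q and p. It also assumes the momentum-first symplectic Euler variant p_next = p − Thq, q_next = q + Tp_next. This ordering is an illustrative assumption and is not necessarily the one used in (13). For this variant and a quadratic f, the modified energy H(q, p) − Thqp/2 is conserved exactly. H itself only oscillates within a band of width O(T). The non-symplectic explicit Euler method is included for comparison; it multiplies H by 1 + T²h at every step.

```python
# Hypothetical example: f(q) = h*q**2/2, so H(q, p) = p**2/2 + h*q**2/2.
# Assumed symplectic Euler variant (momentum first): p_next = p - T*h*q,
# q_next = q + T*p_next. For quadratic f this variant conserves
# H(q, p) - T*h*q*p/2 exactly (up to floating-point round-off).


def energy(q, p, h):
    return 0.5 * p**2 + 0.5 * h * q**2


def modified_energy(q, p, h, T):
    return energy(q, p, h) - 0.5 * T * h * q * p


def symplectic_euler(q, p, h, T):
    p_next = p - T * h * q
    return q + T * p_next, p_next


def explicit_euler(q, p, h, T):
    # Not symplectic: both updates use the old state
    return q + T * p, p - T * h * q


def trajectory(step, q, p, h, T, steps):
    states = [(q, p)]
    for _ in range(steps):
        q, p = step(q, p, h, T)
        states.append((q, p))
    return states


h, T, steps = 1.0, 0.1, 10_000
q0, p0 = 1.0, 0.0
se = trajectory(symplectic_euler, q0, p0, h, T, steps)
ee = trajectory(explicit_euler, q0, p0, h, T, steps)

# Largest deviation from the initial value along the symplectic Euler trajectory
drift_mod = max(abs(modified_energy(q, p, h, T) - modified_energy(q0, p0, h, T)) for q, p in se)
drift_H = max(abs(energy(q, p, h) - energy(q0, p0, h)) for q, p in se)
# Explicit Euler: H grows geometrically
growth_ee = energy(*ee[-1], h) / energy(q0, p0, h)

print('symplectic Euler, max drift of modified energy:', drift_mod)
print('symplectic Euler, max drift of H:', drift_H)
print('explicit Euler, final H / initial H:', growth_ee)
```

In this linear example the modified energy is available in closed form. For nonlinear analytic f, Proposition 9 provides an analogous modified energy that is conserved only up to exponentially small terms.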
It will be shown that, for the subsequent stability analysis, the assumption of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"being analytic poses essentially no restriction, since any continuous function can be approximated arbitrarily closely (uniformly) on a compact domain by a polynomial, which is analytic (Stone-Weierstrass Theorem; ","element":"span"},{"href":"#id-65","referenceIndex":34,"text":"Rudin, ","element":"a"},{"href":"#id-65","referenceIndex":34,"text":"1976, ","element":"a"},{"text":"p. 159). The modified energy function is characterized by the following result.","element":"span"}],[{"id":"id-66","style":{"fontWeight":"bold"},"text":"Proposition 9 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be analytic on ","element":"span"},{"style":{"height":16.32},"width":49.29,"height":40.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-4.png","element":"img","alt":" Bcr","inline":true},{"style":{"fontStyle":"italic"},"text":", the closed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"style":{"fontStyle":"italic"},"text":"-dimensional ball of radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"style":{"fontStyle":"italic"},"text":"centered at the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"origin, and let ","element":"span"},{"style":{"height":14.62},"width":55.7,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-5.png","element":"img","alt":" LH","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a Lipschitz constant of ","element":"span"},{"style":{"height":16.32},"width":316.05,"height":40.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-6.png","element":"img","alt":" ∇H on Bcr × 
Bcr","inline":true},{"style":{"fontStyle":"italic"},"text":". Then there exists a perturbed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Hamiltonian ","element":"span"},{"style":{"height":20.33},"width":502.69,"height":50.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-7.png","element":"img","alt":"H̃ : Bcr × Bcr → ℝ such that","inline":true}],[{"id":"id-69","style":{"width":"75%"},"width":1311,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":17.6},"width":257.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-9.png","element":"img","alt":" 0 < T ≤ T0/3","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and for all ","element":"span"},{"style":{"height":20.75},"width":811.31,"height":51.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-10.png","element":"img","alt":" |z0| ≤ r2(1 + 3.63LHT(1 + eT0/3))−1, where","inline":true}],[{"style":{"width":"58%"},"width":1011,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"are constants. The perturbed Hamiltonian has the form","element":"span"}],[{"style":{"width":"48%"},"width":832,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":16.32},"width":363.26,"height":40.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-13.png","element":"img","alt":" F : Bcr × Bcr → ℝ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an analytic function. 
The perturbed Hamiltonian ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-14.png","element":"img","alt":" H̃","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has the same ","element":"span"},{"style":{"fontStyle":"italic"},"text":"critical points as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and satisfies","element":"span"}],[{"id":"id-68","style":{"width":"81%"},"width":1415,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/16-15.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Remarks:","element":"span"}],[{"text":"• A more general version of Proposition ","element":"span"},{"href":"#id-66","text":"9 ","element":"a"},{"text":"is stated and proved in Appendix ","element":"span"},{"text":"C.","element":"span"}],[{"text":"• The proof follows the reasoning of ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"Hairer et al. ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"(2002, ","element":"a"},{"text":"p. 307), which enables the construction of ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-0.png","element":"img","alt":"H̃","inline":true,"padRight":true},{"text":"by means of a truncated series expansion. Cauchy’s integral formula is used to bound the terms of this series, which controls the error introduced by the truncation. The result from ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"Hairer et al. ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"(2002, ","element":"a"},{"text":"p. 
307) is extended by exploiting the specific structure of the underlying dynamics, which yields the additional statements about the modified energy function ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-1.png","element":"img","alt":"H̃","inline":true},{"text":". These additional statements are crucial for the stability analysis.","element":"span"}],[{"text":"• The constants ","element":"span"},{"style":{"height":18.16},"width":225.9,"height":45.41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-2.png","element":"img","alt":" T0 and C∆H̃","inline":true,"padRight":true},{"text":"are functions of the Lipschitz constant of ","element":"span"},{"style":{"height":14.8},"width":87.18,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-3.png","element":"img","alt":" ∇H,","inline":true,"padRight":true},{"text":"whose inverse can be regarded as the natural time constant of the dynamics governed by the Hamiltonian ","element":"span"},{"style":{"height":14.62},"width":270.03,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-4.png","element":"img","alt":" H. For LH = 1","inline":true,"padRight":true},{"text":"we obtain the following values: ","element":"span"},{"style":{"height":18.16},"width":440.45,"height":45.41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-5.png","element":"img","alt":" T0 ≈ 0.071, C∆H̃ ≈ 8.4,","inline":true}],[{"id":"id-67","style":{"width":"72%"},"width":1259,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-6.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":17.6},"width":859.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-7.png","element":"img","alt":" 0 < T ≤ 0.023 and all z0 such that |z0| ≤ 0.45r","inline":true},{"text":". 
Choosing, for example, a time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= 0.001 ","element":"span"},{"text":"leads to the bound ","element":"span"},{"style":{"height":15.13},"width":183.59,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-8.png","element":"img","alt":" 1.2·10⁻³³ ","inline":true,"padRight":true},{"text":"on the right-hand side of expression ","element":"span"},{"href":"#id-67","text":"(29)","element":"a"},{"text":", indicating that ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-9.png","element":"img","alt":"H̃","inline":true,"padRight":true},{"text":"is conserved by the symplectic Euler scheme up to a negligible error.","element":"span"}],[{"text":"• The estimate for the maximum time step ","element":"span"},{"style":{"height":17.6},"width":88.25,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-10.png","element":"img","alt":" T0/3","inline":true,"padRight":true},{"text":"is typically conservative. Nevertheless, the proposition rigorously establishes that, for a sufficiently small time step, the perturbed Hamiltonian is conserved up to exponentially small terms. What matters is that the upper bound on the time step depends only on the Lipschitz constant of ","element":"span"},{"style":{"height":12.4},"width":75.36,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-11.png","element":"img","alt":" ∇H","inline":true},{"text":", which is directly related to the Lipschitz constant of ","element":"span"},{"style":{"height":16.4},"width":61.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-12.png","element":"img","alt":" ∇f","inline":true},{"text":". 
Because Proposition ","element":"span"},{"href":"#id-66","text":"9 ","element":"a"},{"text":"concerns only the integration of the conservative part of the dynamics, the maximum time step ","element":"span"},{"style":{"height":17.6},"width":88.25,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-13.png","element":"img","alt":" T0/3","inline":true,"padRight":true},{"text":"is independent of the parameters ","element":"span"},{"style":{"height":16.4},"width":203.39,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-14.png","element":"img","alt":" d, β, and κ.","inline":true}],[{"text":"• The bounds ","element":"span"},{"href":"#id-68","text":"(28) ","element":"a"},{"text":"enable a nonlinear stability analysis based on the modified Hamiltonian ","element":"span"},{"style":{"height":16.41},"width":50.82,"height":41.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-15.png","element":"img","alt":"H̃.","inline":true,"padRight":true},{"text":"Since ","element":"span"},{"style":{"height":20.41},"width":569.81,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-16.png","element":"img","alt":" 15LHT < 1 for all T ≤ T0/3, H̃","inline":true,"padRight":true},{"text":"is necessarily positive in a neighborhood of the origin, cf. ","element":"span"},{"href":"#id-68","text":"(28)","element":"a"},{"text":". Combined with the fact that the perturbed Hamiltonian has the same critical points as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":", this implies that the level sets of ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-17.png","element":"img","alt":"H̃","inline":true,"padRight":true},{"text":"are compact in a region about the origin. 
The size of this region is determined by the critical points of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":", as argued in the analysis of Example 1; see Section ","element":"span"},{"href":"#id-29","text":"2.6.","element":"a"}],[{"text":"In the following, we use the modified energy function ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-18.png","element":"img","alt":"H̃","inline":true},{"text":", whose existence is ensured by Proposition ","element":"span"},{"href":"#id-66","text":"9, ","element":"a"},{"text":"to analyze the stability of the dynamics ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"in the large. As pointed out above, the dynamics ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"can be split into a dissipation step and a symplectic Euler step.","element":"span"}],[{"text":"To simplify the presentation, we focus on the case where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is small and the map ","element":"span"},{"style":{"height":17.24},"width":82.36,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-19.png","element":"img","alt":" Φd,T","inline":true,"padRight":true},{"text":"is close to the identity (little damping). This is the most challenging setting for the stability analysis, since, with almost vanishing damping, convergence can be arbitrarily slow (the equilibrium is almost non-attractive). 
If ","element":"span"},{"style":{"height":17.24},"width":82.36,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-20.png","element":"img","alt":" Φd,T","inline":true,"padRight":true},{"text":"is not close to the identity, as is the case, for example, for a constant parameter ","element":"span"},{"style":{"height":16.4},"width":107.16,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-21.png","element":"img","alt":" β > 0","inline":true},{"text":", independent of ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-22.png","element":"img","alt":" κ","inline":true},{"text":", stability can be analyzed with the unperturbed energy function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":", as shown in Appendix ","element":"span"},{"text":"E. ","element":"span"},{"text":"We therefore concentrate on the following result.","element":"span"}],[{"id":"id-70","style":{"fontWeight":"bold"},"text":"Proposition 10 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the dissipative forces ","element":"span"},{"style":{"height":17.6},"width":175.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-23.png","element":"img","alt":" fd(qk, pk)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be such that ","element":"span"},{"style":{"height":20.45},"width":527.59,"height":51.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-24.png","element":"img","alt":" −d2|pk|² ≤ pkᵀ fd(qk, pk) ≤","inline":true},{"style":{"height":19.13},"width":490.65,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-25.png","element":"img","alt":"−d1|pk|² with 0 < d1 ≤ d2.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Then, there exists a maximum time step 
","element":"span"},{"style":{"height":14.62},"width":177.52,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-26.png","element":"img","alt":" Tmax > 0","inline":true},{"style":{"fontStyle":"italic"},"text":", such that the origin is an asymptotically stable equilibrium of the dynamics ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":14.62},"width":170.2,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-27.png","element":"img","alt":" T ≤ Tmax","inline":true},{"style":{"fontStyle":"italic"},"text":", with domain of attraction at least ","element":"span"},{"style":{"height":18.04},"width":142.09,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-28.png","element":"img","alt":" Af ∩A,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is any compact set. 
Up to exponentially small terms due to ","element":"span"},{"href":"#id-69","text":"(27)","element":"a"},{"style":{"fontStyle":"italic"},"text":", the maximum time step ","element":"span"},{"style":{"height":14.62},"width":80.46,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-29.png","element":"img","alt":"Tmax","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"depends on an upper bound on ","element":"span"},{"style":{"height":17.6},"width":322.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-30.png","element":"img","alt":" d2, d1/d2, and LH","inline":true},{"style":{"fontStyle":"italic"},"text":", the Lipschitz constant of ","element":"span"},{"style":{"height":12.4},"width":87.18,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/17-31.png","element":"img","alt":" ∇H.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Remarks:","element":"span"}],[{"text":"• The proof of Proposition ","element":"span"},{"href":"#id-70","text":"10, ","element":"a"},{"text":"which is included in Appendix ","element":"span"},{"text":"D, ","element":"span"},{"text":"is based on the Lyapunov function ","element":"span"},{"style":{"height":36.18},"width":638.4,"height":90.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-0.png","element":"img","alt":"V(q, p) = H̃(q, p) + (Td1/2) ∇f(q)ᵀp,","inline":true}],[{"text":"where ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-1.png","element":"img","alt":"H̃","inline":true,"padRight":true},{"text":"denotes the perturbed energy function introduced in Proposition ","element":"span"},{"href":"#id-66","text":"9. ","element":"a"},{"text":"The function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"is motivated by the following observation. 
As the damping parameter ","element":"span"},{"style":{"height":15.02},"width":39.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-2.png","element":"img","alt":" d1","inline":true,"padRight":true},{"text":"decreases and ","element":"span"},{"style":{"height":17.24},"width":82.36,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-3.png","element":"img","alt":"Φd,T","inline":true,"padRight":true},{"text":"approaches the identity, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"reduces to the perturbed energy function ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-4.png","element":"img","alt":"˜H","inline":true},{"text":", which (up to exponentially small terms) is known to be conserved by ","element":"span"},{"style":{"height":14.7},"width":55.51,"height":36.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-5.png","element":"img","alt":" ΦT","inline":true,"padRight":true},{"text":". 
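The need for the correction term in V can be checked numerically in a deliberately simple setting. The sketch below makes hypothetical choices that are not taken from (13):

- a quadratic f(q) = hq²/2;
- a dissipation step p̄ = p + T fd(q, p) with fd(q, p) = −2dp, so that d1 = d2 = 2d;
- the momentum-first symplectic Euler step after that, for which H(q, p) − Thqp/2 plays the role of the exactly conserved modified energy.

The sketch first exhibits a state with small momentum at which the dissipation step alone increases the modified energy. It then checks that V = modified energy + (Td1/2)∇f(q)ᵀp decreases after every full step on a random sample of states.

```python
import random

# Hypothetical setting (not the paper's (13)): f(q) = h*q**2/2, dissipation
# p_bar = p + T*fd(q, p) with fd(q, p) = -2*d*p (so d1 = d2 = 2*d), followed by
# the momentum-first symplectic Euler step.
h, d, T = 1.0, 0.05, 0.1
d1 = 2 * d


def modified_energy(q, p):
    # Exactly conserved by the assumed symplectic Euler step when f is quadratic
    return 0.5 * p**2 + 0.5 * h * q**2 - 0.5 * T * h * q * p


def lyapunov(q, p):
    # V = modified energy + (T*d1/2) * grad f(q) * p, with grad f(q) = h*q
    return modified_energy(q, p) + 0.5 * T * d1 * h * q * p


def dissipation_step(q, p):
    fd = -2 * d * p  # assumed dissipative force
    return q, p + T * fd


def symplectic_euler_step(q, p):
    p_next = p - T * h * q
    return q + T * p_next, p_next


def full_step(q, p):
    return symplectic_euler_step(*dissipation_step(q, p))


# The dissipation step alone can increase the modified energy (small momentum):
q0, p0 = 1.0, 0.01
increase = modified_energy(*dissipation_step(q0, p0)) - modified_energy(q0, p0)

# V decreases after every full step on a random sample of nonzero states:
random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
v_decreases = all(lyapunov(*full_step(q, p)) < lyapunov(q, p) for q, p in samples)

print('modified energy change under dissipation only:', increase)
print('V decreases on all samples:', v_decreases)
```

All maps in this example are linear, so the one-step change of V is a quadratic form in (q, p). For the parameters above this form is negative definite. Other choices of T, d, or h should be rechecked.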
The correction ","element":"span"},{"style":{"height":19.53},"width":281.52,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-6.png","element":"img","alt":" Td1pᵀ∇f(q)/2","inline":true,"padRight":true},{"text":"is required because ","element":"span"},{"style":{"height":17.24},"width":82.36,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-7.png","element":"img","alt":" Φd,T","inline":true,"padRight":true},{"text":"is a pure contraction in the momentum variable, and, as a result, ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-8.png","element":"img","alt":"H̃","inline":true,"padRight":true},{"text":"does not necessarily decrease under ","element":"span"},{"style":{"height":17.24},"width":82.36,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-9.png","element":"img","alt":" Φd,T","inline":true,"padRight":true},{"text":". A similar correction has been used previously in the literature on dissipative systems; see, for example, ","element":"span"},{"href":"#id-71","referenceIndex":18,"text":"Hale ","element":"a"},{"href":"#id-71","referenceIndex":18,"text":"(1988) ","element":"a"},{"text":"or ","element":"span"},{"href":"#id-72","referenceIndex":19,"text":"Haraux ","element":"a"},{"href":"#id-72","referenceIndex":19,"text":"(1991)","element":"a"},{"text":", and in the context of the heavy-ball method in ","element":"span"},{"href":"#id-64","referenceIndex":15,"text":"Gadat et al. 
","element":"a"},{"href":"#id-64","referenceIndex":15,"text":"(2018)","element":"a"},{"text":".","element":"span"}],[{"text":"• Proposition ","element":"span"},{"href":"#id-70","text":"10 ","element":"a"},{"text":"ensures that the region of attraction is the same as in continuous time and that the requirement on the time step for guaranteeing asymptotic stability is independent of the minimum damping ","element":"span"},{"style":{"height":15.02},"width":39.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-10.png","element":"img","alt":" d1","inline":true,"padRight":true},{"text":"(up to the exponentially small terms). This will be important below, where we analyze how the convergence rate scales with ","element":"span"},{"style":{"height":8.4},"width":36.14,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-11.png","element":"img","alt":" κ.","inline":true}],[{"text":"• Provided that ","element":"span"},{"style":{"height":16.4},"width":117.48,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-12.png","element":"img","alt":" β > 0","inline":true,"padRight":true},{"text":"and the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is convex, the stability analysis simplifies substantially. In particular, a sufficient condition for stability is obtained by analyzing the total energy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"along the trajectories of ","element":"span"},{"href":"#id-43","text":"(13)","element":"a"},{"text":", as shown in Appendix ","element":"span"},{"text":"E.","element":"span"}],[{"text":"Next, we analyze the specific implications for the dynamics ","element":"span"},{"href":"#id-43","text":"(13)","element":"a"},{"text":". 
Recall that the contraction step ","element":"span"},{"style":{"height":17.24},"width":82.36,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-13.png","element":"img","alt":" Φd,T","inline":true,"padRight":true},{"text":"is given by","element":"span"}],[{"style":{"width":"71%"},"width":1231,"height":119,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-14.png","element":"img"}],[{"text":"which implies that the upper and lower bounds ","element":"span"},{"style":{"height":15.02},"width":171.34,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-15.png","element":"img","alt":" d2 and d1","inline":true,"padRight":true},{"text":"of Proposition ","element":"span"},{"href":"#id-70","text":"10 ","element":"a"},{"text":"are given by ","element":"span"},{"style":{"height":15.02},"width":92.53,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-16.png","element":"img","alt":" d2 =","inline":true},{"style":{"height":18.81},"width":796.3,"height":47.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-17.png","element":"img","alt":"2d + C̄fβ and d1 = 2d − Cfβ (assuming β ≥ 0","inline":true},{"text":"). 
The upper and lower bounds ","element":"span"},{"style":{"height":15.02},"width":162.57,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-18.png","element":"img","alt":" d2 and d1","inline":true,"padRight":true},{"text":"are therefore functions of ","element":"span"},{"style":{"height":8.4},"width":36.14,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-19.png","element":"img","alt":" κ.","inline":true}],[{"text":"Hence, according to Proposition ","element":"span"},{"href":"#id-70","text":"10, ","element":"a"},{"text":"provided that ","element":"span"},{"style":{"height":17.6},"width":447.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-20.png","element":"img","alt":" d1 > 0 and d2/d1 and d2","inline":true,"padRight":true},{"text":"remain bounded uniformly in ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-21.png","element":"img","alt":" κ","inline":true},{"text":", there exists a time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"independent ","element":"span"},{"text":"of ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-22.png","element":"img","alt":" κ","inline":true},{"text":", such that the origin is asymptotically stable with a region of attraction that includes any compact subset of ","element":"span"},{"style":{"height":18.04},"width":53.84,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-23.png","element":"img","alt":" Af","inline":true},{"text":". 
Then, according to Proposition ","element":"span"},{"href":"#id-52","text":"3, ","element":"a"},{"text":"the convergence rate of any trajectory starting in ","element":"span"},{"style":{"height":18.04},"width":201.56,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-24.png","element":"img","alt":" A ⊂ Af, A","inline":true,"padRight":true},{"text":"compact, is determined by the magnitude of the eigenvalues of the linearized dynamics. The eigenvalues are given by","element":"span"}],[{"style":{"width":"61%"},"width":1069,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-25.png","element":"img"}],[{"text":"enabling the calculation of the convergence rate for a given choice of ","element":"span"},{"style":{"height":16.4},"width":245.86,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-26.png","element":"img","alt":" T, d, β and h.","inline":true}],[{"text":"For the following discussion, we again assume that ","element":"span"},{"style":{"height":16.4},"width":140.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-27.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"are normalized such that ","element":"span"},{"style":{"height":17.6},"width":121.43,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-28.png","element":"img","alt":" 1/κ ≤","inline":true},{"style":{"height":16.4},"width":494.84,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-29.png","element":"img","alt":"h ≤ 1. We fix d, β, and T","inline":true,"padRight":true},{"text":"and analyze how the eigenvalues change as a function of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":", as illustrated in Figure ","element":"span"},{"href":"#id-73","text":"3. 
","element":"a"},{"text":"The worst-case convergence rate is determined by the maximum magnitude of the eigenvalues (over ","element":"span"},{"style":{"height":17.6},"width":261.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/18-30.png","element":"img","alt":" 1/κ ≤ h ≤ 1","inline":true},{"text":"). For very small values of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":", the eigenvalues are real, and one eigenvalue is very close to ","element":"span"},{"text":"1","element":"span"},{"text":". As ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"is increased, the eigenvalues become complex conjugates; for ","element":"span"},{"style":{"height":16.4},"width":125.98,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-0.png","element":"img","alt":" β = 0","inline":true},{"text":", they lie on circles centered at the origin, whereas for ","element":"span"},{"style":{"height":16.4},"width":125.98,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-1.png","element":"img","alt":" β > 0","inline":true,"padRight":true},{"text":"their magnitude decreases slightly. If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"is increased further, the eigenvalues become real again and their magnitude increases. 
The worst-case convergence rate, i.e., the largest value of ","element":"span"},{"style":{"height":18.22},"width":136.5,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-2.png","element":"img","alt":" |λ1,2|, is","inline":true,"padRight":true},{"text":"therefore attained either at ","element":"span"},{"style":{"height":17.6},"width":326.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-3.png","element":"img","alt":" h = 1/κ or h = 1.","inline":true}],[{"text":"We are interested in determining the conditions on ","element":"span"},{"style":{"height":16.4},"width":131.5,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-4.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"such that the convergence rate scales with ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-5.png","element":"img","alt":" 1/√κ","inline":true},{"text":". We first consider the case ","element":"span"},{"style":{"height":17.6},"width":347.07,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-6.png","element":"img","alt":" h = 1/κ and β = 0","inline":true},{"text":", and assume that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"is large enough that the eigenvalues are real. 
Then, provided that ","element":"span"},{"style":{"height":17.77},"width":200.02,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-7.png","element":"img","alt":" d ∼ 1/√κ","inline":true,"padRight":true},{"text":"it follows that ","element":"span"},{"style":{"height":18.39},"width":348.5,"height":45.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-8.png","element":"img","alt":" λ1,2 ∼ 1 − T/√κ.","inline":true,"padRight":true},{"text":"However, if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"is chosen to be larger, i.e., ","element":"span"},{"style":{"height":17.77},"width":185.9,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-9.png","element":"img","alt":" d ≻ 1/√κ","inline":true},{"text":", it follows that for large ","element":"span"},{"style":{"height":10.8},"width":36.14,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-10.png","element":"img","alt":" κ,","inline":true}],[{"style":{"width":"73%"},"width":1263,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-11.png","element":"img"}],[{"text":"that is, the convergence rate scales worse than ","element":"span"},{"style":{"height":17.77},"width":312.33,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-12.png","element":"img","alt":" 1/√κ. 
If d","inline":true,"padRight":true},{"text":"is small enough that the eigenvalues are complex conjugates, their magnitude is given by","element":"span"}],[{"style":{"width":"59%"},"width":1034,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-13.png","element":"img"}],[{"text":"which indicates that a scaling of the convergence rate with ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-14.png","element":"img","alt":" 1/√κ","inline":true,"padRight":true},{"text":"is achieved only for ","element":"span"},{"style":{"height":17.77},"width":197.04,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-15.png","element":"img","alt":" d ∼ 1/√κ.","inline":true,"padRight":true},{"text":"For ","element":"span"},{"style":{"height":16.4},"width":297.16,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-16.png","element":"img","alt":" h = 1 and β = 0","inline":true,"padRight":true},{"text":"it follows from ","element":"span"},{"style":{"height":18.39},"width":515.05,"height":45.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-17.png","element":"img","alt":" d ∼ 1/√κ that λ1,2 approach","inline":true}],[{"style":{"width":"65%"},"width":1135,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-18.png","element":"img"}],[{"text":"for large ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-19.png","element":"img","alt":" κ","inline":true},{"text":". 
Thus, for ","element":"span"},{"style":{"height":14.4},"width":117.73,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-20.png","element":"img","alt":" T ≤ 1","inline":true},{"text":", for example, the eigenvalues are complex conjugates and the worst-case convergence rate scales with ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-21.png","element":"img","alt":" 1/√κ","inline":true},{"text":". The condition ","element":"span"},{"style":{"height":17.77},"width":206.99,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-22.png","element":"img","alt":" d ∼ 1/√κ","inline":true,"padRight":true},{"text":"is therefore necessary and sufficient for ensuring that the worst-case magnitude of ","element":"span"},{"style":{"height":18.22},"width":94.85,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-23.png","element":"img","alt":" |λ1,2|","inline":true,"padRight":true},{"text":"scales with ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-24.png","element":"img","alt":" 1/√κ","inline":true},{"text":". 
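The eigenvalue picture above can be checked numerically. What follows is a minimal Python sketch under explicit assumptions, not the authors' code. We take Example 2 to be the symplectic-Euler-type update p_{k+1} = p_k − 2dT p_k − T∇f(q_k + βp_k), q_{k+1} = q_k + T p_{k+1}. This form is our reconstruction. Its linearization with ∇f(x) = hx reproduces the real-eigenvalue condition d + β/(2κ) + T/(2κ) ≥ 1/√κ used in the proof of Proposition 11. For β = 0, it also places complex eigenvalues on a circle of radius √(1 − 2dT). Curvatures are normalized to 1/κ ≤ h ≤ 1. The step T = 0.5 and the helper names iteration_matrix and worst_case_rate are hypothetical choices for illustration.

```python
import numpy as np

def iteration_matrix(h, d, beta, T):
    # Linearization around the minimum of the ASSUMED update
    #   p_next = p - 2*d*T*p - T*grad f(q + beta*p),  q_next = q + T*p_next,
    # with grad f(x) replaced by h*x (h = curvature at the minimum).
    # State ordering is (q_k, p_k).
    a = 1.0 - 2.0 * d * T - T * beta * h      # coefficient of p_k in p_next
    return np.array([[1.0 - T**2 * h, T * a],
                     [-T * h, a]])

def worst_case_rate(kappa, d, beta=0.0, T=0.5, num=501):
    # Largest eigenvalue magnitude over curvatures h in [1/kappa, 1].
    hs = np.linspace(1.0 / kappa, 1.0, num)
    return max(np.abs(np.linalg.eigvals(iteration_matrix(h, d, beta, T))).max()
               for h in hs)

# Scaled gap (1 - rate) * sqrt(kappa): roughly constant when d ~ 1/sqrt(kappa),
# shrinking towards zero when d is held constant (rate becomes 1 - O(1/kappa)).
for kappa in (1e2, 1e4, 1e6):
    tuned = worst_case_rate(kappa, d=1.0 / np.sqrt(kappa))
    fixed = worst_case_rate(kappa, d=0.5)
    print(f'kappa={kappa:.0e}  d=1/sqrt(kappa): {(1 - tuned) * np.sqrt(kappa):.3f}'
          f'  d=0.5: {(1 - fixed) * np.sqrt(kappa):.4f}')
```

With d = 1/√κ, the printed scaled gap (1 − max|λ|)·√κ should stay between roughly 0.4 and 0.5, in line with λ1,2 ∼ 1 − T/√κ. With d held constant, it shrinks by about a factor of ten from one row to the next, i.e., the rate degrades to 1 − O(1/κ). 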
A similar analysis applies to the case where ","element":"span"},{"style":{"height":16.4},"width":107.17,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-25.png","element":"img","alt":" β > 0","inline":true},{"text":", as shown below.","element":"span"}],[{"text":"As in the discussion of Section ","element":"span"},{"href":"#id-29","text":"2.6 ","element":"a"},{"text":"we say that the optimization algorithm ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"is accelerated if the convergence rate scales with ","element":"span"},{"style":{"height":17.77},"width":105,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-26.png","element":"img","alt":" 1/√κ","inline":true,"padRight":true},{"text":"(see Definition ","element":"span"},{"href":"#id-61","text":"7)","element":"a"},{"text":", where exponentially small terms resulting from the application of Proposition ","element":"span"},{"href":"#id-66","text":"9 ","element":"a"},{"text":"are neglected. As a consequence, the above discussion allows us to translate the results from Proposition ","element":"span"},{"href":"#id-62","text":"8 ","element":"a"},{"text":"almost verbatim to the discrete-time setting. 
This results in a broad characterization of the parameters ","element":"span"},{"style":{"height":16.4},"width":133.54,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-27.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"leading to acceleration.","element":"span"}],[{"id":"id-75","style":{"fontWeight":"bold"},"text":"Proposition 11 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"In the nonconvex case (","element":"span"},{"style":{"height":17.64},"width":148.17,"height":44.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-28.png","element":"img","alt":"Cf > 0","inline":true},{"style":{"fontStyle":"italic"},"text":"), there exists a time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T > ","element":"span"},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":" such that the algorithm given in Example 2 is accelerated provided that ","element":"span"},{"style":{"height":18.62},"width":627.87,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-29.png","element":"img","alt":" d ∼ 1/√κ, 0 < d, 0 ≤ β < 2d/Cf.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Example 2 is accelerated provided that ","element":"span"},{"style":{"height":17.77},"width":673.07,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-30.png","element":"img","alt":" d + β/(2κ) ∼ 1/√κ, 0 < d, 0 ≤ β","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and either ","element":"span"},{"style":{"height":16.4},"width":131.66,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-31.png","element":"img","alt":" β ∼ 1,","inline":true},{"style":{"height":17.77},"width":463.13,"height":44.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-32.png","element":"img","alt":"β ∼ 1/√κ, or β ≺ 
1/√κ.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"The requirements on ","element":"span"},{"style":{"height":16.4},"width":26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-33.png","element":"img","alt":" β","inline":true,"padRight":true},{"text":"are needed for guaranteeing asymptotic stability of the origin for a time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"that is small enough, but independent of ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-34.png","element":"img","alt":" κ","inline":true},{"text":", as implied by Proposition ","element":"span"},{"href":"#id-70","text":"10 ","element":"a"},{"text":"and Appendix ","element":"span"},{"text":"E. ","element":"span"},{"text":"It therefore remains to analyze the behavior of the eigenvalues ","element":"span"},{"style":{"height":17.42},"width":68.8,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-35.png","element":"img","alt":" λ1,2","inline":true,"padRight":true},{"text":"as a function of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"for ","element":"span"},{"style":{"height":17.6},"width":243.1,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-36.png","element":"img","alt":"1/κ ≤ h ≤ 1.","inline":true}],[{"style":{"width":"95%"},"width":1658,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-37.png","element":"img"}],[{"text":"and if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"is increased further they become real again. 
If the eigenvalues are complex conjugates, their magnitude is given by","element":"span"}],[{"id":"id-74","style":{"width":"64%"},"width":1106,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/19-38.png","element":"img"}],[{"text":"We first consider the case ","element":"span"},{"style":{"height":17.6},"width":171.97,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-0.png","element":"img","alt":" h = 1/κ","inline":true},{"text":": the eigenvalues are real provided that ","element":"span"},{"style":{"height":17.6},"width":261.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-1.png","element":"img","alt":" d + β/(2κ) +","inline":true},{"style":{"height":17.77},"width":301.94,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-2.png","element":"img","alt":"T/(2κ) ≥ 1/√κ","inline":true},{"text":", which, given the assumptions on ","element":"span"},{"style":{"height":16.4},"width":26,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-3.png","element":"img","alt":" β","inline":true},{"text":", implies that ","element":"span"},{"style":{"height":17.77},"width":545.1,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-4.png","element":"img","alt":" d ≥ 1/√κ for large κ. If","inline":true},{"style":{"height":17.77},"width":195.16,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-5.png","element":"img","alt":"d ∼ 1/√κ","inline":true,"padRight":true},{"text":"it follows that ","element":"span"},{"style":{"height":18.39},"width":330.57,"height":45.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-6.png","element":"img","alt":" λ1,2 ∼ 1 − T/√κ","inline":true},{"text":", which therefore yields accelerated convergence. 
If ","element":"span"},{"style":{"height":17.77},"width":185.9,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-7.png","element":"img","alt":" d ≻ 1/√κ","inline":true,"padRight":true},{"text":"it follows that","element":"span"}],[{"style":{"width":"87%"},"width":1520,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-8.png","element":"img"}],[{"text":"since ","element":"span"},{"style":{"height":19.13},"width":463.94,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-9.png","element":"img","alt":" 1/(d²κ) → 0 for large κ","inline":true},{"text":". Thus, accelerated convergence is impossible for ","element":"span"},{"style":{"height":17.77},"width":190.2,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-10.png","element":"img","alt":"d ≻ 1/√κ","inline":true},{"text":". If the eigenvalues are complex conjugates for ","element":"span"},{"style":{"height":17.6},"width":156.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-11.png","element":"img","alt":" h = 1/κ","inline":true},{"text":", it follows from ","element":"span"},{"href":"#id-74","text":"(33) ","element":"a"},{"text":"that ","element":"span"},{"style":{"height":17.77},"width":185.89,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-12.png","element":"img","alt":"d ∼ 1/√κ","inline":true,"padRight":true},{"text":"is necessary for accelerated convergence.","element":"span"}],[{"text":"We therefore proceed to the case ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"= 1","element":"span"},{"text":", where, since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"vanishes for large 
","element":"span"},{"style":{"height":15.2},"width":103.53,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-13.png","element":"img","alt":" κ, the","inline":true,"padRight":true},{"text":"two eigenvalues approach constant values,","element":"span"}],[{"style":{"width":"75%"},"width":1310,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-14.png","element":"img"}],[{"text":"for ","element":"span"},{"style":{"height":16.4},"width":326.48,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-15.png","element":"img","alt":" β∞ := limκ→∞ β","inline":true},{"text":", which is either constant or zero. This results in a convergence rate that is independent of ","element":"span"},{"style":{"height":14.8},"width":178.54,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-16.png","element":"img","alt":" κ, since T","inline":true,"padRight":true},{"text":"is chosen small enough to guarantee convergence.","element":"span"}],[{"text":"In particular, this leads to the conclusion that, for example, the following heavy-ball scheme,","element":"span"}],[{"style":{"width":"82%"},"width":1417,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-17.png","element":"img"}],[{"text":"leads to accelerated convergence for a sufficiently small value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":". 
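As a rough illustration of this claim, the following Python sketch runs a momentum scheme on a quadratic and counts iterations. The heavy-ball scheme above appears only as an image, so the update used here is our assumption of its form: p_{k+1} = (1 − 2dT) p_k − T∇f(q_k), q_{k+1} = q_k + T p_{k+1}, with d = 1/√κ. This is the β = 0 case of a symplectic-Euler-type discretization. The quadratic test function, the step T = 0.5, the tolerance, and the helper heavy_ball_iterations are hypothetical choices. On a quadratic, this T is already within the stable range of the linear analysis, whereas the bound on T discussed next concerns general nonlinear f.

```python
import numpy as np

def heavy_ball_iterations(kappa, T=0.5, tol=1e-6, n=50, max_iter=10**6):
    # Hypothetical quadratic test problem f(x) = 0.5 * sum_i h_i * x_i**2 with
    # curvatures h_i spread over [1/kappa, 1], so the condition number is kappa.
    h = np.linspace(1.0 / kappa, 1.0, n)
    d = 1.0 / np.sqrt(kappa)   # damping chosen to scale like 1/sqrt(kappa)
    q = np.ones(n)             # initial position q_0
    p = np.zeros(n)            # initial momentum p_0
    q0_norm = np.linalg.norm(q)
    for k in range(1, max_iter + 1):
        # Momentum step: the gradient of the quadratic at q_k is h * q_k.
        p = (1.0 - 2.0 * d * T) * p - T * (h * q)
        # Position step uses the updated momentum p_{k+1} (symplectic Euler).
        q = q + T * p
        if np.linalg.norm(q) <= tol * q0_norm:
            return k
    raise RuntimeError('no convergence within max_iter iterations')

counts = {kappa: heavy_ball_iterations(kappa) for kappa in (1e2, 1e4)}
print(counts)
```

If the iteration count scales like √κ, raising κ from 10^2 to 10^4 should multiply it by roughly 10. A rate of the form 1 − O(1/κ) would instead multiply it by about 100. 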
If the Lipschitz constant of ","element":"span"},{"style":{"height":16.4},"width":61.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-18.png","element":"img","alt":" ∇f","inline":true,"padRight":true},{"text":"is bounded by one, the bound ","element":"span"},{"style":{"height":14.4},"width":197.92,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-19.png","element":"img","alt":" T ≤ 0.001","inline":true,"padRight":true},{"text":"for guaranteeing acceleration is obtained by following the proof of Proposition ","element":"span"},{"href":"#id-70","text":"10.","element":"a"},{"text":"1 ","element":"span"},{"text":"Note that, compared to Nesterov’s original scheme, a gradient evaluation at ","element":"span"},{"style":{"height":16.4},"width":160.23,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-20.png","element":"img","alt":" qk + βpk","inline":true,"padRight":true},{"text":"is not required.","element":"span"}],[{"text":"We would like to emphasize that our analysis captures the computational complexity up to constants. For a given objective function, choosing ","element":"span"},{"style":{"height":16.8},"width":117.51,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/20-21.png","element":"img","alt":" β ≠ 0","inline":true,"padRight":true},{"text":"typically allows larger steps ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":", which is likely to reduce the number of iterations needed for convergence. On strongly convex functions, for example, the “Constant step scheme III” presented in ","element":"span"},{"href":"#id-3","referenceIndex":31,"text":"Nesterov ","element":"a"},{"href":"#id-3","referenceIndex":31,"text":"(2004, ","element":"a"},{"text":"p. 
81) seems to provide a good compromise between a straightforward implementation and fast convergence (considering constants).","element":"span"}],[{"text":"Local results akin to Proposition ","element":"span"},{"href":"#id-75","text":"11","element":"a"},{"text":", which are based on a quadratic objective function, are well known. A summary can be found, for example, in ","element":"span"},{"href":"#id-6","referenceIndex":24,"text":"Lessard et al. ","element":"a"},{"href":"#id-6","referenceIndex":24,"text":"(2016)","element":"a"},{"text":". In addition to unifying the continuous-time and discrete-time analyses, an important aspect of Proposition ","element":"span"},{"href":"#id-75","text":"11 ","element":"a"},{"text":"is to establish a rate that is valid for a large portion of the region of attraction of a given equilibrium. Furthermore, the fact that the heavy-ball scheme leads to accelerated convergence for a sufficiently small value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"appears to be new. If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is chosen to be too large, however, for example by tuning ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"on quadratic functions, this may lead to convergence issues even on strongly convex functions, as has been reported in ","element":"span"},{"href":"#id-6","referenceIndex":24,"text":"Lessard et al. 
","element":"a"},{"href":"#id-6","referenceIndex":24,"text":"(2016)","element":"a"},{"text":".","element":"span"}],[{"style":{"width":"97%"},"width":1683,"height":1376,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-0.png","element":"img"}],[{"id":"id-73","text":"Figure 3: This figure shows how the eigenvalues of the linearization change as a function of the ","element":"figcaption","subtype":"caption"},{"text":"damping parameters ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":131.08,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-1.png","element":"img","alt":" d and β","inline":true},{"text":", and as a function of the curvature at the equilibrium, ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h","element":"figcaption","subtype":"caption"},{"text":". The top left plot shows the behavior of the eigenvalues for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":107.3,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-2.png","element":"img","alt":" β = 0","inline":true},{"text":", as the curvature ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h ","element":"figcaption","subtype":"caption"},{"text":"is varied from ","element":"figcaption","subtype":"caption"},{"text":"0","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":".","element":"figcaption","subtype":"caption"},{"text":"1 ","element":"figcaption","subtype":"caption"},{"text":"to ","element":"figcaption","subtype":"caption"},{"text":"1 ","element":"figcaption","subtype":"caption"},{"text":"(the different colors represent the different values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"d","element":"figcaption","subtype":"caption"},{"text":"). 
The top right plot shows the behavior of the eigenvalues for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":141.4,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-3.png","element":"img","alt":" β = 0.2","inline":true,"padRight":true},{"text":"and the bottom right plot the behavior for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":141.1,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-4.png","element":"img","alt":" β = 0.4","inline":true},{"text":", as the curvature ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h ","element":"figcaption","subtype":"caption"},{"text":"is varied from ","element":"figcaption","subtype":"caption"},{"text":"0","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":".","element":"figcaption","subtype":"caption"},{"text":"1 ","element":"figcaption","subtype":"caption"},{"text":"to ","element":"figcaption","subtype":"caption"},{"text":"1","element":"figcaption","subtype":"caption"},{"text":". For all the plots the time step is set to ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"T ","element":"figcaption","subtype":"caption"},{"text":"= 0","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":".","element":"figcaption","subtype":"caption"},{"text":"8","element":"figcaption","subtype":"caption"},{"text":". 
The figure indicates that for ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":108.25,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-5.png","element":"img","alt":" β = 0","inline":true},{"text":", the eigenvalues move first along the real axis and then along concentric circles (as ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h ","element":"figcaption","subtype":"caption"},{"text":"changes). The additional parameter ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":159.96,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-6.png","element":"img","alt":" β has the","inline":true,"padRight":true},{"text":"effect of reducing the magnitude of the eigenvalues for larger values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"h","element":"figcaption","subtype":"caption"},{"text":". While this can increase the convergence rate, it also bears the risk of instability when ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":275.44,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/21-7.png","element":"img","alt":" d, β, and T are","inline":true,"padRight":true},{"text":"chosen to be too large.","element":"figcaption","subtype":"caption"}]]},{"heading":"3. The Time-Varying Case","paragraphs":[[{"text":"Next, we consider the case where the dynamics ","element":"span"},{"style":{"fontStyle":"italic"},"text":"g ","element":"span"},{"text":"are time dependent. The main motivation for introducing time-dependent dynamics lies in improving the convergence rate for functions that have an almost vanishing curvature at a local minimum.","element":"span"}],[{"text":"We slightly abuse notation and reuse the variables introduced in Section ","element":"span"},{"href":"#id-28","text":"2.4. 
","element":"a"},{"text":"It will be clear from context whether we refer to the time-varying or time-invariant case.","element":"span"}],[{"text":"Following Section ","element":"span"},{"text":"2, ","element":"span"},{"text":"we model a momentum-based optimization algorithm as a continuous-time or discrete-time dynamical system of the form","element":"span"}],[{"id":"id-77","style":{"width":"85%"},"width":1479,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-0.png","element":"img"}],[{"text":"where the dynamics satisfy the following assumptions:","element":"span"},{"href":"#id-76","text":"1","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Assumption 5 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The dynamics ","element":"span"},{"style":{"height":17.82},"width":167.24,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-1.png","element":"img","alt":" gq and gp","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are continuously differentiable in all their arguments. 
The derivative in the second and third arguments is Lipschitz continuous, uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"We define the map from ","element":"span"},{"style":{"height":19.13},"width":1222.46,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-2.png","element":"img","alt":" (q0, p0, t0) → (q(t), p(t)) as ϕt : R2n × I → R2n, for t ∈ I, t ≥ t0,","inline":true,"padRight":true},{"text":"and define the dynamical system ","element":"span"},{"href":"#id-77","text":"(37)","element":"a"},{"text":"-","element":"span"},{"href":"#id-77","text":"(38) ","element":"a"},{"text":"to be a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"momentum-based optimization algorithm ","element":"span"},{"text":"for the function ","element":"span"},{"style":{"height":16.4},"width":197.89,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-3.png","element":"img","alt":" f if x∗ = 0","inline":true,"padRight":true},{"text":"is an asymptotically stable equilibrium in the sense of Lyapunov (uniformly in the initial time ","element":"span"},{"style":{"height":13.82},"width":32.76,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-4.png","element":"img","alt":" t0","inline":true},{"text":"). 
Due to the continuity assumptions on the dynamics ","element":"span"},{"style":{"height":17.82},"width":163.7,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-5.png","element":"img","alt":" gq and gp","inline":true},{"text":", the continuous-time existence and uniqueness results, as well as the continuity of trajectories with respect to their initial conditions, continue to hold ","element":"span"},{"href":"#id-37","referenceIndex":3,"text":"(Arnol’d, ","element":"a"},{"href":"#id-37","referenceIndex":3,"text":"1992, ","element":"a"},{"text":"p. 93, Corollaries 3 and 4).","element":"span"}],[{"text":"The examples presented in Section ","element":"span"},{"href":"#id-26","text":"2.2 ","element":"a"},{"text":"and Section ","element":"span"},{"href":"#id-27","text":"2.3 ","element":"a"},{"text":"are modified by allowing the constants ","element":"span"},{"style":{"height":16.4},"width":326.96,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-6.png","element":"img","alt":"d > 0 and β ≥ 0","inline":true,"padRight":true},{"text":"to be time varying. We restrict ourselves to the case where ","element":"span"},{"style":{"height":16.8},"width":313.09,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-7.png","element":"img","alt":" d and β converge","inline":true,"padRight":true},{"text":"for large time, which leads to asymptotically time-invariant dynamics. 
This has the advantage that the asymptotic convergence rate is independent of the initial time ","element":"span"},{"style":{"height":13.82},"width":32.76,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-8.png","element":"img","alt":" t0","inline":true},{"text":", and, under mild continuity conditions, the (optimal) convergence rates from Section ","element":"span"},{"href":"#id-28","text":"2.4 ","element":"a"},{"text":"will be recovered for large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". Thus, ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"are extended by replacing the constants ","element":"span"},{"style":{"height":16.4},"width":137.75,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-9.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"with the functions ","element":"span"},{"style":{"height":17.42},"width":372.34,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-10.png","element":"img","alt":" d : R≥0 → R>0 and","inline":true},{"style":{"height":17.82},"width":282.88,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-11.png","element":"img","alt":"β : R≥0 → R≥0","inline":true},{"text":", which are assumed to be continuously differentiable and convergent,","element":"span"}],[{"style":{"width":"73%"},"width":1270,"height":61,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-12.png","element":"img"}],[{"text":"The stability analysis of Example 1 (Section ","element":"span"},{"href":"#id-26","text":"2.2) ","element":"a"},{"text":"translates to the time-varying case.","element":"span"},{"text":"2 ","element":"span"},{"text":"This implies that the algorithm ","element":"span"},{"href":"#id-39","text":"(8) 
","element":"a"},{"text":"with time-varying parameters ","element":"span"},{"style":{"height":16.4},"width":128.63,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-13.png","element":"img","alt":" d and β","inline":true,"padRight":true},{"text":"is a momentum-based optimization algorithm according to the above definition.","element":"span"}],[{"text":"The stability analysis of Example 2 is extended to the time-varying case by rewriting the dynamics ","element":"span"},{"href":"#id-77","text":"(37) ","element":"a"},{"text":"in the following way:","element":"span"}],[{"id":"id-76","style":{"width":"99%"},"width":1726,"height":292,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/22-14.png","element":"img"}],[{"text":"where the linear time-varying part vanishes for ","element":"span"},{"style":{"height":11.6},"width":136.65,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-0.png","element":"img","alt":" t → ∞","inline":true},{"text":". The remainder ","element":"span"},{"style":{"height":10.62},"width":62.71,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-1.png","element":"img","alt":" rNL","inline":true,"padRight":true},{"text":"is of second order in ","element":"span"},{"style":{"height":10.62},"width":32.29,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-2.png","element":"img","alt":"zt","inline":true,"padRight":true},{"text":"(uniformly in ","element":"span"},{"style":{"height":12.8},"width":104.17,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-3.png","element":"img","alt":" t ∈ I","inline":true},{"text":"). 
Thus, if the linear time-invariant part has all eigenvalues strictly within the unit circle, the dynamics ","element":"span"},{"href":"#id-76","text":"(40) ","element":"a"},{"text":"are asymptotically stable, uniformly in ","element":"span"},{"style":{"height":17.35},"width":61.6,"height":43.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-4.png","element":"img","alt":" t0.1 ","inline":true,"padRight":true},{"text":"Consequently, the analysis from Section ","element":"span"},{"href":"#id-27","text":"2.3 ","element":"a"},{"text":"yields the following conditions, cf. ","element":"span"},{"href":"#id-47","text":"(16)","element":"a"},{"text":":","element":"span"}],[{"style":{"width":"79%"},"width":1378,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-5.png","element":"img"}],[{"text":"for all eigenvalues ","element":"span"},{"style":{"height":22.54},"width":447.74,"height":56.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-6.png","element":"img","alt":" h of He = d²f/dx²|x=0.","inline":true}],[{"id":"id-31","style":{"fontWeight":"bold"},"text":"3.1 Characterizing the convergence rate","element":"span"}],[{"text":"As in Section ","element":"span"},{"href":"#id-28","text":"2.4 ","element":"a"},{"text":"we argue that a linear analysis can be used to characterize the convergence rate of the nonlinear dynamics up to constants. The results are derived only for non-degenerate isolated local minima. We believe, however, that the discussion also provides insights into the case where the curvature vanishes at a local minimum; see Section ","element":"span"},{"href":"#id-32","text":"3.2 ","element":"a"},{"text":"for further discussion of this point.","element":"span"}],[{"text":"A similar analysis of the role of choosing a time-varying damping parameter can be found in ","element":"span"},{"href":"#id-12","referenceIndex":4,"text":"Attouch et al. 
","element":"a"},{"href":"#id-12","referenceIndex":4,"text":"(2018)","element":"a"},{"text":". The emphasis of the following section lies in providing intuition through a unified treatment between the degenerate and non-degenerate case.","element":"span"}],[{"text":"We make the following assumption.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Assumption 6 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the linearized dynamics","element":"span"}],[{"id":"id-78","style":{"width":"60%"},"width":1048,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-7.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"be such that there is an estimate","element":"span"}],[{"style":{"width":"76%"},"width":1320,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":15.12},"width":353.16,"height":37.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-9.png","element":"img","alt":" Cl ≥ 1 and α > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are constant, and ","element":"span"},{"style":{"height":17.02},"width":306.48,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-10.png","element":"img","alt":" ρ : R≥0 → R≥0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is continuous, and monotonically decreasing.","element":"span"}],[{"id":"id-82","style":{"fontWeight":"bold"},"text":"Proposition 12 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let Assumption ","element":"span"},{"href":"#id-78","style":{"fontStyle":"italic"},"text":"6 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"be satisfied and let the region of attraction of the equilibrium at the origin of the nonlinear dynamics ","element":"span"},{"href":"#id-77","text":"(37) 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"be denoted by ","element":"span"},{"style":{"height":15.53},"width":174.12,"height":38.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-11.png","element":"img","alt":" R ⊂ R2n","inline":true},{"style":{"fontStyle":"italic"},"text":". Then, for any compact set ","element":"span"},{"style":{"height":13.6},"width":127.93,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-12.png","element":"img","alt":"A ⊂ R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and any initial time ","element":"span"},{"style":{"height":14.62},"width":110.02,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-13.png","element":"img","alt":" t0 ∈ I","inline":true},{"style":{"fontStyle":"italic"},"text":", there exists a finite constant ","element":"span"},{"style":{"height":19.21},"width":114.49,"height":48.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-14.png","element":"img","alt":"ˆC ≥ 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for all ","element":"span"},{"style":{"height":15.42},"width":137.28,"height":38.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-15.png","element":"img","alt":" z0 ∈ A,","inline":true}],[{"style":{"width":"64%"},"width":1113,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-16.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"The set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is compact, nonempty, and contains the origin. 
Lemma ","element":"span"},{"href":"#id-79","text":"16 ","element":"a"},{"text":"(see Appendix ","element":"span"},{"text":"B) ","element":"span"},{"text":"implies that there is a ball ","element":"span"},{"style":{"height":15.24},"width":365.66,"height":38.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-17.png","element":"img","alt":" Bδ of radius δ > 0","inline":true},{"text":", centered at the origin, such that any trajectory starting in ","element":"span"},{"style":{"height":14.84},"width":293.41,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-18.png","element":"img","alt":" Bδ at time τ ∈ I","inline":true,"padRight":true},{"text":"converges with rate ","element":"span"},{"style":{"height":17.6},"width":505.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-19.png","element":"img","alt":" (ρ(t)/ρ(τ)) exp(−α(t − τ))","inline":true},{"text":". The following claim can be verified with the arguments of Proposition ","element":"span"},{"href":"#id-52","text":"3.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Claim: ","element":"span"},{"text":"There exists a finite time ","element":"span"},{"style":{"height":14.8},"width":294.61,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-20.png","element":"img","alt":" Tm > t0, Tm ∈ I","inline":true,"padRight":true},{"text":"such that for all ","element":"span"},{"style":{"height":17.6},"width":460.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/23-21.png","element":"img","alt":" z0 ∈ A, ϕTm(z0, t0) ∈ Bδ.","inline":true}],[{"text":"Moreover, the uniform continuity of ","element":"span"},{"style":{"height":16.4},"width":496.28,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-0.png","element":"img","alt":" ϕt for all t ∈ I, t0 ≤ t ≤ Tm","inline":true},{"text":", implies further that 
","element":"span"},{"style":{"height":17.6},"width":163.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-1.png","element":"img","alt":" ϕt(A, t0)","inline":true,"padRight":true},{"text":"is bounded for any ","element":"span"},{"style":{"height":14.8},"width":330.82,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-2.png","element":"img","alt":" t ∈ I, t0 ≤ t ≤ Tm","inline":true},{"text":". Combined with Lemma ","element":"span"},{"href":"#id-79","text":"16, ","element":"a"},{"text":"this yields the following bound","element":"span"}],[{"id":"id-80","style":{"width":"86%"},"width":1495,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-3.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":18.81},"width":657.75,"height":47.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-4.png","element":"img","alt":" z0 ∈ A, where CA ≥ δ and ˜C ≥ 1","inline":true,"padRight":true},{"text":"are positive constants. 
We fix ","element":"span"},{"style":{"height":15.42},"width":139.64,"height":38.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-5.png","element":"img","alt":" z0 ∈ A","inline":true},{"text":", consider the trajectory ","element":"span"},{"style":{"height":17.6},"width":425.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-6.png","element":"img","alt":" z(t) := ϕt(z0, t0), t ∈ I","inline":true},{"text":", and apply the mean value theorem,","element":"span"}],[{"style":{"width":"77%"},"width":1334,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-7.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":70.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-8.png","element":"img","alt":" ξ(t)","inline":true,"padRight":true},{"text":"lies between ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"and the origin. 
Because the dynamics are assumed to have Lipschitz-continuous derivatives (uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"), we obtain the following bound","element":"span"}],[{"style":{"width":"68%"},"width":1185,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.43},"width":54.19,"height":43.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-10.png","element":"img","alt":"¯CA","inline":true,"padRight":true},{"text":"denotes the Lipschitz constant of ","element":"span"},{"style":{"height":17.6},"width":220.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-11.png","element":"img","alt":" ∂g/∂z on A","inline":true,"padRight":true},{"text":"(uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"). According to ","element":"span"},{"href":"#id-80","text":"(42)","element":"a"},{"text":", the norm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"is integrable (in continuous time) and absolutely summable (in discrete time). 
We obtain, by virtue of Lemma ","element":"span"},{"href":"#id-81","text":"15,","element":"a"}],[{"style":{"width":"82%"},"width":1430,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.02},"width":45.19,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-13.png","element":"img","alt":" Cz","inline":true,"padRight":true},{"text":"is constant. The constant ","element":"span"},{"style":{"height":15.02},"width":45.19,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-14.png","element":"img","alt":" Cz","inline":true,"padRight":true},{"text":"is related to an upper bound on the integral (in continuous time) or the sum (in discrete time) of ","element":"span"},{"style":{"height":17.6},"width":285.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-15.png","element":"img","alt":" |z(t)| over t ∈ I","inline":true},{"text":", which, according to ","element":"span"},{"href":"#id-80","text":"(42)","element":"a"},{"text":", is guaranteed to be finite.","element":"span"}],[{"id":"id-32","style":{"fontWeight":"bold"},"text":"3.2 Example 1","element":"span"}],[{"text":"The energy function, as defined in ","element":"span"},{"href":"#id-53","text":"(11)","element":"a"},{"text":", does not depend explicitly on time, even if the parameters ","element":"span"},{"style":{"height":16.4},"width":137.25,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-16.png","element":"img","alt":"d and β","inline":true,"padRight":true},{"text":"are chosen to be time-varying. 
Following the discussion of Section ","element":"span"},{"href":"#id-29","text":"2.6, ","element":"a"},{"text":"we conclude that ","element":"span"},{"style":{"height":18.04},"width":53.84,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-17.png","element":"img","alt":"Af","inline":true,"padRight":true},{"text":"is contained in the region of attraction of the equilibrium at the origin, as long as","element":"span"}],[{"style":{"width":"44%"},"width":777,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-18.png","element":"img"}],[{"text":"As a result, Proposition ","element":"span"},{"href":"#id-82","text":"12 ","element":"a"},{"text":"implies that the convergence rate of the trajectories starting from ","element":"span"},{"style":{"height":13.6},"width":82.71,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-19.png","element":"img","alt":" A ⊂","inline":true},{"style":{"height":18.04},"width":111.44,"height":45.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-20.png","element":"img","alt":"Af, A","inline":true,"padRight":true},{"text":"compact, is given by the linearization of ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"about the origin.","element":"span"}],[{"text":"After a change of coordinates that diagonalizes ","element":"span"},{"style":{"height":14.62},"width":50.27,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-21.png","element":"img","alt":" He","inline":true},{"text":", the linearized dynamics of a single coordinate are given 
by","element":"span"}],[{"id":"id-88","style":{"width":"81%"},"width":1407,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/24-22.png","element":"img"}],[{"style":{"width":"59%"},"width":1029,"height":314,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-0.png","element":"img"}],[{"id":"id-87","text":"Figure 4: The figure shows ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"d","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"t","element":"figcaption","subtype":"caption"},{"text":") ","element":"figcaption","subtype":"caption"},{"text":"defined according to ","element":"figcaption","subtype":"caption"},{"href":"#id-83","text":"(49) ","element":"a","subtype":"caption"},{"text":"with ","element":"figcaption","subtype":"caption"},{"style":{"height":19.64},"width":600.75,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-1.png","element":"img","alt":" d(0) = 1 and d∞ = 1/√10 as an","inline":true,"padRight":true},{"text":"example. 
Compared to the time-invariant dynamics (obtained in the asymptotic limit), which converge with rate ","element":"figcaption","subtype":"caption"},{"style":{"height":15.02},"width":56.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-2.png","element":"img","alt":" d∞","inline":true},{"text":", the time-varying terms improve the convergence rate by the shaded area, as indicated by ","element":"figcaption","subtype":"caption"},{"href":"#id-84","text":"(47)","element":"a","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":497.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-3.png","element":"img","alt":" δq(t) ∈ R, δp(t) ∈ R, and h","inline":true,"padRight":true},{"text":"denotes the corresponding eigenvalue of ","element":"span"},{"style":{"height":14.62},"width":50.27,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-4.png","element":"img","alt":" He","inline":true},{"text":". As in Section ","element":"span"},{"href":"#id-29","text":"2.6, ","element":"a"},{"text":"upper and lower bounds on the eigenvalues of ","element":"span"},{"style":{"height":14.62},"width":50.27,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-5.png","element":"img","alt":" He","inline":true,"padRight":true},{"text":"are given by ","element":"span"},{"style":{"height":17.6},"width":232.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-6.png","element":"img","alt":" 1/κ ≤ h ≤ 1","inline":true},{"text":". 
The analysis is simplified by expressing the dynamics in the coordinates ","element":"span"},{"style":{"height":16},"width":46.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-7.png","element":"img","alt":" δ˜q","inline":true},{"text":", which are defined by","element":"span"}],[{"id":"id-84","style":{"width":"68%"},"width":1186,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-8.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":19.41},"width":414.1,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-9.png","element":"img","alt":"¯d(t) := d(t) + β(t)h/2","inline":true},{"text":", and yields","element":"span"}],[{"id":"id-85","style":{"width":"68%"},"width":1186,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-10.png","element":"img"}],[{"text":"The dynamics ","element":"span"},{"href":"#id-85","text":"(48) ","element":"a"},{"text":"represent an undamped linear harmonic oscillator whose frequency varies with time. This time-varying frequency can be interpreted as the time-varying generalization of the square-root term in ","element":"span"},{"href":"#id-60","text":"(24)","element":"a"},{"text":". The non-square-root term in ","element":"span"},{"href":"#id-60","text":"(24) ","element":"a"},{"text":"is captured by the coordinate change ","element":"span"},{"href":"#id-84","text":"(47)","element":"a"},{"text":".","element":"span"}],[{"text":"In the following, the choice","element":"span"}],[{"id":"id-83","style":{"width":"67%"},"width":1161,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-11.png","element":"img"}],[{"text":"is discussed. This leads to a time-invariant right-hand side of ","element":"span"},{"href":"#id-85","text":"(48)","element":"a"},{"text":" and simplifies the subsequent analysis. 
The more general case (no restriction on ","element":"span"},{"style":{"height":15.4},"width":29.72,"height":38.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-12.png","element":"img","alt":"¯d","inline":true},{"text":") can be analyzed numerically or by approximating ","element":"span"},{"style":{"height":15.4},"width":29.72,"height":38.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-13.png","element":"img","alt":"¯d","inline":true,"padRight":true},{"text":"with a rational function and applying tools from asymptotic analysis ","element":"span"},{"href":"#id-86","referenceIndex":39,"text":"(White, ","element":"a"},{"href":"#id-86","referenceIndex":39,"text":"2010)","element":"a"},{"text":". The solutions of ","element":"span"},{"href":"#id-83","text":"(49) ","element":"a"},{"text":"monotonically approach ","element":"span"},{"style":{"height":15.02},"width":138.4,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-14.png","element":"img","alt":" d∞ as t","inline":true,"padRight":true},{"text":"increases, and the time-invariant case discussed in Section ","element":"span"},{"href":"#id-29","text":"2.6 ","element":"a"},{"text":"is recovered for ","element":"span"},{"style":{"height":17.6},"width":1029.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-15.png","element":"img","alt":" d(0) = d∞ (d(0) = d∞ implies d(t) = d∞ for all t ≥ 0).","inline":true,"padRight":true},{"text":"The coordinate transformation ","element":"span"},{"href":"#id-84","text":"(47) ","element":"a"},{"text":"therefore implies that, compared to the time-invariant case, the convergence rate can be improved by choosing ","element":"span"},{"style":{"height":17.6},"width":193.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-16.png","element":"img","alt":" d(0) > d∞","inline":true},{"text":", as this increases the area underneath the 
curve ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":". Figure ","element":"span"},{"href":"#id-87","text":"4 ","element":"a"},{"text":"illustrates the situation.","element":"span"}],[{"text":"This intuition can be made quantitative by noting that the solutions of ","element":"span"},{"href":"#id-83","text":"(49) ","element":"a"},{"text":"are given by","element":"span"}],[{"style":{"width":"70%"},"width":1213,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/25-17.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.08},"width":47.19,"height":37.71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-0.png","element":"img","alt":" Cd","inline":true,"padRight":true},{"text":"is a constant determined by ","element":"span"},{"style":{"height":17.6},"width":91.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-1.png","element":"img","alt":" d(t0)","inline":true},{"text":". This implies","element":"span"}],[{"style":{"width":"87%"},"width":1517,"height":233,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-2.png","element":"img"}],[{"text":"Because ","element":"span"},{"href":"#id-85","text":"(48) ","element":"a"},{"text":"reduces to a time-invariant harmonic oscillator, the solutions to ","element":"span"},{"href":"#id-88","text":"(46) ","element":"a"},{"text":"can be expressed in closed form with elementary functions. 
The fundamental solutions to ","element":"span"},{"href":"#id-88","text":"(46) ","element":"a"},{"text":"are therefore given by","element":"span"},{"text":"1","element":"span"}],[{"id":"id-90","style":{"width":"64%"},"width":1115,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-3.png","element":"img"}],[{"text":"The rate of the exponential decay behaves in the same way as the real part of the expression ","element":"span"},{"href":"#id-60","text":"(24) ","element":"a"},{"text":"(","element":"span"},{"style":{"height":15.02},"width":56.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-4.png","element":"img","alt":"d∞","inline":true,"padRight":true},{"text":"corresponds to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"in ","element":"span"},{"href":"#id-60","text":"(24)","element":"a"},{"text":"); as a result, the discussion of the time-invariant case applies here as well. ","element":"span"},{"text":"Hence, accelerated convergence occurs as long as ","element":"span"},{"style":{"height":17.77},"width":230.5,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-5.png","element":"img","alt":"d∞ ∼ 1/√κ","inline":true},{"text":". Compared to the time-invariant case, however, the convergence is improved by the factor ","element":"span"},{"style":{"height":17.6},"width":238.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-6.png","element":"img","alt":" ρtv(t)/ρtv(t0)","inline":true},{"text":". 
For large values of ","element":"span"},{"style":{"height":15.02},"width":56.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-7.png","element":"img","alt":" d∞","inline":true,"padRight":true},{"text":"the difference is not substantial (at most a constant factor of size ","element":"span"},{"style":{"height":17.6},"width":783.43,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-8.png","element":"img","alt":" limt→∞ ρtv(t)/ρtv(t0) = 2d∞/(d∞ + d(t0))","inline":true,"padRight":true},{"text":"can be gained). However, the improvement becomes crucial for small values of ","element":"span"},{"style":{"height":17.6},"width":419.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-9.png","element":"img","alt":" d∞, since ρtv(t)/ρtv(t0)","inline":true,"padRight":true},{"text":"behaves as","element":"span"}],[{"style":{"width":"29%"},"width":505,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-10.png","element":"img"}],[{"text":"for small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". 
Thus, choosing ","element":"span"},{"style":{"height":17.77},"width":221.76,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-11.png","element":"img","alt":" d∞ ∼ 1/√κ","inline":true},{"text":" ensures convergence roughly like ","element":"span"},{"style":{"height":17.6},"width":419.61,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-12.png","element":"img","alt":" 1/(1+d(t0)(t−t0)) for","inline":true,"padRight":true},{"text":"small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":" and exponential convergence with rate ","element":"span"},{"style":{"height":17.77},"width":288.42,"height":44.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-13.png","element":"img","alt":" 1/√κ for large t","inline":true},{"text":"; the transition occurs for ","element":"span"},{"style":{"height":17.6},"width":177,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-14.png","element":"img","alt":" (t−t0) ≈","inline":true},{"style":{"height":17.6},"width":100.35,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-15.png","element":"img","alt":"1/d∞","inline":true},{"text":". 
The situation is shown in Figure ","element":"span"},{"href":"#id-89","text":"5.","element":"a"}],[{"text":"In summary, Proposition ","element":"span"},{"href":"#id-82","text":"12 ","element":"a"},{"text":"implies that choosing ","element":"span"},{"style":{"height":17.77},"width":233.28,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-16.png","element":"img","alt":" d∞ ∼ 1/√κ","inline":true,"padRight":true},{"text":"ensures that the trajectories of the time-varying generalization of ","element":"span"},{"href":"#id-39","text":"(8) ","element":"a"},{"text":"satisfy the estimate","element":"span"}],[{"style":{"width":"90%"},"width":1571,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-17.png","element":"img"}],[{"text":"for some constant ","element":"span"},{"style":{"height":15.02},"width":56.05,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-18.png","element":"img","alt":" Ctv","inline":true,"padRight":true},{"text":"(which may depend strongly on ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-19.png","element":"img","alt":" κ","inline":true},{"text":") and for all ","element":"span"},{"style":{"height":18.44},"width":542.1,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-20.png","element":"img","alt":" (q(0), p(0)) ∈ A ⊂ Af, where","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is compact. 
For any ","element":"span"},{"style":{"height":14},"width":105.32,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-21.png","element":"img","alt":" κ ≥ 1","inline":true},{"text":", the right-hand side can be bounded by","element":"span"}],[{"style":{"width":"66%"},"width":1152,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-22.png","element":"img"}],[{"text":"which implies that the convergence rate is at least ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/t","element":"span"},{"text":") ","element":"span"},{"text":"for any fixed function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"that has non-degenerate isolated local minima (the minima can have arbitrarily small curvature). We believe that this also provides insight into the case where the isolated minimum at the origin is degenerate. For example, given a function with a degenerate isolated minimum, we can add a regularization term of the form ","element":"span"},{"style":{"height":19.13},"width":384.5,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/26-23.png","element":"img","alt":" ϵ|x|2/2, where ϵ > 0","inline":true,"padRight":true},{"text":"is small. 
The bound (54) applies to the resulting regularized function and implies a convergence rate of order ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/t","element":"span"},{"text":")","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"78%"},"width":1350,"height":623,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-0.png","element":"img"}],[{"id":"id-89","text":"Figure 5: The plot shows the convergence rate of the fundamental solutions ","element":"figcaption","subtype":"caption"},{"href":"#id-90","text":"(52) ","element":"a","subtype":"caption"},{"text":"as a function of ","element":"figcaption","subtype":"caption"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-1.png","element":"img","alt":" κ","inline":true}],[{"style":{"width":"89%"},"width":1546,"height":156,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-2.png","element":"img"}],[{"text":"Note that the results in this section are stated in terms of the distance between the current iterate and the optimizer, i.e., ","element":"span"},{"style":{"height":17.6},"width":324.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-3.png","element":"img","alt":" |(q(t) − x∗, p(t))|","inline":true},{"text":". In the case of smooth and convex functions, this yields the following bound on the function value:","element":"span"}],[{"style":{"width":"61%"},"width":1067,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-4.png","element":"img"}],[{"text":"provided that the algorithm is initialized with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(0) = 0","element":"span"},{"text":". 
Thus, our analysis recovers the well-known optimal rate ","element":"span"},{"style":{"height":19.13},"width":148.24,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-5.png","element":"img","alt":" O(1/t2)","inline":true,"padRight":true},{"text":"(in terms of function value) for smooth and convex functions.","element":"span"}],[{"id":"id-33","style":{"fontWeight":"bold"},"text":"3.3 Example 2","element":"span"}],[{"text":"We consider the time-varying generalization of ","element":"span"},{"href":"#id-43","text":"(13)","element":"a"},{"text":", where the parameters ","element":"span"},{"style":{"height":16.4},"width":353.72,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-6.png","element":"img","alt":" d > 0 and β ≥ 0","inline":true,"padRight":true},{"text":"are assumed to be time-varying. We focus on the case where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is convex and ","element":"span"},{"style":{"height":16.4},"width":42.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-7.png","element":"img","alt":" βk","inline":true,"padRight":true},{"text":"is chosen to be ","element":"span"},{"style":{"height":17.6},"width":364.78,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-8.png","element":"img","alt":" βk = T(1 − 2dkT)","inline":true},{"text":", as this simplifies the stability analysis and enables explicit closed-form evaluation of certain expressions. The stability analysis carried out in Appendix ","element":"span"},{"text":"E ","element":"span"},{"text":"applies equally to the time-varying case; it suffices to replace ","element":"span"},{"style":{"height":16.4},"width":441.39,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-9.png","element":"img","alt":" d with dk and β with βk","inline":true},{"text":". 
This shows that the origin is asymptotically stable as long as ","element":"span"},{"style":{"height":19.98},"width":547.01,"height":49.95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-10.png","element":"img","alt":" 0 < dkT < 1, 0 < T ≤ 1/√L.","inline":true}],[{"text":"After a change of coordinates that diagonalizes ","element":"span"},{"style":{"height":14.62},"width":50.27,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-11.png","element":"img","alt":" He","inline":true},{"text":", the linearized dynamics of a single coordinate are given by","element":"span"}],[{"id":"id-91","style":{"width":"98%"},"width":1703,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":535.54,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-13.png","element":"img","alt":" δq(k) ∈ R, δp(k) ∈ R, and h","inline":true,"padRight":true},{"text":"denotes the corresponding eigenvalue of ","element":"span"},{"style":{"height":14.62},"width":50.27,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-14.png","element":"img","alt":" He","inline":true,"padRight":true},{"text":"with lower and upper bounds ","element":"span"},{"style":{"height":17.6},"width":237.85,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-15.png","element":"img","alt":" 1/κ ≤ h ≤ 1","inline":true},{"text":". 
In analogy to the transformation ","element":"span"},{"href":"#id-84","text":"(47) ","element":"a"},{"text":"in the continuous-time case, we define","element":"span"}],[{"style":{"width":"83%"},"width":1449,"height":145,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/27-16.png","element":"img"}],[{"text":"which transforms ","element":"span"},{"href":"#id-91","text":"(55) ","element":"a"},{"text":"into","element":"span"}],[{"id":"id-93","style":{"width":"88%"},"width":1523,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-0.png","element":"img"}],[{"text":"This transformation is well known in the literature on linear difference equations (see, for example, ","element":"span"},{"href":"#id-92","referenceIndex":13,"text":"Elaydi, ","element":"a"},{"href":"#id-92","referenceIndex":13,"text":"2005, ","element":"a"},{"text":"p. 369). As in the continuous-time case, the analysis is considerably simplified if ","element":"span"},{"style":{"height":16.4},"width":42.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-1.png","element":"img","alt":" βk","inline":true,"padRight":true},{"text":"is chosen in such a way that the coefficient multiplying ","element":"span"},{"style":{"height":17.6},"width":100.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-2.png","element":"img","alt":" δ˜q(k)","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-93","text":"(57) ","element":"a"},{"text":"remains constant. 
This can be achieved, for example, with","element":"span"}],[{"id":"id-94","style":{"width":"71%"},"width":1235,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10.62},"width":52.78,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-4.png","element":"img","alt":" αtv","inline":true,"padRight":true},{"text":"is constant and independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":". This equation defines a recurrence relation for ","element":"span"},{"style":{"height":16.4},"width":130.74,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-5.png","element":"img","alt":" βk, and","inline":true,"padRight":true},{"text":"through ","element":"span"},{"style":{"height":17.6},"width":350.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-6.png","element":"img","alt":" βk = T(1 − 2dkT)","inline":true,"padRight":true},{"text":"a relation for ","element":"span"},{"style":{"height":15.24},"width":40.72,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-7.png","element":"img","alt":" dk","inline":true},{"text":". The choice ","element":"span"},{"href":"#id-94","text":"(58) ","element":"a"},{"text":"is motivated by the fact that 1) the recurrence relation is independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"and 2) the difference equation ","element":"span"},{"href":"#id-93","text":"(57) ","element":"a"},{"text":"is time-invariant, which enables closed-form solutions. 
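Why time-invariance yields closed-form solutions can be illustrated generically: every trajectory of such a recurrence is a linear combination of two fundamental solutions. The Python sketch below assumes a generic real 2×2 recurrence x(k+1) = M x(k) with distinct eigenvalues. The matrix M and the helper names are hypothetical stand-ins, not the coefficients of (57). The sketch compares direct iteration against the eigenvalue closed form.

```python
import cmath


def iterate(M, x0, k):
    # Direct simulation of the recurrence x(k+1) = M x(k) for a real 2x2 matrix M.
    x = list(x0)
    for _ in range(k):
        x = [M[0][0] * x[0] + M[0][1] * x[1],
             M[1][0] * x[0] + M[1][1] * x[1]]
    return x


def closed_form_first(M, x0, k):
    # First component written as c1*l1**k + c2*l2**k, where l1, l2 are the
    # eigenvalues of M (assumed distinct). By Cayley-Hamilton, each component
    # satisfies y(k+2) = tr(M)*y(k+1) - det(M)*y(k), so the two fundamental
    # solutions l1**k and l2**k span all trajectories.
    a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2
    # Fix c1 and c2 from the first two values y(0) and y(1).
    y0 = x0[0]
    y1 = a * x0[0] + b * x0[1]
    c2 = (y1 - l1 * y0) / (l2 - l1)
    c1 = y0 - c2
    return (c1 * l1 ** k + c2 * l2 ** k).real


# Hypothetical example matrix with complex-conjugate eigenvalues (damped oscillation).
M = [[1.0, 0.5], [-0.2, 0.7]]
x0 = [1.0, 0.0]
for k in (0, 5, 20):
    print(k, iterate(M, x0, k)[0], closed_form_first(M, x0, k))
```

The eigenvalue moduli set the convergence rate of every trajectory, which is why an explicit pair of fundamental solutions characterizes the rate directly.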
Thus, any trajectory ","element":"span"},{"style":{"height":17.6},"width":100.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-8.png","element":"img","alt":" δq(k)","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"href":"#id-91","text":"(55) ","element":"a"},{"text":"is a linear combination of the two fundamental solutions","element":"span"}],[{"id":"id-96","style":{"width":"82%"},"width":1434,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-9.png","element":"img"}],[{"text":"This explicitly characterizes the convergence rate of ","element":"span"},{"href":"#id-91","text":"(55) ","element":"a"},{"text":"and, by virtue of Proposition ","element":"span"},{"href":"#id-82","text":"12, ","element":"a"},{"text":"the convergence rate of ","element":"span"},{"href":"#id-43","text":"(13)","element":"a"},{"text":" (for time-varying ","element":"span"},{"style":{"height":16.4},"width":176.8,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-10.png","element":"img","alt":" dk and βk","inline":true},{"text":"). As in the continuous-time discussion in Section ","element":"span"},{"href":"#id-32","text":"3.2, ","element":"a"},{"text":"the convergence rate can be improved by choosing ","element":"span"},{"style":{"height":15.24},"width":40.71,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-11.png","element":"img","alt":" dk","inline":true,"padRight":true},{"text":"to be close to one for small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":". 
For the specific choice ","element":"span"},{"style":{"height":19.64},"width":1304.82,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-12.png","element":"img","alt":" T = 0.5, β0 = 0, d0 = 1, limk→∞ dk = d∞ = 1/√2κ, L = 1, the","inline":true,"padRight":true},{"text":"dependence of the convergence rate on ","element":"span"},{"style":{"height":8.4},"width":25,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-13.png","element":"img","alt":" κ","inline":true,"padRight":true},{"text":"is illustrated in Figure ","element":"span"},{"href":"#id-95","text":"6. ","element":"a"},{"text":"As in the continuous-time case, the introduction of the time-varying parameter ","element":"span"},{"style":{"height":15.24},"width":40.71,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-14.png","element":"img","alt":" dk","inline":true,"padRight":true},{"text":"ensures that the fundamental solutions converge approximately with ","element":"span"},{"style":{"height":17.6},"width":460.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-15.png","element":"img","alt":" 1/(1 + d0Tk) for small k","inline":true},{"text":", whereas for large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"the convergence is linear with rate ","element":"span"},{"style":{"height":19.64},"width":212.2,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-16.png","element":"img","alt":" 1 − T/√2κ","inline":true},{"text":". The transition occurs for ","element":"span"},{"style":{"height":19.64},"width":222.67,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-17.png","element":"img","alt":" k ∼√2κ/T","inline":true},{"text":". 
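A few lines of arithmetic can check the quantities quoted for Figure 6. The Python sketch below evaluates only the formulas stated in this example: the stability condition 0 < dkT < 1, 0 < T ≤ 1/√L; the large-k rate 1 − T/√(2κ); and the transition index √(2κ)/T. It assumes d∞ = 1/√(2κ), reading √2κ as √(2κ). It does not simulate the fundamental solutions (59). All helper names are hypothetical.

```python
import math

# Values quoted for Figure 6: T = 0.5, d0 = 1, L = 1, d_inf = 1/sqrt(2*kappa).
T_STEP, D0, L_SMOOTH = 0.5, 1.0, 1.0


def stable(d, T, L):
    # Condition quoted in the text: 0 < d*T < 1 and 0 < T <= 1/sqrt(L).
    return 0 < d * T < 1 and 0 < T <= 1 / math.sqrt(L)


def linear_rate(kappa, T):
    # Large-k contraction factor per iteration, 1 - T/sqrt(2*kappa).
    return 1 - T / math.sqrt(2 * kappa)


def transition_index(kappa, T):
    # Order-of-magnitude switch from the 1/(1 + d0*T*k) regime to the linear one.
    return math.sqrt(2 * kappa) / T


for kappa in (1e2, 1e4, 1e6):
    d_inf = 1 / math.sqrt(2 * kappa)
    # Both the initial and the limiting damping satisfy the stability condition.
    assert stable(D0, T_STEP, L_SMOOTH) and stable(d_inf, T_STEP, L_SMOOTH)
    rate = linear_rate(kappa, T_STEP)
    k_star = transition_index(kappa, T_STEP)
    print(f'kappa={kappa:.0e} rate={rate:.6f} k_star={k_star:.1f}')
```

The transition index coincides with the time constant 1/(1 − rate) of the linear regime. This offers one way to read why the switch happens near k ∼ √(2κ)/T, a point that grows only like √κ.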
This indicates that even for very large ","element":"span"},{"style":{"height":10.8},"width":36.15,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-18.png","element":"img","alt":" κ,","inline":true,"padRight":true},{"text":"the fundamental solutions converge roughly with ","element":"span"},{"style":{"height":17.6},"width":261.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-19.png","element":"img","alt":" 1/(1 + d0Tk).","inline":true}]]},{"heading":"4. Conclusions","paragraphs":[[{"text":"We have presented a convergence-rate analysis of momentum-based optimization algorithms from a dynamical systems point of view. Our analysis emphasizes the importance of the curvature properties about an isolated local minimum, which, up to a multiplicative constant, dictate the convergence rate. In addition, we find that reducing the damping over time improves the convergence rate by sublinear terms, which is important for objective functions that have minima with almost vanishing curvature. The use of momentum ensures robustness of the convergence rate against small changes in the curvature and leads, in many cases, to acceleration.","element":"span"}],[{"text":"We also provided a rigorous motivation for the use of symplectic discretization schemes, showing that they enable the computation of a modified energy function that is (almost exactly) preserved by the conservative parts of the dynamics. The modified energy function provides a means for stability analysis, which implies, for example, that certain heavy-ball-type methods are in fact accelerated. 
Thus, an evaluation of the gradient at a shifted position is not necessary for achieving convergence rates that scale with ","element":"span"},{"style":{"height":17.77},"width":311.33,"height":44.43,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/28-20.png","element":"img","alt":" 1/√κ for large κ.","inline":true}],[{"style":{"width":"76%"},"width":1329,"height":644,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/29-0.png","element":"img"}],[{"id":"id-95","text":"Figure 6: The plot shows the convergence rate of the fundamental solutions ","element":"figcaption","subtype":"caption"},{"href":"#id-96","text":"(59) ","element":"a","subtype":"caption"},{"text":"as a function of ","element":"figcaption","subtype":"caption"},{"style":{"height":8.4},"width":36.15,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/29-1.png","element":"img","alt":" κ.","inline":true,"padRight":true},{"text":"For small values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"k","element":"figcaption","subtype":"caption"},{"text":", the convergence rate is roughly ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":251.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/29-2.png","element":"img","alt":" 1/(1 + d0Tk)","inline":true},{"text":", whereas for larger values of ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"k ","element":"figcaption","subtype":"caption"},{"text":"the convergence is linear with rate ","element":"figcaption","subtype":"caption"},{"style":{"height":19.64},"width":221.64,"height":49.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/29-3.png","element":"img","alt":" 1 − T/√2κ.","inline":true}],[{"text":"Our analysis emphasizes fundamental similarities between convergence rate analysis in continuous time and discrete time, and provides intuitive 
and rigorous explanations for various phenomena encountered in optimization, without resorting to convexity.","element":"span"}]]},{"heading":"Acknowledgments","paragraphs":[[{"text":"We thank the Branco Weiss Fellowship, administered by ETH Zurich, for the generous support and the Office of Naval Research under grant number N00014-18-1-2764. This work was also funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)–MU 4710/2-1.","element":"span"}]]},{"heading":"Appendix A. Proof of Proposition 3","paragraphs":[[{"text":"For proving the proposition we rely on the following lemmas.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma 13 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider the trajectories of a time-varying dynamical system","element":"span"}],[{"id":"id-51","style":{"width":"68%"},"width":1177,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"style":{"height":17.13},"width":840.68,"height":42.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-1.png","element":"img","alt":" A ∈ R2n×2n and B : I → R2n×2n, where B","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is either continuous (in the continuous-time case) or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is invertible (in the discrete-time case). 
Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be such that the trajectories ","element":"span"},{"style":{"height":18.73},"width":301.55,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-2.png","element":"img","alt":"w+(t) = Aw(t)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converge exponentially with rate ","element":"span"},{"style":{"height":17.6},"width":813.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-3.png","element":"img","alt":" α, i.e., |w(t)| ≤ Cw|w(0)| exp(−αt), for all","inline":true},{"style":{"height":19.13},"width":325.41,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-4.png","element":"img","alt":"w(0) ∈ R2n, t ∈ I","inline":true},{"style":{"fontStyle":"italic"},"text":", and for some constant ","element":"span"},{"style":{"height":17.6},"width":342.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-5.png","element":"img","alt":" Cw ≥ 1. Then, x(t)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfies the estimate","element":"span"}],[{"style":{"width":"80%"},"width":1399,"height":259,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-6.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We consider the continuous-time case first. 
Due to the continuity assumptions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":", the dynamics are guaranteed to have a unique solution, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":", satisfying","element":"span"}],[{"style":{"width":"78%"},"width":1349,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-7.png","element":"img"}],[{"text":"where ","element":"span"},{"text":"exp ","element":"span"},{"text":"denotes the matrix exponential. The two-norm of ","element":"span"},{"style":{"height":17.6},"width":247.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-8.png","element":"img","alt":" exp(A(t−τ))","inline":true,"padRight":true},{"text":"is, by assumption, upper bounded by ","element":"span"},{"style":{"height":17.6},"width":745.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-9.png","element":"img","alt":" Cw exp(−α(t − τ)), for all t, τ ∈ I, t ≥ τ","inline":true},{"text":", yielding the following estimate:","element":"span"}],[{"id":"id-97","style":{"width":"81%"},"width":1405,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-10.png","element":"img"}],[{"text":"Applying the Grönwall inequality to ","element":"span"},{"href":"#id-97","text":"(64) ","element":"a"},{"text":"yields the desired result. The discrete-time case is analogous. 
Due to the assumption of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"+ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"being full rank, the dynamics are guaranteed to have a unique solution ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":", which satisfies","element":"span"}],[{"style":{"width":"73%"},"width":1269,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-11.png","element":"img"}],[{"text":"see, for example, ","element":"span"},{"href":"#id-98","referenceIndex":1,"text":"Agarwal ","element":"a"},{"href":"#id-98","referenceIndex":1,"text":"(2000, ","element":"a"},{"text":"p. 59). By assumption, the two-norm of ","element":"span"},{"style":{"height":14.33},"width":89.31,"height":35.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-12.png","element":"img","alt":" At−τ ","inline":true,"padRight":true},{"text":"is bounded by ","element":"span"},{"style":{"height":17.6},"width":745.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-13.png","element":"img","alt":"Cw exp(−α(t − τ)), for all t, τ ∈ I, t ≥ τ","inline":true},{"text":". As a result, the following estimate is obtained:","element":"span"}],[{"style":{"width":"84%"},"width":1452,"height":279,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/30-14.png","element":"img"}],[{"text":"Applying the Grönwall inequality ","element":"span"},{"href":"#id-98","referenceIndex":1,"text":"(Agarwal, ","element":"a"},{"href":"#id-98","referenceIndex":1,"text":"2000, ","element":"a"},{"text":"p. 
186) yields","element":"span"}],[{"style":{"width":"78%"},"width":1362,"height":339,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-0.png","element":"img"}],[{"id":"id-49","style":{"fontWeight":"bold"},"text":"Lemma 14 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let Assumption ","element":"span"},{"href":"#id-48","style":{"fontStyle":"italic"},"text":"4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"be fulfilled. Then there exists an open ball ","element":"span"},{"style":{"height":14.84},"width":49.1,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-1.png","element":"img","alt":" Bδ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(with radius ","element":"span"},{"style":{"height":15.6},"width":125.7,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-2.png","element":"img","alt":" δ > 0)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"centered at the origin and a constant ","element":"span"},{"style":{"height":18.41},"width":114.49,"height":46.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-3.png","element":"img","alt":"˜C ≥ 1","inline":true},{"style":{"fontStyle":"italic"},"text":", such that for all ","element":"span"},{"style":{"height":14.84},"width":155.83,"height":37.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-4.png","element":"img","alt":" z0 ∈ Bδ,","inline":true}],[{"style":{"width":"69%"},"width":1194,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-5.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We consider the continuous-time case first. 
Assumption ","element":"span"},{"href":"#id-48","text":"4 ","element":"a"},{"text":"implies that the dynamics ","element":"span"},{"href":"#id-36","text":"(4) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-38","text":"(5) ","element":"a"},{"text":"are (asymptotically) stable. This implies that for any ","element":"span"},{"style":{"height":12.4},"width":112.95,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-6.png","element":"img","alt":" ε > 0","inline":true,"padRight":true},{"text":"there exists a constant ","element":"span"},{"style":{"height":17.6},"width":167.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-7.png","element":"img","alt":" δ(ε) > 0","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":19.13},"width":1084.68,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-8.png","element":"img","alt":" |ϕt(z0)| < ε for all z0 ∈ R2n with |z0| < δ(ε). 
For any ε > 0","inline":true},{"text":", we consider the trajectory starting at ","element":"span"},{"style":{"height":19.13},"width":494.24,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-9.png","element":"img","alt":" z0 ∈ R2n, with |z0| < δ(ε)","inline":true,"padRight":true},{"text":"and reformulate the dynamics by applying the mean value theorem:","element":"span"}],[{"style":{"width":"84%"},"width":1458,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-10.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.13},"width":206.37,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-11.png","element":"img","alt":" ξ(t) ∈ R2n ","inline":true,"padRight":true},{"text":"lies between the origin and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":". The continuity assumption on the dynamics leads to the following estimate:","element":"span"}],[{"id":"id-99","style":{"width":"77%"},"width":1336,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-12.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":17.82},"width":341.22,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-13.png","element":"img","alt":" t ∈ I and where Cg","inline":true,"padRight":true},{"text":"is a Lipschitz constant of ","element":"span"},{"style":{"height":17.6},"width":116.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-14.png","element":"img","alt":" ∂g/∂z","inline":true,"padRight":true},{"text":"in a neighborhood of the origin. 
Applying Lemma ","element":"span"},{"href":"#id-51","text":"13 ","element":"a"},{"text":"then yields","element":"span"}],[{"style":{"width":"73%"},"width":1266,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-15.png","element":"img"}],[{"text":"We fix ","element":"span"},{"style":{"height":18.62},"width":520.1,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-16.png","element":"img","alt":" ε > 0 such that ClCgε < α/2","inline":true},{"text":". This leads, in turn, to a refinement of the estimate ","element":"span"},{"href":"#id-99","text":"(72)","element":"a"},{"text":", ","element":"span"},{"style":{"height":36.98},"width":78.08,"height":92.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-17.png","element":"img","alt":" ∂g/∂z","inline":true}],[{"style":{"width":"99%"},"width":1726,"height":215,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-18.png","element":"img"}],[{"text":"Applying Lemma ","element":"span"},{"href":"#id-51","text":"13 ","element":"a"},{"text":"once more therefore implies","element":"span"}],[{"style":{"width":"69%"},"width":1207,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-19.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":19.13},"width":459.96,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/31-20.png","element":"img","alt":" z0 ∈ R2n with |z0| < δ(ε)","inline":true},{"text":", which leads to the desired result. The same arguments (with slightly modified constants) apply to the discrete-time case.","element":"span"}]]},{"heading":"Appendix B. 
Proof of Proposition 12","paragraphs":[[{"text":"For proving the proposition we rely on the following lemmas.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Lemma 15 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider the trajectories of a time-varying dynamical system","element":"span"}],[{"id":"id-81","style":{"width":"69%"},"width":1202,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"style":{"height":17.13},"width":1098.04,"height":42.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-1.png","element":"img","alt":" A : I → R2n×2n and B : I → R2n×2n, where A and B","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are either continuous (in the continuous-time case) or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") + ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is invertible for all ","element":"span"},{"style":{"height":12.8},"width":105.9,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-2.png","element":"img","alt":" t ∈ I","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(in the discrete-time case). 
Let ","element":"span"},{"style":{"height":19.13},"width":657.62,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-3.png","element":"img","alt":"φ(t, τ) ∈ R2n×2n, t, τ ∈ I, t ≥ τ","inline":true},{"style":{"fontStyle":"italic"},"text":", be defined by ","element":"span"},{"style":{"height":18.73},"width":427.5,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-4.png","element":"img","alt":" φ(t, τ)+ = A(t)φ(t, τ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(for a fixed ","element":"span"},{"style":{"height":15.6},"width":122.59,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-5.png","element":"img","alt":" τ) and","inline":true},{"style":{"height":17.6},"width":205.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-6.png","element":"img","alt":"φ(τ, τ) = I","inline":true},{"style":{"fontStyle":"italic"},"text":", and assume ","element":"span"},{"style":{"height":17.6},"width":268.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-7.png","element":"img","alt":" φ(t, τ) satisfies","inline":true}],[{"style":{"width":"20%"},"width":350,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":17.64},"width":133.5,"height":44.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-9.png","element":"img","alt":" Cφ ≥ 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is constant, ","element":"span"},{"style":{"height":17.02},"width":224.28,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-10.png","element":"img","alt":" ρ : I → R≥0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is continuous, monotonically decreasing, and 
","element":"span"},{"style":{"height":17.6},"width":268.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-11.png","element":"img","alt":" limt→∞ ρ(t) =","inline":true,"padRight":true},{"text":"0","element":"span"},{"style":{"fontStyle":"italic"},"text":". Then, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"satisfies the estimate","element":"span"}],[{"style":{"width":"88%"},"width":1521,"height":282,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-12.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We consider the continuous-time case first. Due to the continuity assumptions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":", the dynamics are guaranteed to have a unique solution ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":", satisfying","element":"span"}],[{"style":{"width":"72%"},"width":1259,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-13.png","element":"img"}],[{"text":"The bound on the fundamental solution matrix ","element":"span"},{"style":{"height":17.6},"width":119.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-14.png","element":"img","alt":" 
φ(t, τ)","inline":true,"padRight":true},{"text":"yields the following estimate:","element":"span"}],[{"id":"id-100","style":{"width":"74%"},"width":1294,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-15.png","element":"img"}],[{"text":"Applying the Grönwall inequality to ","element":"span"},{"href":"#id-100","text":"(81) ","element":"a"},{"text":"yields the desired result. The discrete-time case is analogous. Due to the assumption of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") + ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"being full rank, the dynamics are guaranteed to have a unique solution ","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":", which satisfies","element":"span"}],[{"style":{"width":"77%"},"width":1334,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-16.png","element":"img"}],[{"text":"see, for example, ","element":"span"},{"href":"#id-98","referenceIndex":1,"text":"Agarwal ","element":"a"},{"href":"#id-98","referenceIndex":1,"text":"(2000, ","element":"a"},{"text":"p. 59). 
As a result, the bound on ","element":"span"},{"style":{"height":17.6},"width":119.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-17.png","element":"img","alt":" φ(t, τ)","inline":true,"padRight":true},{"text":"leads to the following estimate:","element":"span"}],[{"style":{"width":"83%"},"width":1450,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/32-18.png","element":"img"}],[{"text":"Applying the Grönwall inequality ","element":"span"},{"href":"#id-98","referenceIndex":1,"text":"(Agarwal, ","element":"a"},{"href":"#id-98","referenceIndex":1,"text":"2000, ","element":"a"},{"text":"p. 186) yields","element":"span"}],[{"style":{"width":"78%"},"width":1353,"height":313,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-0.png","element":"img"}],[{"text":"where the fact that ","element":"span"},{"style":{"height":15.2},"width":442.92,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-1.png","element":"img","alt":" 1 + x ≤ e^x for all x ≥ 0","inline":true,"padRight":true},{"text":"has been used.","element":"span"}],[{"id":"id-79","style":{"fontWeight":"bold"},"text":"Lemma 16 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let Assumption ","element":"span"},{"href":"#id-78","style":{"fontStyle":"italic"},"text":"6 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"be fulfilled. 
Then there exists an open ball ","element":"span"},{"style":{"height":14.84},"width":49.1,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-2.png","element":"img","alt":" Bδ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(with radius ","element":"span"},{"style":{"height":15.6},"width":125.7,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-3.png","element":"img","alt":" δ > 0)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"centered at the origin and a constant ","element":"span"},{"style":{"height":18.41},"width":114.49,"height":46.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-4.png","element":"img","alt":"˜C ≥ 1","inline":true},{"style":{"fontStyle":"italic"},"text":", such that for all ","element":"span"},{"style":{"height":15.24},"width":410.95,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-5.png","element":"img","alt":" z0 ∈ Bδ, and all τ ∈ I,","inline":true}],[{"style":{"width":"75%"},"width":1302,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-6.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We consider the continuous-time case first and fix the initial time ","element":"span"},{"style":{"height":12.8},"width":115.09,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-7.png","element":"img","alt":" τ ∈ I","inline":true},{"text":". Assumption ","element":"span"},{"href":"#id-78","text":"6 ","element":"a"},{"text":"implies that the dynamics ","element":"span"},{"href":"#id-77","text":"(37) ","element":"a"},{"text":"are (asymptotically) stable, uniformly in ","element":"span"},{"style":{"height":8},"width":23,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-8.png","element":"img","alt":" τ","inline":true},{"text":". 
This implies that for any ","element":"span"},{"style":{"height":12.4},"width":100.53,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-9.png","element":"img","alt":"ε > 0","inline":true,"padRight":true},{"text":"there exists a constant ","element":"span"},{"style":{"height":17.6},"width":155.51,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-10.png","element":"img","alt":" δ(ε) > 0","inline":true,"padRight":true},{"text":"(independent of ","element":"span"},{"style":{"height":8},"width":23,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-11.png","element":"img","alt":" τ","inline":true},{"text":") such that ","element":"span"},{"style":{"height":17.6},"width":559.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-12.png","element":"img","alt":" |ϕt(z0, τ)| < ε for all t ≥ τ and","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":19.13},"width":733.5,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-13.png","element":"img","alt":" z0 ∈ R2n with |z0| < δ(ε). 
For any ε > 0","inline":true},{"text":", we consider the trajectory starting at ","element":"span"},{"style":{"height":17.35},"width":208,"height":43.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-14.png","element":"img","alt":" z0 ∈ R2n at","inline":true,"padRight":true},{"text":"time ","element":"span"},{"style":{"height":17.6},"width":407.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-15.png","element":"img","alt":" τ ∈ I, with |z0| < δ(ε)","inline":true,"padRight":true},{"text":"and reformulate the dynamics by applying the mean value theorem:","element":"span"}],[{"style":{"width":"86%"},"width":1502,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-16.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.13},"width":196.22,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-17.png","element":"img","alt":" ξ(t) ∈ R2n ","inline":true,"padRight":true},{"text":"lies between the origin and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"text":". 
The assumptions on the dynamics lead to the following estimate:","element":"span"}],[{"id":"id-101","style":{"width":"78%"},"width":1357,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-18.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":14.8},"width":209.99,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-19.png","element":"img","alt":" t ∈ I, t ≥ τ","inline":true},{"text":", where ","element":"span"},{"style":{"height":17.82},"width":47.19,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-20.png","element":"img","alt":" Cg","inline":true,"padRight":true},{"text":"is a time-independent Lipschitz constant of ","element":"span"},{"style":{"height":17.6},"width":116.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-21.png","element":"img","alt":" ∂g/∂z","inline":true,"padRight":true},{"text":"in a neighborhood of the origin.","element":"span"}],[{"text":"By assumption, the linear dynamics decay at least as fast as ","element":"span"},{"style":{"height":17.6},"width":312.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-22.png","element":"img","alt":" exp(−α(t − τ)).","inline":true,"padRight":true},{"text":"Thus, applying Lemma ","element":"span"},{"href":"#id-81","text":"15 ","element":"a"},{"text":"yields","element":"span"}],[{"style":{"width":"85%"},"width":1477,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-23.png","element":"img"}],[{"text":"We fix ","element":"span"},{"style":{"height":18.62},"width":520.1,"height":46.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-24.png","element":"img","alt":" ε > 0 such that ClCgε < α/2","inline":true},{"text":". 
This leads, in turn, to a refinement of the estimate ","element":"span"},{"href":"#id-101","text":"(88)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"99%"},"width":1726,"height":361,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/33-25.png","element":"img"}],[{"text":"Applying Lemma ","element":"span"},{"href":"#id-81","text":"15 ","element":"a"},{"text":"once more therefore implies","element":"span"}],[{"style":{"width":"77%"},"width":1336,"height":135,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-0.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":19.13},"width":459.96,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-1.png","element":"img","alt":" z0 ∈ R2n with |z0| < δ(ε)","inline":true},{"text":", which leads to the desired result. The same arguments (with slightly modified constants) apply to the discrete-time case.","element":"span"}]]},{"heading":"Appendix C. 
Proof of Proposition 9","paragraphs":[[{"text":"We prove the following generalization of Proposition ","element":"span"},{"href":"#id-66","text":"9:","element":"a"}],[{"style":{"fontWeight":"bold"},"text":"Proposition 17 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be analytic on ","element":"span"},{"style":{"height":16.32},"width":49.29,"height":40.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-2.png","element":"img","alt":" Bcr","inline":true},{"style":{"fontStyle":"italic"},"text":", the closed ball of radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"style":{"fontStyle":"italic"},"text":"about the origin, and let ","element":"span"},{"style":{"height":14.62},"width":55.7,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-3.png","element":"img","alt":" LH","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a Lipschitz constant of ","element":"span"},{"style":{"height":17.6},"width":888.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-4.png","element":"img","alt":" ∇H, i.e., |∇H(z)| ≤ LH|z| for all z ∈ Bcr × Bcr","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Then there exists a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"perturbed Hamiltonian ","element":"span"},{"style":{"height":20.32},"width":332.26,"height":50.81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-5.png","element":"img","alt":"˜H : Bcr × Bcr → R","inline":true},{"style":{"fontStyle":"italic"},"text":", whose trajectories","element":"span"}],[{"id":"id-102","style":{"width":"78%"},"width":1351,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"are such that","element":"span"}],[{"style":{"width":"72%"},"width":1247,"height":48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-7.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":17.6},"width":612.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-8.png","element":"img","alt":" 0 < T ≤ T0/3 and all z0 such that","inline":true}],[{"id":"id-113","style":{"width":"59%"},"width":1028,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-9.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":15.02},"width":184.89,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-10.png","element":"img","alt":" T0 and CE","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are constants given by","element":"span"}],[{"style":{"width":"50%"},"width":872,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"The perturbed Hamiltonian has the 
form","element":"span"}],[{"id":"id-112","style":{"width":"74%"},"width":1279,"height":89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":20.33},"width":578.24,"height":50.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-13.png","element":"img","alt":"˜H and F : Bcr × Bcr → R satisfy","inline":true}],[{"style":{"width":"97%"},"width":1677,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-14.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"style":{"height":19.56},"width":244.11,"height":48.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-15.png","element":"img","alt":" CF := 356L2H","inline":true},{"style":{"fontStyle":"italic"},"text":". Moreover, ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-16.png","element":"img","alt":" ˜H","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has the same critical points as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and satisfies","element":"span"}],[{"id":"id-115","style":{"width":"95%"},"width":1656,"height":62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-17.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Finally, at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"style":{"fontStyle":"italic"},"text":", the difference of the perturbed energy is bounded 
by","element":"span"}],[{"style":{"width":"80%"},"width":1388,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-18.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where","element":"span"}],[{"style":{"width":"81%"},"width":1402,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/34-19.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We complexify the position and momentum variables; i.e., we take ","element":"span"},{"style":{"height":16},"width":374.97,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-0.png","element":"img","alt":" q ∈ Cn, p ∈ Cn. Due","inline":true,"padRight":true},{"text":"to the fact that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is analytic, it can be interpreted as a mapping from ","element":"span"},{"style":{"height":12.8},"width":145.46,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-1.png","element":"img","alt":" Cn to C","inline":true},{"text":"; similar reasoning applies to ","element":"span"},{"style":{"height":21.85},"width":326.01,"height":54.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-2.png","element":"img","alt":" H. 
Let Bcρ ⊂ C2n","inline":true,"padRight":true},{"text":"be the closed ball of radius ","element":"span"},{"style":{"height":15.6},"width":108.8,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-3.png","element":"img","alt":" ρ > 0","inline":true},{"text":", centered about the origin, and let ","element":"span"},{"text":"the norm ","element":"span"},{"style":{"height":18.62},"width":97,"height":46.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-4.png","element":"img","alt":" || · ||ρ","inline":true,"padRight":true},{"text":"be defined as","element":"span"}],[{"style":{"width":"100%"},"width":1728,"height":381,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-5.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":337.98,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-6.png","element":"img","alt":" ˜z(t) := (˜q(t), ˜p(t))","inline":true,"padRight":true},{"text":"is a trajectory satisfying ","element":"span"},{"href":"#id-102","text":"(93)","element":"a"},{"text":", and ","element":"span"},{"style":{"height":16.4},"width":33.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-7.png","element":"img","alt":" fi","inline":true,"padRight":true},{"text":"are analytic functions, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . ","element":"span"},{"text":". 
Performing a Taylor expansion of ","element":"span"},{"style":{"height":17.6},"width":219.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-8.png","element":"img","alt":" ˜z(t) about 0","inline":true,"padRight":true},{"text":"and evaluating at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"yields, after rearranging terms,","element":"span"}],[{"id":"id-104","style":{"width":"92%"},"width":1602,"height":194,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-9.png","element":"img"}],[{"text":"where the Lie derivative ","element":"span"},{"style":{"height":15.02},"width":292.38,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-10.png","element":"img","alt":" Di : C∞ → C∞ ","inline":true,"padRight":true},{"text":"is defined by","element":"span"}],[{"style":{"width":"67%"},"width":1174,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-11.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12.8},"width":68.31,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-12.png","element":"img","alt":" C∞ ","inline":true,"padRight":true},{"text":"denotes the set of infinitely differentiable functions mapping from ","element":"span"},{"style":{"height":17.94},"width":362.45,"height":44.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-13.png","element":"img","alt":" C2n to C2n (see also","inline":true,"padRight":true},{"href":"#id-45","referenceIndex":17,"text":"Hairer et al., ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"2002, ","element":"a"},{"text":"Ch. IX). 
The notation on the left-hand side of the previous equation should be read as follows: the operator ","element":"span"},{"style":{"height":14.62},"width":48.13,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-14.png","element":"img","alt":" Di","inline":true,"padRight":true},{"text":"is applied to ","element":"span"},{"style":{"height":14.4},"width":23.5,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-15.png","element":"img","alt":" ¯g","inline":true},{"text":", yielding the function ","element":"span"},{"style":{"height":16.4},"width":247.43,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-16.png","element":"img","alt":" Di¯g, which is","inline":true,"padRight":true},{"text":"then evaluated at ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":". The result of applying the operator ","element":"span"},{"style":{"height":14.62},"width":48.13,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-17.png","element":"img","alt":" Di","inline":true,"padRight":true},{"text":"to the function ","element":"span"},{"style":{"height":16},"width":73.14,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-18.png","element":"img","alt":" Di¯g","inline":true,"padRight":true},{"text":"is denoted by ","element":"span"},{"style":{"height":19.62},"width":89.65,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-19.png","element":"img","alt":" D2i ¯g.","inline":true}],[{"text":"We require ","element":"span"},{"style":{"height":17.6},"width":368.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-20.png","element":"img","alt":" ˜z(T) = (qk+1, pk+1)","inline":true},{"text":", which, due to ","element":"span"},{"href":"#id-103","text":"(26) 
","element":"a"},{"text":"leads to","element":"span"}],[{"id":"id-105","style":{"width":"75%"},"width":1308,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-21.png","element":"img"}],[{"text":"Equating coefficients of like powers of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"in ","element":"span"},{"href":"#id-104","text":"(103) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-105","text":"(105) ","element":"a"},{"text":"yields the following recursive scheme for computing the functions ","element":"span"},{"style":{"height":16.4},"width":46.89,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-22.png","element":"img","alt":" fi:","inline":true}],[{"id":"id-106","style":{"width":"81%"},"width":1413,"height":247,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-23.png","element":"img"}],[{"text":"where the last sum ranges over all strictly positive integers ","element":"span"},{"style":{"height":15.6},"width":173.34,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-24.png","element":"img","alt":" k1, . . . , ki","inline":true,"padRight":true},{"text":"that sum to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":". 
We will construct the perturbed Hamiltonian ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-25.png","element":"img","alt":"˜H","inline":true,"padRight":true},{"text":"by appropriately truncating the series ","element":"span"},{"style":{"height":19.24},"width":356.89,"height":48.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/35-26.png","element":"img","alt":"∑ fi(z)T^i, and will","inline":true,"padRight":true},{"text":"show that the resulting truncated series has the desired properties.","element":"span"}],[{"text":"We start by explicitly computing the function ","element":"span"},{"style":{"height":16.4},"width":52.29,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-0.png","element":"img","alt":" f2:","inline":true}],[{"id":"id-107","style":{"width":"95%"},"width":1649,"height":198,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-1.png","element":"img"}],[{"text":"and note that it is of the form ","element":"span"},{"style":{"height":19.13},"width":639.02,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-2.png","element":"img","alt":" A2(z)f1(z), where A2(z) ∈ C2n×2n","inline":true},{"text":". We further note that ","element":"span"},{"style":{"height":17.6},"width":181.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-3.png","element":"img","alt":" f2(z) is in","inline":true,"padRight":true},{"text":"fact a Hamiltonian vector field with corresponding Hamiltonian ","element":"span"},{"style":{"height":19.53},"width":242.25,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-4.png","element":"img","alt":" −p^T∇f(q)/2","inline":true},{"text":". 
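The first-order correction can be checked numerically. This section does not reproduce the update (26), so the sketch below assumes it is the symplectic Euler step p_{k+1} = p_k − T∇f(q_k), q_{k+1} = q_k + T p_{k+1} for H(q, p) = |p|²/2 + f(q); that assumption is consistent with the correction term −p^T∇f(q)/2 found above. For the illustrative scalar choice f(q) = q²/2, a direct calculation shows that p² + q² − T p q is invariant under this step. The truncated perturbed energy H − (T/2) p ∇f(q) is therefore conserved up to round-off, whereas H itself oscillates with an amplitude of order T.

```python
def symplectic_euler(q, p, T, grad_f, steps):
    # Assumed form of the update: momentum first, then position with the new momentum.
    traj = [(q, p)]
    for _ in range(steps):
        p = p - T * grad_f(q)
        q = q + T * p
        traj.append((q, p))
    return traj


def grad_f(q):
    # Illustrative objective f(q) = q^2 / 2, chosen for this sketch.
    return q


def energy(q, p):
    # H(q, p) = p^2 / 2 + f(q)
    return 0.5 * p * p + 0.5 * q * q


def perturbed_energy(q, p, T):
    # First-order truncation: H - (T / 2) * p * grad_f(q)
    return energy(q, p) - 0.5 * T * p * grad_f(q)


T = 0.1
traj = symplectic_euler(1.0, 0.0, T, grad_f, 1000)
H0 = energy(*traj[0])
Ht0 = perturbed_energy(*traj[0], T)
drift_H = max(abs(energy(q, p) - H0) for q, p in traj)
drift_Ht = max(abs(perturbed_energy(q, p, T) - Ht0) for q, p in traj)
print(drift_H, drift_Ht)  # drift_H is of order T, drift_Ht is at round-off level
```

For a non-quadratic f, the same comparison would show the truncated perturbed energy drifting only at order T², not staying exactly constant.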
From an induction argument relying on ","element":"span"},{"href":"#id-106","text":"(107) ","element":"a"},{"text":"and the definition of the Lie derivative, it follows that each ","element":"span"},{"style":{"height":17.42},"width":170.5,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-5.png","element":"img","alt":" fj can be","inline":true,"padRight":true},{"text":"written as ","element":"span"},{"style":{"height":18.22},"width":603.25,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-6.png","element":"img","alt":" fj(z) = Aj(z)f1(z), where A1(z)","inline":true,"padRight":true},{"text":"is the identity, ","element":"span"},{"style":{"height":17.6},"width":107.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-7.png","element":"img","alt":" A2(z)","inline":true,"padRight":true},{"text":"is defined in ","element":"span"},{"href":"#id-107","text":"(108)","element":"a"},{"text":", and","element":"span"}],[{"id":"id-108","style":{"width":"85%"},"width":1486,"height":139,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-8.png","element":"img"}],[{"text":"Moreover, as shown in ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"Hairer et al. ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"(2002, ","element":"a"},{"text":"p. 
295, Theorem 3.2), the vector fields ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-9.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"are guaranteed to be Hamiltonian for all ","element":"span"},{"style":{"height":16},"width":111.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-10.png","element":"img","alt":" j ≥ 1.","inline":true}],[{"style":{"width":"95%"},"width":1655,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-11.png","element":"img"}],[{"text":"integral formula. More precisely, Cauchy’s integral formula provides the following bound on ","element":"span"},{"style":{"height":17.6},"width":116.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-12.png","element":"img","alt":" ∂¯g/∂z","inline":true,"padRight":true},{"text":"for any function ","element":"span"},{"style":{"height":14.4},"width":23.49,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-13.png","element":"img","alt":" ¯g","inline":true,"padRight":true},{"text":"analytic in ","element":"span"},{"style":{"height":16.65},"width":63.44,"height":41.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-14.png","element":"img","alt":" Bcr:","inline":true}],[{"style":{"width":"78%"},"width":1350,"height":116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-15.png","element":"img"}],[{"text":"for any two constants ","element":"span"},{"style":{"height":16.4},"width":501.08,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-16.png","element":"img","alt":" ρ, δ such that 0 ≤ δ < ρ ≤ r","inline":true},{"text":". 
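In the scalar case, a Cauchy estimate of this type reads max_{|z| ≤ ρ − δ} |g′(z)| ≤ (1/δ) max_{|z| ≤ ρ} |g(z)| for g analytic on the closed disk of radius ρ. The sketch below checks this scalar inequality numerically for an illustrative polynomial g of our own choosing. By the maximum modulus principle, both maxima are attained on the boundary circles, which the code samples at finitely many points. It does not test the vector-valued bound stated with the norm || · ||ρ.

```python
import cmath
import math


def max_on_circle(g, radius, samples=2000):
    # Maximum modulus principle: the maximum of |g| over the closed disk lies on its boundary circle.
    return max(abs(g(radius * cmath.exp(2j * math.pi * k / samples))) for k in range(samples))


def g(z):
    # Illustrative analytic function (a polynomial picked for this sketch).
    return 1 + 2 * z - z ** 3 + 0.5 * z ** 5


def dg(z):
    # Exact derivative of g.
    return 2 - 3 * z ** 2 + 2.5 * z ** 4


rho = 1.0
for delta in (0.1, 0.25, 0.5, 0.9):
    lhs = max_on_circle(dg, rho - delta)
    rhs = max_on_circle(g, rho) / delta
    print(delta, lhs <= rhs)  # True for each tested delta
```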
A similar argument yields the following bound on the Lie derivative of a function ","element":"span"},{"style":{"height":14.4},"width":23.49,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-17.png","element":"img","alt":" ¯g","inline":true,"padRight":true},{"text":"analytic in ","element":"span"},{"style":{"height":16.65},"width":49.29,"height":41.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-18.png","element":"img","alt":" Bcr ","inline":true,"padRight":true},{"href":"#id-45","referenceIndex":17,"text":"(Hairer et al., ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"2002, ","element":"a"},{"text":"p. 308),","element":"span"}],[{"style":{"width":"64%"},"width":1114,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-19.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.6},"width":265.42,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-20.png","element":"img","alt":" 0 ≤ σ < ρ ≤ r","inline":true},{"text":". In the following, these two inequalities are used to bound the right-hand side of ","element":"span"},{"href":"#id-108","text":"(109)","element":"a"},{"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Claim: ","element":"span"},{"text":"The function ","element":"span"},{"style":{"height":17.82},"width":47.73,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-21.png","element":"img","alt":" Aj","inline":true,"padRight":true},{"text":"is bounded above by","element":"span"}],[{"style":{"width":"71%"},"width":1244,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-22.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":16},"width":111.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-23.png","element":"img","alt":" j ≥ 1.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof of the claim: ","element":"span"},{"text":"Explicit calculations show that the claim holds for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"= 1","element":"span"},{"text":". We thus fix the index ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J > ","element":"span"},{"text":"1 ","element":"span"},{"text":"and estimate ","element":"span"},{"style":{"height":19.95},"width":1001.19,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-24.png","element":"img","alt":" ||Aj||r−(j−1)δ for all 1 ≤ j ≤ J, where δ = r/(2(J −1))","inline":true,"padRight":true},{"text":"is fixed, which, for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J","element":"span"},{"text":", yields the desired bound on ","element":"span"},{"style":{"height":15.5},"width":53.72,"height":38.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-25.png","element":"img","alt":" AJ","inline":true},{"text":". 
To simplify notation, we denote ","element":"span"},{"style":{"height":19.95},"width":396.36,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-26.png","element":"img","alt":" || · ||r−(j−1)δ as || · ||j.","inline":true,"padRight":true},{"text":"The right-hand side of ","element":"span"},{"href":"#id-108","text":"(109) ","element":"a"},{"text":"is upper bounded by","element":"span"}],[{"style":{"width":"97%"},"width":1688,"height":299,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/36-27.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":16.4},"width":739.57,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-0.png","element":"img","alt":" k1 + · · · + ki = j and k1 > 0, . . . ki > 0","inline":true},{"text":", it follows that ","element":"span"},{"style":{"height":17.6},"width":433.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-1.png","element":"img","alt":" ki ≤ j − (i − 1), which","inline":true,"padRight":true},{"text":"implies ","element":"span"},{"style":{"height":19.95},"width":609.36,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-2.png","element":"img","alt":" ||¯g||j−(i−1) ≤ ||¯g||ki for all i ≤ j","inline":true,"padRight":true},{"text":"and for any function ","element":"span"},{"style":{"height":14.4},"width":23.49,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-3.png","element":"img","alt":" ¯g","inline":true,"padRight":true},{"text":"analytic on ","element":"span"},{"style":{"height":16.65},"width":49.29,"height":41.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-4.png","element":"img","alt":" Bcr","inline":true},{"text":". 
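The constraint used in this step, k1 + ⋯ + ki = j with every kl > 0, says that (k1, …, ki) is a composition of j into i parts. As a small illustrative sketch (compositions is our own helper name, not the paper's), such tuples can be enumerated recursively. There are binom(j − 1, i − 1) of them, and every part, in particular ki, is at most j − (i − 1), since each of the other i − 1 parts contributes at least one.

```python
from math import comb


def compositions(j, i):
    # Yield every ordered i-tuple of strictly positive integers that sums to j.
    if i == 1:
        if j >= 1:
            yield (j,)
        return
    # The first part must leave at least one unit for each of the other i - 1 parts.
    for k1 in range(1, j - i + 2):
        for rest in compositions(j - k1, i - 1):
            yield (k1,) + rest


j, i = 5, 3
parts = list(compositions(j, i))
print(len(parts), comb(j - 1, i - 1))  # 6 6
print(all(max(p) <= j - (i - 1) for p in parts))  # True
```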
Therefore, the ","element":"span"},{"text":"above bound reduces to","element":"span"}],[{"style":{"width":"94%"},"width":1638,"height":221,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-5.png","element":"img"}],[{"text":"By introducing the constant ","element":"span"},{"style":{"height":21.41},"width":156.24,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-6.png","element":"img","alt":"ˆδ := δ/r","inline":true},{"text":", which, by definition of ","element":"span"},{"style":{"height":21.41},"width":329.05,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-7.png","element":"img","alt":" δ satisfies ˆδ ≤ 1/2","inline":true},{"text":", we can reformulate the above bound as","element":"span"}],[{"id":"id-109","style":{"width":"83%"},"width":1450,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-8.png","element":"img"}],[{"text":"By exploiting the structure of ","element":"span"},{"style":{"height":17.6},"width":107.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-9.png","element":"img","alt":" A2(z)","inline":true,"padRight":true},{"text":"it can be verified that","element":"span"}],[{"id":"id-110","style":{"width":"75%"},"width":1301,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-10.png","element":"img"}],[{"text":"which will simplify the following development. Following ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"Hairer et al. ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"(2002, ","element":"a"},{"text":"p. 
309), we introduce the constants ","element":"span"},{"style":{"height":17.42},"width":264.15,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-11.png","element":"img","alt":" bj for all j ≥ 1","inline":true,"padRight":true},{"text":"as follows","element":"span"}],[{"style":{"width":"72%"},"width":1246,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-12.png","element":"img"}],[{"text":"An induction argument that takes ","element":"span"},{"href":"#id-109","text":"(115) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-110","text":"(116) ","element":"a"},{"text":"into account shows that","element":"span"}],[{"style":{"width":"64%"},"width":1123,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-13.png","element":"img"}],[{"text":"The constants ","element":"span"},{"style":{"height":17.42},"width":33.72,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-14.png","element":"img","alt":" bj","inline":true,"padRight":true},{"text":"are well defined for all ","element":"span"},{"style":{"height":16},"width":113.45,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-15.png","element":"img","alt":" j ≥ 1","inline":true,"padRight":true},{"text":"and not just for ","element":"span"},{"style":{"height":16},"width":212.25,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-16.png","element":"img","alt":" 1 ≤ j ≤ J","inline":true},{"text":". 
To bound the constants ","element":"span"},{"style":{"height":17.42},"width":33.73,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-17.png","element":"img","alt":" bj","inline":true},{"text":", we formally introduce the generating function ","element":"span"},{"style":{"height":17.6},"width":424.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-18.png","element":"img","alt":" b(ζ), b : C → C, ζ ∈ C","inline":true},{"text":", and note that","element":"span"}],[{"id":"id-111","style":{"width":"99%"},"width":1724,"height":438,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-19.png","element":"img"}],[{"text":"where the order of summation has been interchanged to arrive at ","element":"span"},{"href":"#id-111","text":"(118)","element":"a"},{"text":". By the implicit function theorem, given any ","element":"span"},{"style":{"height":13.2},"width":115.07,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-20.png","element":"img","alt":" v ∈ C","inline":true},{"text":", the equation ","element":"span"},{"style":{"height":17.6},"width":509.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-21.png","element":"img","alt":" 2b − exp(b) + 1 = v can be","inline":true,"padRight":true},{"text":"uniquely solved for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"b ","element":"span"},{"text":"as long as ","element":"span"},{"style":{"height":17.6},"width":201.03,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-22.png","element":"img","alt":" exp(b) ≠ 2","inline":true},{"text":", because the derivative of the left-hand side with respect to b, namely 2 − exp(b), is then nonzero. 
This implies that ","element":"span"},{"style":{"height":17.6},"width":75.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-23.png","element":"img","alt":" b(ζ)","inline":true,"padRight":true},{"text":"is well-defined and analytic for ","element":"span"},{"style":{"height":17.6},"width":328.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-24.png","element":"img","alt":"|b1ζ| ≤ 2ln(2) − 1","inline":true},{"text":", or equivalently, ","element":"span"},{"style":{"height":17.6},"width":381.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/37-25.png","element":"img","alt":" |ζ| ≤ (2ln(2) − 1)/b1","inline":true},{"text":". Furthermore, explicit calculations show that ","element":"span"},{"style":{"height":17.6},"width":99.1,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-0.png","element":"img","alt":" |b(ζ)|","inline":true,"padRight":true},{"text":"is bounded by ln","element":"span"},{"style":{"height":17.6},"width":510.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-1.png","element":"img","alt":"(2) for |ζ| ≤ (2ln(2) − 1)/b1","inline":true,"padRight":true},{"text":"(see ","element":"span"},{"href":"#id-45","referenceIndex":17,"text":"Hairer et al., ","element":"a"},{"href":"#id-45","referenceIndex":17,"text":"2002, ","element":"a"},{"text":"p. 310). 
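As a numerical sanity check of this claim (a sketch, not part of the original proof), one can solve 2b − exp(b) + 1 = v by Newton's method at points v on a circle slightly inside the radius 2ln(2) − 1 and confirm that the solution stays below ln(2) in modulus. The helper names, the Newton starting point b = v, and the 0.95 safety factor are illustrative assumptions.

```python
import cmath
import math

RHO = 2.0 * math.log(2.0) - 1.0  # radius 2 ln(2) - 1 in the v-plane


def solve_b(v, iters=50):
    # Newton iteration for phi(b) = 2b - exp(b) + 1 - v, started at b = v,
    # the first-order approximation of the branch with b(0) = 0.
    b = complex(v)
    for _ in range(iters):
        phi = 2.0 * b - cmath.exp(b) + 1.0 - v
        dphi = 2.0 - cmath.exp(b)  # nonzero as long as exp(b) != 2
        b -= phi / dphi
    return b


def max_modulus_on_circle(radius, samples=360):
    # Largest |b(v)| over equally spaced sample points on the circle |v| = radius.
    points = (radius * cmath.exp(2j * math.pi * k / samples) for k in range(samples))
    return max(abs(solve_b(v)) for v in points)


m = max_modulus_on_circle(0.95 * RHO)
print(m, math.log(2.0))
```

The sampled maximum is attained at the positive real point v = 0.95(2ln(2) − 1), which is consistent with the Taylor coefficients of b being nonnegative.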
Cauchy’s inequality therefore implies","element":"span"}],[{"style":{"width":"83%"},"width":1437,"height":140,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-2.png","element":"img"}],[{"text":"Evaluating the above bound for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"yields","element":"span"}],[{"style":{"width":"99%"},"width":1725,"height":382,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"is chosen to be the largest integer such that ","element":"span"},{"style":{"height":17.6},"width":792.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-4.png","element":"img","alt":" NT ≤ T0, with T0 := (2ln(2) − 1)/(2eLH).","inline":true,"padRight":true},{"text":"The choice of the integer ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"is motivated by the following observation: given the bound on ","element":"span"},{"style":{"height":19.95},"width":148.9,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-5.png","element":"img","alt":"||Aj||r/2","inline":true,"padRight":true},{"text":"the terms in the above sum behave like ","element":"span"},{"style":{"height":19.13},"width":221.48,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-6.png","element":"img","alt":" (Tj/(eT0))j","inline":true},{"text":", which attains a minimum for ","element":"span"},{"style":{"height":16},"width":75.24,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-7.png","element":"img","alt":" j 
≈","inline":true},{"style":{"height":17.6},"width":97.25,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-8.png","element":"img","alt":"T0/T","inline":true},{"text":". The fact that all ","element":"span"},{"style":{"height":17.42},"width":161.36,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-9.png","element":"img","alt":" fj, j ≥ 1","inline":true},{"text":", are Hamiltonian vector fields proves ","element":"span"},{"href":"#id-112","text":"(96)","element":"a"},{"text":".","element":"span"}],[{"id":"id-114","style":{"width":"95%"},"width":1658,"height":679,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-10.png","element":"img"}],[{"text":"which proves the first bound in ","element":"span"},{"href":"#id-113","text":"(97)","element":"a"},{"text":". The second bound is proved analogously; that is,","element":"span"}],[{"style":{"width":"82%"},"width":1427,"height":590,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/38-11.png","element":"img"}],[{"text":"The fact that ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-0.png","element":"img","alt":"˜H","inline":true,"padRight":true},{"text":"has the same critical points as H follows from two observations. 
First, all critical points of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"are necessarily critical points of ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-1.png","element":"img","alt":"˜H","inline":true},{"text":", which follows from ","element":"span"},{"href":"#id-114","text":"(124) ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":17.6},"width":181.31,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-2.png","element":"img","alt":" ∇H(z) =","inline":true,"padRight":true},{"text":"0","element":"span"},{"text":", and second, ","element":"span"},{"style":{"height":20.41},"width":236.76,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-3.png","element":"img","alt":" ∇ ˜H(ˆz) = 0","inline":true,"padRight":true},{"text":"implies, according to ","element":"span"},{"href":"#id-114","text":"(124)","element":"a"},{"text":", that ","element":"span"},{"style":{"height":17.6},"width":583.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-4.png","element":"img","alt":" |∇H(ˆz)| ≤ 15LHT|∇H(ˆz)| ≤","inline":true},{"style":{"height":17.6},"width":499.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-5.png","element":"img","alt":"0.36|∇H(ˆz)| (for T ≤ T0/3","inline":true},{"text":"), which forces ","element":"span"},{"style":{"height":17.6},"width":223.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-6.png","element":"img","alt":" ∇H(ˆz) = 0.","inline":true}],[{"text":"The first bound in ","element":"span"},{"href":"#id-115","text":"(98) ","element":"a"},{"text":"is derived by noting 
that","element":"span"}],[{"style":{"width":"78%"},"width":1348,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-7.png","element":"img"}],[{"text":"for any path ","element":"span"},{"style":{"height":19.13},"width":291.31,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-8.png","element":"img","alt":" γ : [0, 1] → C2n ","inline":true,"padRight":true},{"text":"that connects the origin to ","element":"span"},{"style":{"height":17.6},"width":471.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-9.png","element":"img","alt":" z, i.e., γ(0) = 0 and γ(1) = z","inline":true},{"text":". We choose the path ","element":"span"},{"style":{"height":19.53},"width":909.95,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-10.png","element":"img","alt":"γ(t) such that ∇H(γ(t))T ˙γ(t) = |∇H(γ(t))||˙γ(t)|","inline":true},{"text":". Such a path can be constructed from the gradient flow ","element":"span"},{"style":{"height":17.6},"width":85.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-11.png","element":"img","alt":" ˜γ(t),","inline":true}],[{"style":{"width":"66%"},"width":1156,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-12.png","element":"img"}],[{"text":"which is guaranteed to converge to the origin for any ","element":"span"},{"style":{"height":22.57},"width":421.12,"height":56.42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-13.png","element":"img","alt":" z ∈ Af ∩(Bcr/2 ×Bcr/2)","inline":true},{"text":". 
As a result, choosing","element":"span"}],[{"style":{"width":"99%"},"width":1726,"height":440,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-14.png","element":"img"}],[{"text":"The same argument (with the same path) applies to ","element":"span"},{"style":{"height":17.6},"width":150.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-15.png","element":"img","alt":" |∇F(z)|","inline":true},{"text":", which yields the second bound in ","element":"span"},{"href":"#id-115","text":"(98)","element":"a"},{"text":".","element":"span"}],[{"id":"id-116","style":{"width":"99%"},"width":1725,"height":173,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-16.png","element":"img"}],[{"text":"(the reason for this choice will become evident), and, as above, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N ","element":"span"},{"text":"to be the largest integer such that ","element":"span"},{"style":{"height":14.62},"width":275.71,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-17.png","element":"img","alt":" NT ≤ T0. For","inline":true,"padRight":true},{"text":"small enough ","element":"span"},{"style":{"height":16.4},"width":113.9,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-18.png","element":"img","alt":" ζ ∈ C","inline":true},{"text":", we denote the map from ","element":"span"},{"style":{"height":20.14},"width":899,"height":50.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-19.png","element":"img","alt":" ˜z(0) = z0 to ˜z(t = ζ), where ˜z(t) satisfies ˙˜z(t) =","inline":true},{"style":{"height":24.4},"width":708.17,"height":61.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-20.png","element":"img","alt":"∑Nj=1 fj(˜z(t))ζj, by ˜ϕζ : C2n → C2n","inline":true},{"text":". 
The symplectic Euler step ","element":"span"},{"style":{"height":18.44},"width":499.11,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-21.png","element":"img","alt":" Φζ(z0) and ˜ϕζ(z0) are both","inline":true,"padRight":true},{"text":"analytic in ","element":"span"},{"style":{"height":16.4},"width":21,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-22.png","element":"img","alt":" ζ","inline":true,"padRight":true},{"text":"and, by the above construction, their Taylor expansions in ","element":"span"},{"style":{"height":16.4},"width":267.17,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-23.png","element":"img","alt":" ζ about ζ = 0","inline":true,"padRight":true},{"text":"agree for the first ","element":"span"},{"style":{"height":15.2},"width":214.48,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-24.png","element":"img","alt":" N terms (z0","inline":true,"padRight":true},{"text":"is fixed). Thus, the function ","element":"span"},{"style":{"height":20.38},"width":841.6,"height":50.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-25.png","element":"img","alt":" ˜g(ζ)/ζN+1, where ˜g(ζ) := Φζ(z0) − ˜ϕζ(z0), is","inline":true,"padRight":true},{"text":"analytic. 
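The maximum modulus step applied next has a simple numerical counterpart: if g is analytic on the closed disk of radius ϵ ≥ T and has a zero of order N + 1 at the origin, then |g(T)| ≤ (T/ϵ)^(N+1) times the maximum of |g| on |ζ| = ϵ. The sketch below checks this inequality for a stand-in g, the Taylor remainder of exp after degree N. The stand-in function and the values of N, ϵ, and T are assumptions chosen for illustration; they are not the ˜g, N, or ϵ of the proof.

```python
import cmath
import math


def taylor_remainder_exp(z, n):
    # exp(z) minus its degree-n Taylor polynomial; zero of order n + 1 at z = 0.
    return cmath.exp(z) - sum(z ** k / math.factorial(k) for k in range(n + 1))


def max_on_circle(func, radius, samples=720):
    # Sampled maximum of |func| on the circle |z| = radius. For the stand-in
    # below (positive Taylor coefficients) the maximum sits at z = radius,
    # which is one of the samples (k = 0).
    return max(abs(func(radius * cmath.exp(2j * math.pi * k / samples)))
               for k in range(samples))


N, eps, T = 4, 1.0, 0.3


def g(z):
    return taylor_remainder_exp(z, N)


lhs = abs(g(T))
rhs = (T / eps) ** (N + 1) * max_on_circle(g, eps)
print(lhs, rhs, lhs <= rhs)
```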
The maximum modulus principle then implies","element":"span"}],[{"id":"id-118","style":{"width":"79%"},"width":1371,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-26.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-27.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"is chosen as ","element":"span"},{"style":{"height":17.6},"width":216.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-28.png","element":"img","alt":" ϵ = eT0/N","inline":true,"padRight":true},{"text":"(again, the reason for this choice will become evident below). It remains to bound ","element":"span"},{"style":{"height":17.6},"width":323.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-29.png","element":"img","alt":" |˜g(ζ)| for |ζ| = ϵ","inline":true},{"text":", which we do by estimating ","element":"span"},{"style":{"height":18.44},"width":584.97,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-30.png","element":"img","alt":" |Φζ(z0) − z0| and | ˜ϕζ(z0) − z0|","inline":true,"padRight":true},{"text":"separately. For the first term, the definition ","element":"span"},{"href":"#id-105","text":"(105)","element":"a"},{"text":" yields","element":"span"}],[{"id":"id-117","style":{"width":"92%"},"width":1606,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/39-31.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.4},"width":123.39,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-0.png","element":"img","alt":" N ≥ 3","inline":true,"padRight":true},{"text":"has been used for the last inequality. 
To bound the second term, we first derive a bound on ","element":"span"},{"style":{"height":25.38},"width":429.29,"height":63.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-1.png","element":"img","alt":" ∇ ˜H(z), for all z ∈ Bcr/2","inline":true},{"text":", analogous to ","element":"span"},{"href":"#id-114","text":"(124) ","element":"a"},{"text":"(unlike in ","element":"span"},{"href":"#id-114","text":"(124) ","element":"a"},{"text":"we now have ","element":"span"},{"style":{"height":17.6},"width":199.01,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-2.png","element":"img","alt":" ϵ = eT0/N","inline":true},{"text":"). We obtain","element":"span"}],[{"style":{"width":"84%"},"width":1468,"height":505,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-3.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":21.7},"width":172.69,"height":54.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-4.png","element":"img","alt":" z ∈ Bcr/2","inline":true},{"text":". 
The gradient ","element":"span"},{"style":{"height":16.41},"width":75.36,"height":41.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-5.png","element":"img","alt":" ∇ ˜H","inline":true,"padRight":true},{"text":"therefore has a Lipschitz constant of less than 2.8","element":"span"},{"style":{"height":14.62},"width":120.71,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-6.png","element":"img","alt":"LH for","inline":true},{"style":{"height":21.7},"width":158.81,"height":54.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-7.png","element":"img","alt":"z ∈ Bcr/2","inline":true},{"text":", which implies that","element":"span"}],[{"style":{"width":"68%"},"width":1191,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-8.png","element":"img"}],[{"text":"as long as the corresponding trajectory stays within ","element":"span"},{"style":{"height":21.7},"width":83.26,"height":54.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-9.png","element":"img","alt":" Bcr/2","inline":true},{"text":". The last inequality follows from ","element":"span"},{"text":"the fact that ","element":"span"},{"style":{"height":17.6},"width":994.38,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-10.png","element":"img","alt":" T0 < T(N + 1) and thus ϵ < eT(N + 1)/N ≤ 4eT/3","inline":true},{"text":". 
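The elementary inequalities in this part of the argument can be spot-checked numerically. The sketch below assumes LH = 1 (other values only rescale T0 and T), sweeps step sizes T < T0/3, sets N to the largest integer with NT ≤ T0 and ϵ = eT0/N as in the text, and checks that N ≥ 3, ϵ ≤ eT0/3, ϵ < 4eT/3, and 15LHT ≤ 0.36 (the constant used earlier to show that ˜H has the same critical points as H). The sampling grid is an arbitrary choice.

```python
import math

L_H = 1.0  # assumed value of LH; other values only rescale T0 and T
T0 = (2.0 * math.log(2.0) - 1.0) / (2.0 * math.e * L_H)


def check_step(T):
    # N is the largest integer with N * T <= T0, and eps = e * T0 / N.
    N = math.floor(T0 / T)
    eps = math.e * T0 / N
    return (N >= 3
            and eps <= math.e * T0 / 3.0
            and eps < 4.0 * math.e * T / 3.0
            and 15.0 * L_H * T <= 0.36)


# Step sizes T0 / (3.1 + sqrt(2) k) lie below T0 / 3; the irrational spacing
# keeps T0 / T away from integers, so floating-point ties cannot flip N.
steps = [T0 / (3.1 + math.sqrt(2.0) * k) for k in range(200)]
print(all(check_step(T) for T in steps))
print(5.0 * L_H * T0)  # 15 LH T at T = T0/3, roughly 0.355
```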
Because of condition ","element":"span"},{"href":"#id-116","text":"(132)","element":"a"},{"text":", the trajectory ","element":"span"},{"style":{"height":17.6},"width":117.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-11.png","element":"img","alt":" ˜ϕϵ(z0)","inline":true,"padRight":true},{"text":"necessarily stays within ","element":"span"},{"style":{"height":21.7},"width":83.26,"height":54.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-12.png","element":"img","alt":" Bcr/2","inline":true},{"text":". Moreover, from ","element":"span"},{"style":{"height":17.6},"width":478.85,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-13.png","element":"img","alt":" ϵ ≤ eT0/3 (N ≥ 3 since","inline":true},{"style":{"height":17.6},"width":177.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-14.png","element":"img","alt":"T ≤ T0/3","inline":true},{"text":"), we infer that","element":"span"}],[{"style":{"width":"68%"},"width":1181,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-15.png","element":"img"}],[{"text":"which implies that ","element":"span"},{"style":{"height":17.6},"width":510.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-16.png","element":"img","alt":" | ˜ϕϵ(z0) − z0| ≤ 3.36ϵLH|z0|","inline":true},{"text":". 
Combined with ","element":"span"},{"href":"#id-117","text":"(134)","element":"a"},{"text":", this leads to ","element":"span"},{"style":{"height":21.43},"width":229.38,"height":53.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-17.png","element":"img","alt":" |˜g(ζ)| ≤ ϵ ˜Cg","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":21.43},"width":717.98,"height":53.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-18.png","element":"img","alt":" |ζ| = ϵ and ˜Cg := (4.36 + eT0/3)LH|z0|","inline":true},{"text":", which, according to ","element":"span"},{"href":"#id-118","text":"(133)","element":"a"},{"text":", implies","element":"span"}],[{"id":"id-120","style":{"width":"85%"},"width":1483,"height":186,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-19.png","element":"img"}],[{"text":"The fact that ","element":"span"},{"style":{"height":17.6},"width":396.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-20.png","element":"img","alt":" N ≤ T0/T < N + 1","inline":true,"padRight":true},{"text":"has been used to obtain the last inequality. The above result justifies the choice ","element":"span"},{"style":{"height":17.6},"width":210.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-21.png","element":"img","alt":" ϵ = eT0/N.","inline":true}],[{"text":"We derive the bound on ","element":"span"},{"style":{"height":20.41},"width":468.94,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-22.png","element":"img","alt":"˜H( ˜ϕT (z0)) − ˜H(ΦT (z0))","inline":true,"padRight":true},{"text":"in a similar way. 
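To see why the choice ϵ = eT0/N produces an error that is exponentially small in T0/T, note that NT ≤ T0 gives T/ϵ = NT/(eT0) ≤ 1/e, so a factor of the form (T/ϵ)^(N+1), which is what the maximum modulus principle produces for a function with a zero of order N + 1, is at most e^−(N+1) < e^−(T0/T). The sketch below evaluates this factor for a few assumed ratios T0/T, together with the index of the smallest term (Tj/(eT0))^j from the earlier truncation argument, which lies near T0/T. It only illustrates the arithmetic and does not reproduce the constants of the bound in the text.

```python
import math


def decay_factor(ratio):
    # (T / eps)^(N + 1) for T0 / T = ratio, with N the largest integer such
    # that N * T <= T0 and eps = e * T0 / N; only the ratio T0 / T matters.
    N = math.floor(ratio)
    t_over_eps = N / (math.e * ratio)  # equals N * T / (e * T0)
    return t_over_eps ** (N + 1)


def smallest_term_index(ratio, j_max=500):
    # Index j minimizing (T j / (e T0))^j = (j / (e * ratio))^j; comparing
    # logarithms avoids floating-point overflow for large j.
    return min(range(1, j_max + 1), key=lambda j: j * math.log(j / (math.e * ratio)))


for ratio in (3.5, 10.3, 40.7):
    print(ratio, decay_factor(ratio), math.exp(-ratio), smallest_term_index(ratio))
```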
Consider the analytic function ","element":"span"},{"style":{"height":25.38},"width":1130.25,"height":63.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-23.png","element":"img","alt":" ˆg(ζ) = ˜H( ˜ϕζ(z0)) − ˜H(Φζ(z0)), where z0 ∈ Bcr/2, T, and N","inline":true,"padRight":true},{"text":"are fixed. The modified ","element":"span"},{"text":"Hamiltonian ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-24.png","element":"img","alt":"˜H","inline":true,"padRight":true},{"text":"depends on ","element":"span"},{"style":{"height":16.4},"width":21,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-25.png","element":"img","alt":" ζ","inline":true,"padRight":true},{"text":"and has the form","element":"span"}],[{"style":{"width":"62%"},"width":1076,"height":136,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-26.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":18.22},"width":1065.07,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-27.png","element":"img","alt":" H1 = H, and Ω∇Hj(z) = fj(z) for all j with 1 ≤ j ≤ N","inline":true},{"text":". 
The function ","element":"span"},{"style":{"height":19.54},"width":274.12,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-28.png","element":"img","alt":" ˆg(ζ)/(ζN+1) is","inline":true,"padRight":true},{"text":"analytic, as can be seen, for example, by a Taylor expansion of ","element":"span"},{"style":{"height":21.26},"width":592.38,"height":53.14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-29.png","element":"img","alt":"˜H in z about z = Φζ(z0), where","inline":true},{"style":{"height":18.44},"width":338.43,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-30.png","element":"img","alt":"˜ϕζ(z0) and Φζ(z0)","inline":true,"padRight":true},{"text":"agree up to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":"th order in ζ. As in ","element":"span"},{"href":"#id-118","text":"(133)","element":"a"},{"text":", the maximum modulus principle implies","element":"span"}],[{"style":{"width":"63%"},"width":1104,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/40-31.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-0.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"is again set to ","element":"span"},{"style":{"height":17.6},"width":199.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-1.png","element":"img","alt":" ϵ = eT0/N","inline":true},{"text":". 
A bound on ","element":"span"},{"style":{"height":17.6},"width":78.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-2.png","element":"img","alt":" ˆg(ζ)","inline":true,"padRight":true},{"text":"follows from Taylor’s theorem:","element":"span"}],[{"id":"id-119","style":{"width":"95%"},"width":1642,"height":232,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12},"width":22,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-4.png","element":"img","alt":" η","inline":true,"padRight":true},{"text":"lies between ","element":"span"},{"style":{"height":18.44},"width":254.61,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-5.png","element":"img","alt":" Φζ(z0) and z0","inline":true},{"text":". The gradient of ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-6.png","element":"img","alt":"˜H","inline":true,"padRight":true},{"text":"satisfies ","element":"span"},{"style":{"height":20.41},"width":210.79,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-7.png","element":"img","alt":" |∇ ˜H(z)| ≤","inline":true},{"style":{"height":17.6},"width":156.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-8.png","element":"img","alt":"2.8LH|z|","inline":true},{"text":"; moreover, since ∇H̃ has a Lipschitz constant of less than 2.8LH on the closed ball of radius r/2, the Hessian of ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-9.png","element":"img","alt":"˜H","inline":true,"padRight":true},{"text":"in the expression is bounded by 
","element":"span"},{"style":{"height":14.62},"width":108.45,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-10.png","element":"img","alt":" 2.8LH","inline":true,"padRight":true},{"text":"provided that ","element":"span"},{"style":{"height":21.7},"width":202.27,"height":54.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-11.png","element":"img","alt":" η is in Bcr/2","inline":true},{"text":". The bound ","element":"span"},{"href":"#id-117","text":"(134) ","element":"a"},{"text":"implies that this is the case for","element":"span"}],[{"style":{"width":"70%"},"width":1212,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":492.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-13.png","element":"img","alt":" ϵ ≤ eT0/3 and ϵ < 4eT/3","inline":true,"padRight":true},{"text":"have been used. Combining ","element":"span"},{"href":"#id-119","text":"(142) ","element":"a"},{"text":"with the bound from ","element":"span"},{"href":"#id-117","text":"(134) ","element":"a"},{"text":"yields","element":"span"}],[{"style":{"width":"94%"},"width":1631,"height":193,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-14.png","element":"img"}],[{"text":"Applying the same chain of arguments as in ","element":"span"},{"href":"#id-120","text":"(138) ","element":"a"},{"text":"allows us to conclude:","element":"span"}],[{"style":{"width":"79%"},"width":1366,"height":145,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-15.png","element":"img"}]]},{"heading":"Appendix D. 
Proof of Proposition 10","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We choose ","element":"span"},{"style":{"height":17.12},"width":690.05,"height":42.79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-16.png","element":"img","alt":" r > 0 such that A ⊂ Bcr×Bcr, where Bcr ","inline":true,"padRight":true},{"text":"is the closed ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":"-dimensional ball of radius ","element":"span"},{"style":{"fontStyle":"italic"},"text":"r ","element":"span"},{"text":"centered at the origin. We use the Stone-Weierstrass theorem ","element":"span"},{"href":"#id-65","referenceIndex":34,"text":"(Rudin, ","element":"a"},{"href":"#id-65","referenceIndex":34,"text":"1976, ","element":"a"},{"text":"p. 159) to approximate","element":"span"}],[{"style":{"width":"93%"},"width":1613,"height":234,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-17.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12.4},"width":106.6,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-18.png","element":"img","alt":" ϵ > 0","inline":true,"padRight":true},{"text":"will be chosen later. We integrate ","element":"span"},{"style":{"height":20.61},"width":726.47,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-19.png","element":"img","alt":" d2 ˜f/dx2 and impose ∇ ˜f(0) = 0, which","inline":true,"padRight":true},{"text":"yields a function ","element":"span"},{"style":{"height":21.03},"width":560.7,"height":52.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-20.png","element":"img","alt":" ∇ ˜f on Bc2r that is ϵ-close to ∇f","inline":true},{"text":". 
We integrate once more and impose ","element":"span"},{"style":{"height":20.61},"width":172.82,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-21.png","element":"img","alt":" ˜f(0) = 0.","inline":true,"padRight":true},{"text":"Moreover, by choosing ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-22.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"small enough, the origin is guaranteed to be the only critical point of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":") ","element":"span"},{"text":"(with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"replaced by ","element":"span"},{"style":{"height":21.45},"width":892.37,"height":53.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-23.png","element":"img","alt":"˜f) for all z ∈ A, since A ⊂ Af and A ⊂ Bcr × Bcr","inline":true},{"text":". Since ","element":"span"},{"style":{"height":20.21},"width":66.94,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-24.png","element":"img","alt":" ˜f is","inline":true,"padRight":true},{"text":"a polynomial and therefore analytic, we can invoke Proposition ","element":"span"},{"href":"#id-66","text":"9 ","element":"a"},{"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"replaced by ","element":"span"},{"style":{"height":20.21},"width":216.02,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-25.png","element":"img","alt":"˜f. 
Next, we","inline":true,"padRight":true},{"text":"will show that the origin is an asymptotically stable equilibrium of the dynamics ","element":"span"},{"href":"#id-43","text":"(13)","element":"a"},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is replaced by ","element":"span"},{"style":{"height":20.21},"width":31.4,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/41-26.png","element":"img","alt":"˜f","inline":true},{"text":", with region of attraction at least ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"95%"},"width":1658,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-0.png","element":"img"}],[{"text":"in the following way:","element":"span"}],[{"style":{"width":"71%"},"width":1236,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-1.png","element":"img"}],[{"text":"where by assumption ","element":"span"},{"style":{"height":17.6},"width":166.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-2.png","element":"img","alt":" Λ(qk, pk)","inline":true,"padRight":true},{"text":"is symmetric and positive definite. Due to the assumption on the dissipative forces, the singular values of ","element":"span"},{"style":{"height":17.6},"width":166.06,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-3.png","element":"img","alt":" Λ(qk, pk)","inline":true,"padRight":true},{"text":"are upper and lower bounded by ","element":"span"},{"style":{"height":15.2},"width":190.94,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-4.png","element":"img","alt":" d2 and d1,","inline":true,"padRight":true},{"text":"respectively. 
Provided that the time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is small enough, ","element":"span"},{"style":{"height":14.62},"width":172.62,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-5.png","element":"img","alt":" T ≤ Tmax","inline":true},{"text":", we conclude from Proposition ","element":"span"},{"href":"#id-66","text":"9 ","element":"a"},{"text":"that there exists a modified energy function ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-6.png","element":"img","alt":"˜H","inline":true,"padRight":true},{"text":"that has the same critical points as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H","element":"span"},{"text":", and such that the bound ","element":"span"},{"href":"#id-69","text":"(27) ","element":"a"},{"text":"holds for all ","element":"span"},{"style":{"height":16.65},"width":250.16,"height":41.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-7.png","element":"img","alt":" z0 ∈ Bcr × Bcr","inline":true},{"text":". Morse theory implies that ","element":"span"},{"style":{"height":20.41},"width":96,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-8.png","element":"img","alt":" ˜H(z)","inline":true,"padRight":true},{"text":"is locally positive ","element":"span"},{"text":"definite and has compact level sets for ","element":"span"},{"style":{"height":13.6},"width":108.54,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-9.png","element":"img","alt":" z ∈ A","inline":true},{"text":". 
The same applies therefore to the function","element":"span"}],[{"style":{"width":"72%"},"width":1245,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-10.png","element":"img"}],[{"text":"provided that ","element":"span"},{"style":{"height":15.2},"width":391.77,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-11.png","element":"img","alt":" T ≤ Tmax, where Tmax","inline":true,"padRight":true},{"text":"is chosen to be small enough (independently of ","element":"span"},{"style":{"height":16.4},"width":37.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-12.png","element":"img","alt":" fd","inline":true},{"text":"; this can be done since the positive definiteness of ","element":"span"},{"style":{"height":16.01},"width":39,"height":40.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-13.png","element":"img","alt":"˜H","inline":true,"padRight":true},{"text":"is not affected by the dissipative forces ","element":"span"},{"style":{"height":16.4},"width":64.83,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-14.png","element":"img","alt":" fd).","inline":true}],[{"style":{"width":"96%"},"width":1659,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-15.png","element":"img"}],[{"text":"simple structure of ","element":"span"},{"style":{"height":17.24},"width":82.36,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-16.png","element":"img","alt":" Φd,T","inline":true,"padRight":true},{"text":". 
In order to simplify the notation we hide all constants that may depend on ","element":"span"},{"style":{"height":17.6},"width":103.17,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-17.png","element":"img","alt":"d2/d1","inline":true},{"text":", an upper bound on ","element":"span"},{"style":{"height":15.2},"width":119.05,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-18.png","element":"img","alt":" d2, LH","inline":true},{"text":", or higher orders of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"with the notation ","element":"span"},{"style":{"height":17.6},"width":82.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-19.png","element":"img","alt":" O(·)","inline":true},{"text":". It will be shown that ","element":"span"},{"style":{"height":17.6},"width":170.91,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-20.png","element":"img","alt":" V (qk, pk)","inline":true,"padRight":true},{"text":"necessarily decays along the step ","element":"span"},{"style":{"height":17.24},"width":173.55,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-21.png","element":"img","alt":" Φd,T ◦ΦT","inline":true,"padRight":true},{"text":". 
To that end, we define ","element":"span"},{"style":{"height":18.44},"width":255.36,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-22.png","element":"img","alt":" z2 = Φd,T (z1)","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":17.6},"width":228.53,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-23.png","element":"img","alt":" z1 = ΦT (z0)","inline":true,"padRight":true},{"text":"and consider ","element":"span"},{"style":{"height":17.6},"width":269.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-24.png","element":"img","alt":" V (z2) − V (z0)","inline":true},{"text":", which, since ","element":"span"},{"style":{"height":11.6},"width":133.07,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-25.png","element":"img","alt":" q2 = q1","inline":true},{"text":", is bounded by","element":"span"}],[{"style":{"width":"82%"},"width":1430,"height":481,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-26.png","element":"img"}],[{"text":"where the arguments of ","element":"span"},{"style":{"height":12.8},"width":31,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-27.png","element":"img","alt":" Λ","inline":true,"padRight":true},{"text":"have been dropped and the results from Proposition ","element":"span"},{"href":"#id-66","text":"9 ","element":"a"},{"text":"have been used, which enable the following bound:","element":"span"}],[{"style":{"width":"75%"},"width":1307,"height":252,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-28.png","element":"img"}],[{"text":"Due to Proposition ","element":"span"},{"href":"#id-66","text":"9, ","element":"a"},{"text":"it follows that 
","element":"span"},{"style":{"height":20.41},"width":382.61,"height":51.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-29.png","element":"img","alt":"˜H(ΦT (z0)) − ˜H(z0)","inline":true,"padRight":true},{"text":"is bounded by a term of order ","element":"span"},{"style":{"height":20.33},"width":347.99,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-30.png","element":"img","alt":"|∇H(z0)|2Te−T0/T ","inline":true,"padRight":true},{"text":". This yields","element":"span"}],[{"style":{"height":36.18},"width":1410.91,"height":90.45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-31.png","element":"img","alt":"V (q2, p2) ≤ ˜H(q0, p0) − TpT1 Λp1 + Td12 ∇ ˜f(q1)Tp1 + |∇H(z0)|2O(Te−T0/T )","inline":true,"padRight":true},{"text":"+ ","element":"span"},{"style":{"height":20.61},"width":685.45,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-32.png","element":"img","alt":" O(d1T 2)|p1|2 + O(d1T 2)|∇ ˜f(q1)||p1|","inline":true}],[{"style":{"width":"90%"},"width":1556,"height":158,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/42-33.png","element":"img"}],[{"text":"By the Lipschitz continuity of ","element":"span"},{"style":{"height":20.21},"width":67.76,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-0.png","element":"img","alt":" ∇ ˜f","inline":true,"padRight":true},{"text":"we can bound ","element":"span"},{"style":{"height":20.61},"width":946.34,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-1.png","element":"img","alt":" |∇ ˜f(q1) − ∇ ˜f(q0)| by T(L + ϵ)|p1|, where L is the","inline":true,"padRight":true},{"text":"Lipschitz constant of ","element":"span"},{"style":{"height":16.4},"width":61.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-2.png","element":"img","alt":" ∇f","inline":true},{"text":". 
Moreover, ","element":"span"},{"style":{"height":14.62},"width":52.7,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-3.png","element":"img","alt":" LH","inline":true,"padRight":true},{"text":"is an upper bound on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"text":". This implies","element":"span"}],[{"style":{"width":"98%"},"width":1698,"height":350,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-4.png","element":"img"}],[{"text":"provided that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is chosen small enough such that ","element":"span"},{"style":{"height":19.81},"width":226.38,"height":49.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-5.png","element":"img","alt":" −pT1 Λp1T/2","inline":true,"padRight":true},{"text":"dominates the ","element":"span"},{"style":{"height":19.13},"width":355.05,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-6.png","element":"img","alt":" O(d1T 2)|p1|2 terms","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"height":20.61},"width":347.83,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-7.png","element":"img","alt":" −T 2d1|∇ ˜f(q0)|2/4","inline":true,"padRight":true},{"text":"dominates the ","element":"span"},{"style":{"height":20.61},"width":338.04,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-8.png","element":"img","alt":" O(d1T 3)|∇ ˜f(q0)|2 ","inline":true,"padRight":true},{"text":"terms. 
Expanding the term ","element":"span"},{"style":{"height":19.81},"width":227.59,"height":49.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-9.png","element":"img","alt":" −TpT1 Λp1/2","inline":true,"padRight":true},{"text":"and applying Young’s inequality,","element":"span"}],[{"style":{"width":"50%"},"width":880,"height":25,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-10.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":"Y","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"style":{"height":20.61},"width":297.4,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-11.png","element":"img","alt":"1T 2pT0 ∇ ˜f(q0) ≤","inline":true}],[{"style":{"width":"14%"},"width":254,"height":13,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-12.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":15.02},"width":136.38,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-13.png","element":"img","alt":" CY > 0","inline":true,"padRight":true},{"text":"is an arbitrary constant, results in","element":"span"}],[{"style":{"width":"84%"},"width":1462,"height":210,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-14.png","element":"img"}],[{"text":"where for the last inequality ","element":"span"},{"style":{"height":14.4},"width":115.13,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-15.png","element":"img","alt":" T ≤ 1","inline":true,"padRight":true},{"text":"has been used. 
Note that the term ","element":"span"},{"style":{"height":16.33},"width":155.48,"height":40.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-16.png","element":"img","alt":" Te−T0/T ","inline":true,"padRight":true},{"text":"decays rapidly for small ","element":"span"},{"style":{"height":20.33},"width":591.56,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-17.png","element":"img","alt":" T. Thus for d1 ≥ O(e−T0/T /T)","inline":true,"padRight":true},{"text":"the function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":"strictly decreases along the trajectories of ","element":"span"},{"style":{"height":17.24},"width":186.81,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-18.png","element":"img","alt":"Φd,T ◦ ΦT","inline":true,"padRight":true},{"text":". The lower bound on ","element":"span"},{"style":{"height":15.02},"width":39.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-19.png","element":"img","alt":" d1","inline":true,"padRight":true},{"text":"vanishes exponentially for small ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":". This shows that the origin is asymptotically stable under the dynamics ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"(with ","element":"span"},{"style":{"height":20.21},"width":31.39,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-20.png","element":"img","alt":"˜f","inline":true,"padRight":true},{"text":"instead of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":") provided that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is chosen small enough, i.e. 
","element":"span"},{"style":{"height":14.62},"width":177.69,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-21.png","element":"img","alt":" T ≤ Tmax","inline":true},{"text":", where, up to the exponential terms in ","element":"span"},{"style":{"height":15.2},"width":146.44,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-22.png","element":"img","alt":" d1, Tmax","inline":true,"padRight":true},{"text":"depends only on the ratio ","element":"span"},{"style":{"height":17.6},"width":256.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-23.png","element":"img","alt":" d2/d1, and LH","inline":true},{"text":". Since the first-order approximations of the dynamics resulting from ","element":"span"},{"style":{"height":20.21},"width":142.55,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-24.png","element":"img","alt":" f and ˜f","inline":true,"padRight":true},{"text":"about the equilibrium are ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-25.png","element":"img","alt":" ϵ","inline":true},{"text":"-close (by construction of ","element":"span"},{"style":{"height":20.21},"width":31.39,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-26.png","element":"img","alt":"˜f","inline":true},{"text":"), the origin is likewise asymptotically stable for the dynamics resulting from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":", and its region of attraction contains at least a small neighborhood of the origin. 
Consequently, due to the continuity of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":", we may choose ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-27.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"small enough such that ","element":"span"},{"style":{"height":17.6},"width":755.03,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-28.png","element":"img","alt":" V (˜q2, ˜p2) − V (q0, p0) < 0 for all (q0, p0)","inline":true,"padRight":true},{"text":"outside a neighborhood of the origin, where ","element":"span"},{"style":{"height":15.2},"width":103.2,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-29.png","element":"img","alt":" ˜q2, ˜p2","inline":true,"padRight":true},{"text":"denote the iterates resulting from ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"(with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"instead of ","element":"span"},{"style":{"height":20.2},"width":31.39,"height":50.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-30.png","element":"img","alt":"˜f","inline":true},{"text":"). This shows that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is likewise contained in the region of attraction of the origin under the dynamics ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"instead of ","element":"span"},{"style":{"height":20.2},"width":37.06,"height":50.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-31.png","element":"img","alt":"˜f.","inline":true}]]},{"heading":"Appendix E. 
Discrete-time stability (convex case)","paragraphs":[[{"text":"By using the change of variables ","element":"span"},{"style":{"height":17.6},"width":350.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-32.png","element":"img","alt":" (qk, pk) → (ˆqk, ˆpk),","inline":true}],[{"style":{"width":"72%"},"width":1245,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/43-33.png","element":"img"}],[{"text":"the discrete-time algorithm ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"can be reformulated as","element":"span"}],[{"style":{"width":"67%"},"width":1167,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-0.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":413.31,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-1.png","element":"img","alt":" τ := β/(1 − 2dT), and","inline":true}],[{"style":{"width":"63%"},"width":1090,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-2.png","element":"img"}],[{"text":"The change of variables is motivated by the fact that the convexity and smoothness of the objective function imply ","element":"span"},{"style":{"height":17.6},"width":314.3,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-3.png","element":"img","alt":" f(ˆqk+1) ≤ f(yk),","inline":true}],[{"style":{"width":"75%"},"width":1296,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"L ","element":"span"},{"text":"is the Lipschitz constant of the gradient of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". 
In the following, it is shown that the energy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"H ","element":"span"},{"text":"decreases along the iterates ","element":"span"},{"style":{"height":16},"width":165.23,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-5.png","element":"img","alt":" ˆqk and ˆpk","inline":true,"padRight":true},{"text":"provided that certain conditions on the parameters ","element":"span"},{"style":{"height":16.4},"width":194.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-6.png","element":"img","alt":" T, d, and β","inline":true,"padRight":true},{"text":"are met. We evaluate ","element":"span"},{"style":{"height":17.6},"width":273.11,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-7.png","element":"img","alt":" H(ˆqk+1, ˆpk+1),","inline":true}],[{"style":{"width":"98%"},"width":1702,"height":325,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-8.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is convex, it holds that","element":"span"}],[{"style":{"width":"71%"},"width":1230,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-9.png","element":"img"}],[{"text":"and, as a result,","element":"span"}],[{"style":{"width":"94%"},"width":1639,"height":168,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-10.png","element":"img"}],[{"text":"The right-hand side of the above expression is guaranteed to decrease provided that the following matrix","element":"span"}],[{"id":"id-121","style":{"width":"71%"},"width":1236,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-11.png","element":"img"}],[{"text":"is positive definite. 
Thus, the minimum of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is an asymptotically stable equilibrium of the dynamics ","element":"span"},{"href":"#id-43","text":"(13) ","element":"a"},{"text":"provided that ","element":"span"},{"href":"#id-121","text":"(160) ","element":"a"},{"text":"is positive semidefinite. In particular, choosing ","element":"span"},{"style":{"height":17.6},"width":437.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-12.png","element":"img","alt":" T = τ, i.e., β = T(1 −","inline":true},{"style":{"height":19.98},"width":650.92,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-13.png","element":"img","alt":"2dT), 0 < T ≤ 1/√L, 0 < dT < 1","inline":true},{"text":", ensures asymptotic stability of the minimum. Moreover, for any fixed ","element":"span"},{"style":{"height":16.4},"width":114.91,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/44-14.png","element":"img","alt":" β > 0","inline":true,"padRight":true},{"text":"the above matrix can be made negative definite by choosing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"sufficiently small.","element":"span"}]]},{"heading":"Appendix F. Smoothness assumptions on f","paragraphs":[[{"text":"This section discusses the implications of Assumption ","element":"span"},{"href":"#id-34","text":"1 ","element":"a"},{"text":"compared to Assumption ","element":"span"},{"href":"#id-35","text":"2. 
","element":"a"},{"text":"To this end, we consider any function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"that satisfies Assumption ","element":"span"},{"href":"#id-34","text":"1 ","element":"a"},{"text":"and construct a sequence of functions ","element":"span"},{"style":{"height":17.42},"width":167.33,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-0.png","element":"img","alt":" fj, where","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"j > ","element":"span"},{"text":"0 ","element":"span"},{"text":"is an integer, as follows: let ","element":"span"},{"style":{"height":17.42},"width":240.29,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-1.png","element":"img","alt":" fj : Rn → R","inline":true,"padRight":true},{"text":"be such that ","element":"span"},{"style":{"height":18.22},"width":431.45,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-2.png","element":"img","alt":" fj(0) = 0, ∇fj(0) = 0,","inline":true}],[{"id":"id-122","style":{"width":"99%"},"width":1725,"height":140,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.02},"width":271.15,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-4.png","element":"img","alt":" sj : Rn → R≥0","inline":true,"padRight":true},{"text":"is an infinitely differentiable function with support in ","element":"span"},{"style":{"height":21.7},"width":125.62,"height":54.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-5.png","element":"img","alt":" Bc1/(2j) ","inline":true,"padRight":true},{"text":"and satisfies 
","element":"span"},{"style":{"height":20.19},"width":299.53,"height":50.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-6.png","element":"img","alt":"∫Rn sj(¯x)d¯x = 1","inline":true},{"text":". As a consequence, it holds that","element":"span"}],[{"style":{"width":"60%"},"width":1038,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-7.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":18.81},"width":662.78,"height":47.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-8.png","element":"img","alt":" x ∈ Rn and j > 0 (assuming ¯Cf > Cf","inline":true},{"text":"). In other words, ","element":"span"},{"style":{"height":17.42},"width":72.73,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-9.png","element":"img","alt":" ∇fj","inline":true,"padRight":true},{"text":"converges uniformly to ","element":"span"},{"style":{"height":16.4},"width":139.02,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-10.png","element":"img","alt":" ∇f; the","inline":true,"padRight":true},{"text":"same holds for ","element":"span"},{"style":{"height":17.42},"width":138.25,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-11.png","element":"img","alt":" fj → f","inline":true,"padRight":true},{"text":"on any compact subset of ","element":"span"},{"style":{"height":12},"width":52.52,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-12.png","element":"img","alt":" Rn","inline":true},{"text":". 
As a result of ","element":"span"},{"href":"#id-122","text":"(161)","element":"a"},{"text":", any essential upper or lower bound on the curvature of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"carries over to the functions ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-13.png","element":"img","alt":" fj","inline":true},{"text":". This implies, for example, that for large enough ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"the origin is an isolated non-degenerate critical point of ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-14.png","element":"img","alt":" fj","inline":true},{"text":". In addition, each ","element":"span"},{"style":{"height":17.42},"width":49.89,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-15.png","element":"img","alt":" fj,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"j > ","element":"span"},{"text":"0","element":"span"},{"text":", is infinitely differentiable and satisfies Assumption ","element":"span"},{"href":"#id-35","text":"2. 
","element":"a"},{"text":"We therefore consider the dynamical system ","element":"span"},{"href":"#id-77","text":"(37)","element":"a"},{"text":", where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"is replaced with ","element":"span"},{"style":{"height":17.42},"width":50.89,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-16.png","element":"img","alt":" fj:","inline":true}],[{"id":"id-123","style":{"width":"83%"},"width":1450,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-17.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":18.22},"width":387.79,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-18.png","element":"img","alt":" zj(t) := (qj(t), pj(t))","inline":true},{"text":". Compared to ","element":"span"},{"href":"#id-77","text":"(37)","element":"a"},{"text":", the dependence on ","element":"span"},{"style":{"height":17.42},"width":72.73,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-19.png","element":"img","alt":" ∇fj","inline":true,"padRight":true},{"text":"is made explicit. Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"be the trajectory satisfying ","element":"span"},{"href":"#id-123","text":"(162)","element":"a"},{"text":", where ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-20.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"is replaced with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". 
In the continuous-time case, ","element":"span"},{"href":"#id-123","text":"(162) ","element":"a"},{"text":"implies","element":"span"}],[{"id":"id-124","style":{"width":"85%"},"width":1481,"height":238,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-21.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.88},"width":427.08,"height":44.71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-22.png","element":"img","alt":" Cg1 > 0 and Cg2 > 0","inline":true,"padRight":true},{"text":"are Lipschitz constants related to ","element":"span"},{"style":{"height":12},"width":32.81,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-23.png","element":"img","alt":" gs","inline":true},{"text":". By virtue of the Grönwall inequality, ","element":"span"},{"href":"#id-124","text":"(163) ","element":"a"},{"text":"readily implies ","element":"span"},{"style":{"height":18.22},"width":229.81,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-24.png","element":"img","alt":" zj(t) → z(t)","inline":true,"padRight":true},{"text":"pointwise for any ","element":"span"},{"style":{"height":12.8},"width":93.56,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-25.png","element":"img","alt":" t ∈ I","inline":true},{"text":". 
A similar argument applies to the discrete-time case.","element":"span"}],[{"text":"We show next that, if the equilibrium at the origin is uniformly stable (uniformly with respect to ","element":"span"},{"style":{"height":16.4},"width":141.56,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-26.png","element":"img","alt":" j and t0","inline":true},{"text":"), the convergence ","element":"span"},{"style":{"height":18.22},"width":233.82,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-27.png","element":"img","alt":" zj(t) → z(t)","inline":true,"padRight":true},{"text":"is in fact uniform in ","element":"span"},{"style":{"height":12.8},"width":97.57,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-28.png","element":"img","alt":" t ∈ I","inline":true},{"text":". As a consequence, the results of Proposition ","element":"span"},{"href":"#id-82","text":"12 ","element":"a"},{"text":"and, similarly, Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"generalize to the case where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"merely satisfies Assumption ","element":"span"},{"href":"#id-34","text":"1 ","element":"a"},{"text":"instead of Assumption ","element":"span"},{"href":"#id-35","text":"2, ","element":"a"},{"text":"as shown below.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proposition 18 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the origin be a stable equilibrium for ","element":"span"},{"href":"#id-123","text":"(162)","element":"a"},{"style":{"fontStyle":"italic"},"text":", uniformly in ","element":"span"},{"style":{"height":16.4},"width":346.12,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-29.png","element":"img","alt":" j and t0 (for j > 
0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"sufficiently large). If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"converges to the origin for ","element":"span"},{"style":{"height":18.22},"width":241.82,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-30.png","element":"img","alt":" t → ∞, zj(t)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"converges to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":")","element":"span"},{"style":{"fontStyle":"italic"},"text":", uniformly in ","element":"span"},{"style":{"height":12.8},"width":102.7,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-31.png","element":"img","alt":"t ∈ I.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"We pick any ","element":"span"},{"style":{"height":12.4},"width":113.96,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-32.png","element":"img","alt":" ϵ > 0","inline":true,"padRight":true},{"text":"and show that there exists an integer ","element":"span"},{"style":{"fontStyle":"italic"},"text":"N","element":"span"},{"text":", independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", such that ","element":"span"},{"style":{"height":18.22},"width":852.29,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-33.png","element":"img","alt":"|zj(t) − z(t)| < ϵ for all j > N and all t ≥ t0","inline":true},{"text":". 
Due to the uniform stability of the origin, there exists a ","element":"span"},{"style":{"height":13.2},"width":110.69,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-34.png","element":"img","alt":" δ > 0","inline":true,"padRight":true},{"text":"(independent of ","element":"span"},{"style":{"height":16.4},"width":229.03,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-35.png","element":"img","alt":" t0 ∈ I and j","inline":true},{"text":"; without loss of generality, δ ≤ ϵ) such that ","element":"span"},{"style":{"height":18.22},"width":736.91,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/45-36.png","element":"img","alt":" |zj(t0)| < δ implies |zj(t)| < ϵ/2 for all","inline":true,"padRight":true},{"style":{"height":16.4},"width":299.19,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-0.png","element":"img","alt":"t ≥ t0 and j > 0","inline":true,"padRight":true},{"text":"sufficiently large. Since the trajectory ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":") ","element":"span"},{"text":"converges to the origin, there exists a finite time ","element":"span"},{"style":{"height":17.6},"width":520.66,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-1.png","element":"img","alt":" T ∈ I such that |z(T)| < δ/2","inline":true},{"text":". Applying Grönwall's inequality to ","element":"span"},{"href":"#id-124","text":"(163) ","element":"a"},{"text":"yields","element":"span"}],[{"style":{"width":"84%"},"width":1460,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-2.png","element":"img"}],[{"text":"a similar conclusion holds in the discrete-time case. 
Thus, choosing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"large enough guarantees that ","element":"span"},{"style":{"height":18.22},"width":914.57,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-3.png","element":"img","alt":"|zj(t) − z(t)| < δ/2 ≤ ϵ/2 for all t ∈ I, t0 ≤ t ≤ T","inline":true},{"text":". Therefore, ","element":"span"},{"style":{"height":18.22},"width":205.77,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-4.png","element":"img","alt":" |zj(T)| < δ","inline":true},{"text":", which readily implies ","element":"span"},{"style":{"height":18.22},"width":461.21,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-5.png","element":"img","alt":"|zj(t)| < ϵ/2 for all t ≥ T","inline":true},{"text":". Combined with ","element":"span"},{"style":{"height":17.6},"width":445.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-6.png","element":"img","alt":" |z(t)| < ϵ/2 for all t ≥ T","inline":true},{"text":", this yields ","element":"span"},{"style":{"height":18.22},"width":307.63,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-7.png","element":"img","alt":" |zj(t) − z(t)| < ϵ","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"height":13.82},"width":119.63,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-8.png","element":"img","alt":" t ≥ t0.","inline":true}],[{"text":"Verifying that the origin is stable (uniformly in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":") is typically straightforward. To illustrate the idea, we revisit the discussion of Section ","element":"span"},{"href":"#id-29","text":"2.6. 
","element":"a"},{"text":"The total energy ","element":"span"},{"style":{"height":17.42},"width":278.69,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-9.png","element":"img","alt":" Hj (where f is","inline":true,"padRight":true},{"text":"replaced with ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-10.png","element":"img","alt":" fj","inline":true},{"text":") is well-defined for any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j > ","element":"span"},{"text":"0","element":"span"},{"text":". For large enough ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":", the curvature of ","element":"span"},{"style":{"height":17.42},"width":122.21,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-11.png","element":"img","alt":" fj is a","inline":true,"padRight":true},{"text":"localized average of the curvature of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":". The origin is a non-degenerate isolated critical point of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":", and therefore the same applies to ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-12.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"for large enough ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":". 
This implies that, in a neighborhood of the origin, the total energy ","element":"span"},{"style":{"height":17.02},"width":51.28,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-13.png","element":"img","alt":" Hj","inline":true,"padRight":true},{"text":"is bounded from above and below:","element":"span"}],[{"style":{"width":"44%"},"width":765,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-14.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j > ","element":"span"},{"text":"0 ","element":"span"},{"text":"sufficiently large. These upper and lower bounds are independent of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":", which, combined with the fact that ","element":"span"},{"style":{"height":17.02},"width":51.27,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-15.png","element":"img","alt":" Hj","inline":true,"padRight":true},{"text":"is non-increasing along trajectories, readily implies stability of the origin (uniformly in ","element":"span"},{"style":{"height":16.4},"width":140.09,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-16.png","element":"img","alt":" j and t0","inline":true},{"text":") (see, for example, ","element":"span"},{"href":"#id-41","referenceIndex":36,"text":"Sastry, ","element":"a"},{"href":"#id-41","referenceIndex":36,"text":"1999, ","element":"a"},{"text":"p. 189). The same argument applies in the discrete-time case, where, by assumption, ","element":"span"},{"style":{"height":17.42},"width":157.65,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-17.png","element":"img","alt":" f and fj","inline":true,"padRight":true},{"text":"are locally strongly convex. 
Thus, uniform stability follows from Appendix ","element":"span"},{"text":"D.","element":"span"}],[{"text":"The fact that the convergence ","element":"span"},{"style":{"height":14.22},"width":140.15,"height":35.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-18.png","element":"img","alt":" zj → z","inline":true,"padRight":true},{"text":"is uniform in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"implies that the convergence estimates from Proposition ","element":"span"},{"href":"#id-82","text":"12 ","element":"a"},{"text":"and Proposition ","element":"span"},{"href":"#id-52","text":"3 ","element":"a"},{"text":"carry over to the limit ","element":"span"},{"style":{"height":16},"width":132.36,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-19.png","element":"img","alt":" j → ∞","inline":true},{"text":". More precisely, we have the following result.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Proposition 19 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let the origin be a stable equilibrium for ","element":"span"},{"href":"#id-77","text":"(38)","element":"a"},{"style":{"fontStyle":"italic"},"text":", uniformly in ","element":"span"},{"style":{"height":16.4},"width":359.8,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-20.png","element":"img","alt":" j and t0 (for j > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"sufficiently large). 
Let ","element":"span"},{"style":{"height":15.53},"width":167.61,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-21.png","element":"img","alt":" A ⊂ R2n ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be compact and such that ","element":"span"},{"style":{"height":17.6},"width":675.35,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-22.png","element":"img","alt":" z0 ∈ A implies z(t) → 0 for t → ∞.","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Then, provided that for sufficiently large ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"the trajectories ","element":"span"},{"style":{"height":18.22},"width":87.55,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-23.png","element":"img","alt":" zj(t)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfy the estimates","element":"span"}],[{"id":"id-125","style":{"width":"84%"},"width":1468,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-24.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":21.83},"width":361.7,"height":54.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-25.png","element":"img","alt":"ˆCj > 0 and α > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are constants, and ","element":"span"},{"style":{"height":17.02},"width":307.94,"height":42.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-26.png","element":"img","alt":" ρ : R≥0 → R>0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is continuous and monotonically decreasing, there exists a constant 
","element":"span"},{"style":{"height":21.21},"width":286.2,"height":53.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-27.png","element":"img","alt":"ˆC such that z(t)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"satisfies the estimate","element":"span"}],[{"style":{"width":"84%"},"width":1453,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-28.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for any constant ","element":"span"},{"style":{"height":11.2},"width":125.33,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-29.png","element":"img","alt":" ¯α < α.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"Proof ","element":"span"},{"text":"For any ","element":"span"},{"style":{"height":12.4},"width":97.89,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-30.png","element":"img","alt":" ϵ > 0","inline":true},{"text":", ","element":"span"},{"href":"#id-125","text":"(165) ","element":"a"},{"text":"implies that for any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j > ","element":"span"},{"text":"0 ","element":"span"},{"text":"sufficiently large,","element":"span"}],[{"style":{"width":"71%"},"width":1243,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/46-31.png","element":"img"}],[{"text":"Due to the fact that ","element":"span"},{"style":{"height":13.02},"width":35.29,"height":32.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/47-0.png","element":"img","alt":" zj","inline":true,"padRight":true},{"text":"converges uniformly to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"z","element":"span"},{"text":", it follows that","element":"span"}],[{"style":{"width":"98%"},"width":1696,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/47-1.png","element":"img"}],[{"text":"which yields 
the desired result.","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-98","text":"Ravi P. Agarwal. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Difference Equations and Inequalities","element":"span"},{"text":". Marcel Dekker, Inc., second edition, 2000.","element":"span"}],[{"id":"id-5","text":"Zeyuan Allen-Zhu and Lorenzo Orecchia. Linear coupling: An ultimate unification of gradient and ","element":"span"},{"text":"mirror descent. ","element":"span"},{"text":"arXiv:1407.1537 [cs.DS]","element":"span"},{"text":", 2014.","element":"span"}],[{"id":"id-37","text":"Vladimir I. Arnol’d. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Ordinary Differential Equations","element":"span"},{"text":". Springer, third edition, 1992.","element":"span"}],[{"id":"id-12","text":"Hedy Attouch, Zaki Chbani, Juan Peypouquet, and Patrick Redont. Fast convergence of inertial dy","element":"span"},{"text":"namics and algorithms with asymptotic vanishing viscosity. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming, Series B","element":"span"},{"text":", 168(1-2):123–175, 2018.","element":"span"}],[{"text":"Richard Bellman. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Stability Theory of Differential Equations","element":"span"},{"text":". Dover, 1969.","element":"span"}],[{"id":"id-14","text":"Michael Betancourt, Michael I. Jordan, and Ashia C. Wilson. ","element":"span"},{"text":"On symplectic optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv:1802.03653v2","element":"span"},{"text":", pages 1–20, 2018.","element":"span"}],[{"id":"id-4","text":"Sébastien Bubeck, Yin Tat Lee, and Mohit Singh. A geometric alternative to Nesterov’s accelerated ","element":"span"},{"text":"gradient descent. 
","element":"span"},{"text":"arXiv:1506.08187 [math.OC]","element":"span"},{"text":", 2015.","element":"span"}],[{"text":"Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, and Aaron Sidford. Near-optimal method ","element":"span"},{"text":"for highly smooth convex optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of Machine Learning Research","element":"span"},{"text":", 99:492–507, 2019.","element":"span"}],[{"id":"id-25","text":"Frank E. Curtis, Daniel P. Robinson, and Mohammadreza Samadi. A trust region algorithm with ","element":"span"},{"text":"a worst-case iteration complexity of ","element":"span"},{"style":{"height":20.33},"width":166.78,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/47-2.png","element":"img","alt":" O(ϵ−3/2)","inline":true,"padRight":true},{"text":"for nonconvex optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming, Series A","element":"span"},{"text":", 162(1-2):1–32, 2017.","element":"span"}],[{"id":"id-8","text":"Jelena Diakonikolas and Lorenzo Orecchia. The approximate duality gap technique: A unified ","element":"span"},{"text":"theory of first-order methods. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"SIAM Journal on Optimization","element":"span"},{"text":", 29(1):660–689, 2018.","element":"span"}],[{"id":"id-20","text":"Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabás Póczos, and Aarti Singh. Gradi","element":"span"},{"text":"ent descent can take exponential time to escape saddle points. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems 30","element":"span"},{"text":", pages 1067–1077, 2017.","element":"span"}],[{"id":"id-15","text":"Hans-Bernd Dürr and Christian Ebenbauer. On a class of smooth optimization algorithms with ","element":"span"},{"text":"applications in control. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the IFAC Nonlinear Model Predictive Control Conference","element":"span"},{"text":", pages 291–298, 2012.","element":"span"}],[{"id":"id-92","text":"Saber Elaydi. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"An Introduction to Difference Equations","element":"span"},{"text":". Springer, third edition, 2005.","element":"span"}],[{"id":"id-16","text":"Guilherme França, Jeremias Sulam, Daniel Robinson, and René Vidal. Conformal symplectic and ","element":"span"},{"text":"relativistic optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv:1903.04100 [math.OC]","element":"span"},{"text":", pages 1–27, 2019.","element":"span"}],[{"id":"id-64","text":"Sébastien Gadat, Fabien Panloup, and Sofiane Saadane. Stochastic heavy ball. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Electronic Journal of Statistics","element":"span"},{"text":", 12(1):461–529, 2018.","element":"span"}],[{"id":"id-21","text":"Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points - online stochastic ","element":"span"},{"text":"gradient for tensor decomposition. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of Machine Learning Research","element":"span"},{"text":", 40:797–842, 2015.","element":"span"}],[{"id":"id-45","text":"Ernst Hairer, Gerhard Wanner, and Christian Lubich. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Geometric Numerical Integration","element":"span"},{"text":". Springer, 2002.","element":"span"}],[{"id":"id-71","text":"Jack K. Hale. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Asymptotic Behavior of Dissipative Systems","element":"span"},{"text":". American Mathematical Society, 1988.","element":"span"}],[{"id":"id-72","text":"Alain Haraux. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Systèmes Dynamiques Dissipatifs et Applications","element":"span"},{"text":". 
Masson, 1991.","element":"span"}],[{"id":"id-22","text":"Chi Jin, Praneeth Netrapalli, and Michael I. Jordan. Accelerated gradient descent escapes saddle ","element":"span"},{"text":"points faster than gradient descent. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv:1711.10456","element":"span"},{"text":", pages 1–43, 2017.","element":"span"}],[{"id":"id-23","text":"Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, and Michael I. Jordan. On nonconvex op","element":"span"},{"text":"timization for machine learning: Gradients, stochasticity, and saddle points. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv:1902.04811","element":"span"},{"text":", pages 1–31, 2019.","element":"span"}],[{"id":"id-11","text":"Walid Krichene, Alexandre M. Bayen, and Peter L. Bartlett. Accelerated mirror descent in continu","element":"span"},{"text":"ous and discrete time. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems 28","element":"span"},{"text":", pages 2845–2853, 2015.","element":"span"}],[{"id":"id-19","text":"Jason D. Lee, Max Simchowitz, Michael I. Jordan, and Benjamin Recht. Gradient descent only ","element":"span"},{"text":"converges to minimizers. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of Machine Learning Research","element":"span"},{"text":", 49(1):1246–1257, 2016.","element":"span"}],[{"id":"id-6","text":"Laurent Lessard, Benjamin Recht, and Andrew Packard. Analysis and design of optimization algo","element":"span"},{"text":"rithms via integral quadratic constraints. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"SIAM Journal on Optimization","element":"span"},{"text":", 26(1):57–95, 2016.","element":"span"}],[{"id":"id-18","text":"Chris J. Maddison, Daniel Paulin, Yee Whye Teh, and Brendan O’Donoghue. Hamiltonian descent ","element":"span"},{"text":"methods. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv:1809.05042 [math.OC]","element":"span"},{"text":", pages 1–72, 2018.","element":"span"}],[{"id":"id-7","text":"Simon Michalowsky, Carsten Scherer, and Christian Ebenbauer. Robust and structure exploiting ","element":"span"},{"text":"optimization algorithms: An integral quadratic constraint approach. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv:1905.00279","element":"span"},{"text":", pages 1–30, 2019.","element":"span"}],[{"id":"id-55","text":"J. Milnor. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Morse Theory","element":"span"},{"text":". Princeton University Press, 1963.","element":"span"}],[{"id":"id-17","text":"Michael Muehlebach and Michael I. Jordan. A dynamical systems perspective on Nesterov accel","element":"span"},{"text":"eration. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of Machine Learning Research","element":"span"},{"text":", 97:4656–4662, 2019.","element":"span"}],[{"id":"id-0","text":"Yuri E. Nesterov. A method of solving a convex programming problem with convergence rate ","element":"span"},{"style":{"height":19.13},"width":166.09,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2002.12493/images/48-0.png","element":"img","alt":"O(1/k2).","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"Soviet Mathematics Doklady","element":"span"},{"text":", 27(2):372–376, 1983.","element":"span"}],[{"id":"id-24","text":"Yuri E. Nesterov and Boris T. Polyak. Cubic regularization of Newton method and its global per","element":"span"},{"text":"formance. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Mathematical Programming, Series A","element":"span"},{"text":", 108:177–205, 2006.","element":"span"}],[{"id":"id-3","text":"Yurii Nesterov. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introductory Lectures on Convex Optimization: A Basic Course","element":"span"},{"text":". Springer Science+Business Media, LLC, 2004.","element":"span"}],[{"id":"id-1","text":"Boris T. Polyak. Some methods of speeding up the convergence of iteration methods. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"USSR Computational Mathematics and Mathematical Physics","element":"span"},{"text":", 4(5):1–17, 1964.","element":"span"}],[{"id":"id-63","text":"Boris T. Polyak. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Introduction to Optimization","element":"span"},{"text":". Optimization Software, Inc., 1987.","element":"span"}],[{"id":"id-65","text":"Walter Rudin. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Principles of Mathematical Analysis","element":"span"},{"text":". McGraw-Hill, third edition, 1976.","element":"span"}],[{"id":"id-44","text":"J. M. Sanz-Serna. Symplectic integrators for Hamiltonian problems: an overview. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Acta Numerica","element":"span"},{"text":", pages 243–286, 1992.","element":"span"}],[{"id":"id-41","text":"Shankar Sastry. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Nonlinear Systems","element":"span"},{"text":". Springer, 1999.","element":"span"}],[{"id":"id-9","text":"Damien Scieur, Vincent Roulet, Francis Bach, and Alexandre d’Aspremont. Integration methods ","element":"span"},{"text":"and optimization algorithms. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems 30","element":"span"},{"text":", pages 1109–1118, 2017.","element":"span"}],[{"id":"id-10","text":"Weijie Su, Stephen Boyd, and Emmanuel J. Candès. A differential equation for modeling Nesterov’s ","element":"span"},{"text":"accelerated gradient method: Theory and insights. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Machine Learning Research","element":"span"},{"text":", 17(153):1–43, 2016.","element":"span"}],[{"id":"id-86","text":"Roscoe B. White. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Asymptotic Analysis of Differential Equations","element":"span"},{"text":". Imperial College Press, 2010.","element":"span"}],[{"id":"id-13","text":"Andre Wibisono, Ashia C. Wilson, and Michael I. Jordan. A variational perspective on accelerated ","element":"span"},{"text":"methods in optimization. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the National Academy of Sciences","element":"span"},{"text":", 113(47):E7351–E7358, 2016.","element":"span"}]]}],"_version":"3.3.2"},"paperNode":"$28:props:children:props:children:0:props:product"}]]]}]}]